
Gemini 3.1 Pro: Advancing Multimodal Reasoning and Safety
Gemini 3.1 Pro is a high-performance multimodal AI that advances reasoning and coding capabilities while remaining below critical safety risk thresholds.

Lyria 3 is a high-fidelity AI tool within Gemini that turns prompts and images into shareable, 30-second custom music tracks.

A controllable, Genie 3–powered simulator generates realistic camera and lidar worlds to train and test the Waymo Driver on everyday and rare events at scale.

DeepMind’s Gemini Robotics AI is coming to Boston Dynamics’ Atlas humanoids to fast-track safe, scalable industrial use—starting in automotive manufacturing.

Gemini 3 Flash brings frontier‑grade reasoning to everyone at Flash speed and lower cost, and it’s rolling out across Google’s ecosystem.

OpenAI’s GPT‑Image‑1.5 makes ChatGPT image generation faster, more precise, and easier to use—now with a dedicated creation space and cheaper, higher-fidelity API workflows.
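The blurb highlights cheaper API workflows; a minimal sketch of what an image-generation call through the OpenAI Python SDK might look like follows. The model identifier "gpt-image-1.5" is an assumption taken from the headline and should be checked against OpenAI's published model list.

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.images.generate(
    model="gpt-image-1.5",  # assumed identifier; verify against the model list
    prompt="A watercolor of a lighthouse at dawn",
    size="1024x1024",
)

# GPT-Image models return the image as base64 rather than a URL.
print(result.data[0].b64_json[:64], "...")
```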

FLUX.2 is BFL’s production-ready, open-core visual model family that unifies powerful image generation and editing—with multi-reference fidelity and robust typography—on a modern VLM+flow architecture.

LLMs can accurately recognize daily activities by fusing captioned audio and motion data—boosting performance without raw audio or specialized multimodal training.
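A minimal sketch of the caption-fusion idea: both sensor streams are rendered as text and handed to an off-the-shelf LLM, so no raw audio and no multimodal fine-tuning are needed. The label set and the wording of the prompt below are hypothetical placeholders, not the paper's actual pipeline.

```python
# Hypothetical sketch: fuse an audio caption and a motion summary into one
# text prompt so a plain text-only LLM can classify the activity.
ACTIVITIES = ["cooking", "walking", "typing", "vacuuming"]  # assumed label set

def build_prompt(audio_caption: str, motion_summary: str) -> str:
    """Render both modalities as text and ask for a single label."""
    return (
        "You are an activity-recognition assistant.\n"
        f"Audio caption: {audio_caption}\n"
        f"Motion data: {motion_summary}\n"
        f"Which single activity from {ACTIVITIES} best fits? "
        "Answer with the label only."
    )

prompt = build_prompt(
    audio_caption="sizzling sounds and a metal pan being stirred",
    motion_summary="wrist accelerometer: repetitive circular motion, ~1.5 Hz",
)
# `prompt` can be sent to any chat LLM; the fusion happens purely in text.
print(prompt)
```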

A next-gen, Gemini 3 Pro–powered image model that combines accurate multilingual text, consistent multi-asset blending, and studio-grade controls—rolling out widely with SynthID transparency.

Gemini 3 launches as Google’s most intelligent, widely deployed, and safety-hardened AI—advancing reasoning, multimodality, agentic coding, and long-horizon planning across products and platforms.

Gemini 3 Pro now powers the Gemini CLI, turning natural-language ideas into end-to-end terminal workflows—from coding to cloud ops.

Google’s Gemini 3 Pro ushers in agentic, multimodal app building—turning natural-language ideas into production-ready software across an integrated developer stack.

World models now mean assets, simulators, or brains: three different layers of the same aim, giving machines structured understanding beyond next-token prediction.

Nano Banana nails prompt fidelity and structured control—far better than most rivals—while faltering at style transfer and raising moderation/IP concerns.

Preview of an AI tool that turns an artist image and an audio track into a short music video, with a near-term release and a call for user feedback.

An open-source, configurable system for synchronized text-conditioned video and audio generation that runs on modest GPUs via quantization and parallelism.
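As a generic illustration of the quantization trick that lets large generators fit on modest GPUs, here is how a big checkpoint can be loaded in 4-bit with Hugging Face transformers and bitsandbytes. The checkpoint name is a stand-in, and this is not the project's actual loading code.

```python
# Generic 4-bit loading sketch (requires `pip install transformers bitsandbytes`).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for quality
)

model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-large-model",            # placeholder, not the project's checkpoint
    quantization_config=quant_config,
    device_map="auto",                      # shard across available GPUs
)
# Roughly quarters weight memory versus fp16, which is what lets
# multi-billion-parameter generators run on a single consumer GPU.
```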

An LLM-focused, high-throughput OCR system that compresses visual context for efficient document and image understanding.

Gemini 2.5 Flash and Flash-Lite previews are faster, smarter, and cheaper, with new -latest aliases for easy access and stable models recommended for production.
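A short sketch of using one of the new aliases through the google-genai Python SDK; the exact alias string below is an assumption to verify against the announcement, since aliases track the newest preview and stable model IDs remain the recommendation for production.

```python
from google import genai  # pip install google-genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-flash-latest",  # assumed -latest alias; pin a stable ID in prod
    contents="Summarize why -latest aliases simplify model upgrades.",
)
print(response.text)
```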

A unified, real-time multimodal LLM with speech I/O that achieves SOTA across audio/video while remaining practical to deploy.

Generative AI turns static textbooks into personalized, multimodal lessons that measurably boost learning and engagement.

AI gives blind users access but at the cost of accuracy and new dependencies, and the author rejects the hype while bracing for future accessibility battles.