Computer Vision

Technologies and applications that enable machines to interpret and understand visual information from the world, including image recognition, object detection, and real-time video analysis.

Reading List

Products & Announcements

Interfaze: A Hybrid Architecture for High-Accuracy Deterministic AI

May 11, 2026

Interfaze is a hybrid AI model that merges DNN precision with transformer flexibility to outperform generalist LLMs in high-accuracy, deterministic tasks.

AI Architecture, AI Benchmarks, LLM Inference, Structured Output, Computer Vision

Products & Announcements

Gemini Robotics-ER 1.6: Advancing Embodied AI Reasoning

Apr 15, 2026

Gemini Robotics-ER 1.6 provides robots with enhanced spatial reasoning and instrument-reading capabilities to bridge the gap between AI and physical action.

Robotics, Multimodal AI, Computer Vision, AI Agents, Embodied AI

Products & Announcements

DaVinci Resolve Brings Hollywood Color Tools to Still Photography

Apr 14, 2026

DaVinci Resolve now offers its advanced Hollywood color grading and AI toolset to photographers through a dedicated, high-performance Photo page.

Creative Software, Color Management, Image Processing, GPU Computing, Computer Vision

Damage Control

Unveiling the Hidden Data in Your Photos

Apr 13, 2026

Your photos reveal far more private data to automated systems than you might expect.

Data Privacy, Computer Vision, Surveillance Capitalism, Interactive Web Tools, Information Literacy

Under the Hood

VOID: Interaction-Aware Video Object Removal and Physics-Based Inpainting

Apr 7, 2026

VOID is a video editing framework that removes objects and realistically simulates the resulting physical interactions and scene changes.

AI Video Generation, Computer Vision, VFX & Post-Production, Synthetic Data & Simulation, Video Inpainting

Programming

SentrySearch: Semantic Video Search for Dashcams

Mar 24, 2026

SentrySearch enables semantic natural-language search and automatic clipping of dashcam footage using Gemini's multimodal video embeddings.

Computer Vision, Vector Embeddings, Multimodal AI, Vector Databases, Media Processing

Products & Announcements

From Pikachus to Pizza: How Pokémon Go Data Navigates Delivery Robots

Mar 16, 2026

Pokémon Go's massive database of crowdsourced AR images is now being used to provide centimeter-level navigation for autonomous delivery robots.

Autonomous Vehicles, Crowdsourced Mapping, Robotics, Computer Vision, Geolocation

Creative Code

GitHub - nikopueringer/CorridorKey: Perfect Green Screen Keys

Mar 9, 2026

CorridorKey is an AI-driven green screen keyer that uses neural networks to reconstruct true foreground colors and delicate transparency for professional VFX compositing.

VFX & Post-Production, Computer Vision, AI-Generated Content, Image Processing

Creative Code

Posturr: Blur Your Mac Screen When You Slouch

Jan 25, 2026

An open-source macOS app that uses your camera to detect slouching and gently enforce better posture by blurring the screen.

Open Source, Data Privacy, Workplace Wellbeing, Computer Vision, macOS

Products & Announcements

From Miles to Meters: GeoSpy’s SuperBolt Pinpoints Vehicle Photos Fast

Jan 6, 2026

GeoSpy’s SuperBolt upgrades photo geolocation from miles to meters, enabling rapid, precise, and scalable vehicle recovery.

Computer Vision, Data Privacy, Geolocation, Surveillance Technology

Creative Code

Pose Animator: Real-time SVG Puppeteering with TensorFlow.js

Nov 10, 2025

An open-source tool that turns SVGs into real-time, browser-based puppets using PoseNet/FaceMesh and smart vector deformation.

Computer Vision, Creative Coding, Web Animation, Open Source

Products & Announcements

Samsung 2025 Family Hub Update: Unified UI, Smarter Food Tracking, Stronger Security

Nov 9, 2025

Samsung’s 2025 Family Hub update brings a unified interface, smarter food tracking, personalized Bixby, and expanded Knox security to its smart home ecosystem.

Smart Home, Privacy, IoT Security, Corporate Accountability, Computer Vision

Products & Announcements

Skyfall-GS: Real-Time City-Scale 3D from Satellite Images via Diffusion-Guided Refinement

Nov 3, 2025

Skyfall-GS fuses satellite imagery with diffusion-driven iterative refinement to produce real-time, city-scale 3D scenes with superior geometry and textures—without 3D annotations.

Gaussian Splatting, Computer Vision, 3D Modeling, Diffusion Models, Satellite Imagery

Damage Control

AI Misidentifies Doritos Bag as Gun, Police Detain Teen at Baltimore School

Oct 23, 2025

An AI gun detector misread a Doritos bag as a weapon, triggering an armed police response and renewing concerns about AI surveillance in schools.

AI Safety, Surveillance Technology, Civil Liberties, Corporate Accountability, Computer Vision

Damage Control

AI Checkouts Made BMO Stadium Worse: Slower Lines, Fewer Choices

Oct 20, 2025

AI checkouts at BMO Stadium brought slower lines and fewer choices, making things worse for fans, especially in the heat, despite claims that they are faster.

AI Hype, Computer Vision, Corporate Accountability, Consumer Economics, Labor Economics

Products & Announcements

DeepSeek-OCR: LLM-Centric Visual-Text Compression for Fast, Flexible OCR

Oct 20, 2025

An LLM-focused, high-throughput OCR system that compresses visual context for efficient document and image understanding.

Computer Vision, Multimodal AI, Open Source, AI Training Data

Creative Code

Macro Insects as 3D Gaussian Splats via Focus Stacking

Oct 12, 2025

Focus-stacked macro photography, processed with COLMAP and Postshot, yields sharp, photoreal 3D Gaussian splats of insects; a free CC BY-licensed model is shared.

Gaussian Splatting, Computer Vision, 3D Modeling, Photogrammetry, Computational Photography

Products & Announcements

Gemini 2.5 Computer Use: High‑performance, safe UI control via API

Oct 7, 2025

Google’s Gemini 2.5 Computer Use brings high-accuracy, low-latency, safety-aware UI control to developers via the Gemini API.

AI Agents, Computer Vision, Browser Automation, AI Safety, AI Benchmarks

Under the Hood

Veo 3: Emergent Zero‑Shot Video Intelligence Toward Vision Foundation Models

Sep 25, 2025

Veo 3’s emergent zero-shot skills across perception, physics, manipulation, and reasoning point to video models becoming generalist vision foundation models.

Computer Vision, AI Video Generation, Foundation Models, Zero-Shot Learning

Under the Hood

HunyuanWorld-Voyager: World-Consistent RGB-D Video and 3D from a Single Image

Sep 3, 2025

An open-source, world-consistent RGB-D video generator that turns a single image into controllable, long-range 3D scene explorations with state-of-the-art performance.

Diffusion Models, Computer Vision, 3D Modeling, World Models, AI Video Generation

Creative Code

AR Fluid Sim That Collides with Real Objects

Sep 2, 2025

An AR-style setup lets a fluid simulation collide with real objects by aligning a webcam feed—filtered to avoid feedback—with the digital solver.

Fluid Simulation, Augmented Reality, Computer Vision, Creative Coding