I-DLM: Matching Autoregressive Quality with Parallel Diffusion Speed
I-DLM achieves autoregressive-level quality and significantly higher throughput by incorporating a self-verification mechanism into parallel diffusion decoding.
Diffusion models are generative models that learn to denoise data through iterative refinement steps. They are widely used for image generation, video synthesis, and other creative AI applications.
Skyfall-GS fuses satellite imagery with diffusion-driven iterative refinement to produce real-time, city-scale 3D scenes with superior geometry and textures—without 3D annotations.

Image editors are improving, but precise, localized, constraint-respecting edits remain the Achilles’ heel—even the best models stumble on spatial swaps and selective removals.

An open-source, configurable system for synchronized text-conditioned video and audio generation that runs on modest GPUs via quantization and parallelism.

BERT-style MLM is a single-step text diffusion process, and extending it to multiple masking steps turns RoBERTa into a workable text generator.

An open-source, world-consistent RGB-D video generator that turns a single image into controllable, long-range 3D scene explorations with state-of-the-art performance.
Share early diffusion steps across similar prompts to generate image sets faster and better, without retraining.