
VibeVoice: Microsoft's Open-Source Long-Form Voice AI
VibeVoice is an open-source Microsoft framework designed for high-efficiency, long-form speech recognition and multi-speaker text-to-speech synthesis.
Processing and analysis of speech and audio signals, including accent identification, voice recognition, and audio-based machine learning.

A 100% local, privacy-focused macOS app for hold-to-talk speech-to-text and AI-powered transcription cleanup.

Cohere Transcribe is a new open-source ASR model that delivers industry-leading accuracy and efficiency for enterprise speech-to-text applications.

A HuBERT model’s 3D latent map of English accents clusters by geography and social history more than by language-family taxonomy, offering an exploratory—but not definitive—view of accent relationships.

A unified, real-time multimodal LLM with speech input and output that achieves state-of-the-art results across audio and video while remaining practical to deploy.