Speech Processing

Processing and analysis of speech and audio signals, including accent identification, voice recognition, and audio-based machine learning.

Reading List

Under the Hood

Apple SpeechAnalyzer: The New King of On-Device Transcription

Jul 13, 2026

Apple's new SpeechAnalyzer is now the fastest and most accurate on-device English speech engine for Mac and iPhone, surpassing Whisper Small.

AI Benchmarks, On-Device AI, Speech Processing, Apple

Products & Announcements

VibeVoice: Microsoft's Open-Source Long-Form Voice AI

Apr 28, 2026

VibeVoice is an open-source Microsoft framework designed for high-efficiency, long-form speech recognition and multi-speaker text-to-speech synthesis.

Voice AI, Text-to-Speech, Speech Processing, Open Source, Microsoft

Products & Announcements

Ghost Pepper: Private Local AI Dictation for Mac

Apr 6, 2026

A 100% local, privacy-focused macOS app for hold-to-talk speech-to-text and AI-powered transcription cleanup.

On-Device AI, Speech Processing, macOS, Data Privacy, Open Source

Products & Announcements

Cohere Transcribe: The New Open-Source Leader in Speech Recognition

Mar 31, 2026

Cohere Transcribe is a new open-source ASR model that delivers industry-leading accuracy and efficiency for enterprise speech-to-text applications.

Speech Processing, Open Source, AI Benchmarks, Multilingual AI, Enterprise AI Adoption

Under the Hood

Accents in 3D: How a HuBERT Model Maps English Accent Clusters

Oct 15, 2025

A HuBERT model’s 3D latent map of English accents clusters them by geography and social history more than by language-family taxonomy, offering an exploratory, though not definitive, view of accent relationships.

Data Visualization, Model Fine-Tuning, Speech Processing, Computational Linguistics

Products & Announcements

Qwen3-Omni: Real-Time Multimodal LLM with Speech I/O and SOTA Audio-Video Performance

Sep 22, 2025

A unified, real-time multimodal LLM with speech input and output that achieves SOTA results across audio and video tasks while remaining practical to deploy.

Multimodal AI, Open Source, Speech Processing, Foundation Models