AI Engineer YouTube · June 5, 2026

Beyond Transcription: Building Voice AI That Understands Conversations — Hervé Bredin, pyannoteAI

Why it matters

The Open ASR Leaderboard reports NVIDIA Parakeet at an 11.4% word error rate on AMI meeting data. Hervé Bredin runs the same model on the same dataset and gets 26%. Same model, same recordings, different microphone: the leaderboard uses headset audio, while he uses the table mic. Most voice AI benchmarks are measuring single sp

My takeaway: this talk is a model-evaluation signal. The practical read is to tie capability claims to evidence, launch criteria, and regression tests rather than relying on demos or benchmark headlines.
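
To make the regression-test idea concrete, here is a minimal sketch of a per-condition word error rate check. It is not the method from the talk or from any leaderboard: the function names, the sample transcripts, and the thresholds are all hypothetical placeholders, and the text normalization (lowercasing and whitespace splitting) is far simpler than what real ASR evaluations use.

```python
from typing import Dict, List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum number of word substitutions, deletions, and insertions."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + int(ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1]


def corpus_wer(pairs: List[Tuple[str, str]]) -> float:
    """Pooled WER: total word errors divided by total reference words."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        errors += word_edit_distance(ref, hyp)
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("need at least one reference word")
    return errors / ref_words


def failing_conditions(
    results: Dict[str, List[Tuple[str, str]]],
    max_wer: Dict[str, float],
) -> List[str]:
    """Return the recording conditions whose WER exceeds its threshold."""
    return [
        condition
        for condition, pairs in results.items()
        if corpus_wer(pairs) > max_wer[condition]
    ]


# Hypothetical (reference, hypothesis) pairs, grouped by microphone condition.
results = {
    "headset": [("we should ship on friday", "we should ship on friday")],
    "table_mic": [("we should ship on friday", "we should chip friday")],
}

# Placeholder launch criteria, not values from the talk.
max_wer = {"headset": 0.15, "table_mic": 0.30}

print(failing_conditions(results, max_wer))
```

The design choice worth noting is that the score is reported per recording condition instead of as one pooled number, so a strong headset result cannot mask a weak table-mic result.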