blog
notes from the router.
Field notes on the parts of voice AI that usually fail in production: latency budgets, endpointing, provider selection, benchmark methodology, and the engineering decisions behind reliable spoken agents.
latest note
Cutting voice agent latency to sub-500ms — a practical playbook
A latency budget for cascaded voice pipelines, why endpointing is the silent killer, and where the architecture itself has to change when you cannot push lower.
Designing barge-in that actually works
VAD is the load-bearing component. Most VADs are wrong for the job. A field guide to interruption that doesn't apologize, doesn't cough-trigger, and doesn't fall apart on a real phone call.
Evaluating voice AI quality in production: beyond WER
Your benchmark says 5% WER. Your users say the agent can't understand them. Both are correct. The metrics that actually predict production failures.
A developer's framework for picking an STT provider
Six axes that decide whether your product ships — accuracy, latency, language coverage, cost, API ergonomics, vocabulary tolerance — with the tolerance thresholds we use to route traffic.
Streaming vs. batch STT: when each one wins
Most of you shouldn't be using streaming STT. Four questions to answer honestly before you open another WebSocket.
Building voice AI for noisy real-world audio
Noise is not one thing — it's four. A field guide to suppression, SNR thresholds, and the model choice that survives where your users actually live.