blog

notes from the router.

Field notes on the parts of voice AI that usually fail in production: latency budgets, endpointing, provider selection, benchmark methodology, and the engineering decisions behind reliable spoken agents.

latest note

engineering · 2026-04-23 · 6 min

Cutting voice agent latency to sub-500ms — a practical playbook

A latency budget for cascaded voice pipelines, why endpointing is the silent killer, and where the architecture itself has to change when you cannot push lower.

engineering · 2026-04-23 · 5 min

Designing barge-in that actually works

VAD is the load-bearing component. Most VADs are wrong for the job. A field guide to interruption that doesn't apologize, doesn't cough-trigger, and doesn't fall apart on a real phone call.

methodology · 2026-04-23 · 6 min

Evaluating voice AI quality in production: beyond WER

Your benchmark says 5% WER. Your users say the agent can't understand them. Both are correct. The metrics that actually predict production failures.

methodology · 2026-04-23 · 5 min

A developer's framework for picking an STT provider

Six axes that decide whether your product ships — accuracy, latency, language coverage, cost, API ergonomics, vocabulary tolerance — with the tolerance thresholds we use to route traffic.

engineering · 2026-04-23 · 5 min

Streaming vs batch STT: when each one wins

Most of you shouldn't be using streaming STT. Four questions to answer honestly before you open another WebSocket.

engineering · 2026-04-23 · 6 min

Building voice AI for noisy real-world audio

Noise is not one thing — it's four. A field guide to suppression, SNR thresholds, and the model choice that survives where your users actually live.