
Software Engineering Radio - the podcast for professional software developers SE Radio 703: Sahaj Garg on Low Latency AI
Jan 14, 2026
In this engaging discussion, Sahaj Garg, CTO and co-founder of Whispr.ai, shares his expertise on low-latency AI applications. He explains how latency affects user experience and offers insights into measuring and diagnosing latency issues. The conversation covers critical trade-offs between speed, accuracy, and cost in AI models. Sahaj also introduces optimization techniques like quantization and distillation, stressing the importance of low latency for user engagement in interactive apps. Tune in for invaluable tips on navigating the latency landscape!
User-Perceived Latency Is What Matters
- Latency equals user-perceived elapsed time from action to response.
- Focus on what the user experiences, not internal processing metrics.
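To make the distinction concrete, here is a minimal sketch (not from the episode; the function names are illustrative) of timing latency the way a user would perceive it: from the moment the action arrives until a response is ready to render, rather than timing only the model call.

```python
import time

def handle_user_action(process):
    """Measure user-perceived latency: the elapsed time from the
    user's action to a renderable response, covering every step
    (network, inference, post-processing), not just internal compute."""
    start = time.perf_counter()   # action received
    response = process()          # all work between action and response
    elapsed_ms = (time.perf_counter() - start) * 1000
    return response, elapsed_ms
```

An internal metric like "model forward-pass time" can look healthy while the user still waits on queuing, network hops, or rendering; measuring end to end at this boundary catches that gap.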
Latency Tolerance Varies By Interaction
- Tolerance for latency depends heavily on the interaction type.
- Keystroke feedback needs roughly 50–100ms, voice interactions tolerate roughly 500–600ms, and users can perceive delays as small as ~10ms.
Measure Per-Stage Latency To Find P99 Issues
- Break down latency into observable stages and measure each hop.
- Use per-step telemetry to find and fix P99 sources like network hops or region mismatches.
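The approach above can be sketched as follows. This is an illustrative example, not code from the episode: each pipeline stage records its own wall-clock latency, and a nearest-rank P99 over the samples surfaces which stage drives tail latency.

```python
import time
from collections import defaultdict

def timed_stage(name, fn, telemetry):
    """Run one pipeline stage and record its wall-clock latency
    under the stage's name, so each hop is observable separately."""
    start = time.perf_counter()
    result = fn()
    telemetry[name].append(time.perf_counter() - start)
    return result

def p99(samples):
    """Nearest-rank 99th percentile of a list of latency samples."""
    ranked = sorted(samples)
    idx = min(len(ranked) - 1, int(0.99 * len(ranked)))
    return ranked[idx]

# Usage sketch: stage names here are hypothetical.
telemetry = defaultdict(list)
for _ in range(100):
    timed_stage("network", lambda: None, telemetry)
    timed_stage("inference", lambda: None, telemetry)

worst = max(telemetry, key=lambda name: p99(telemetry[name]))
```

Comparing P99 (rather than the mean) per stage is what exposes intermittent sources like a cross-region hop: a stage can have a fine average yet dominate the tail.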
