
Software Engineering Radio - the podcast for professional software developers SE Radio 703: Sahaj Garg on Low Latency AI
Jan 14, 2026
In this engaging discussion, Sahaj Garg, CTO and co-founder of Whispr.ai, shares his expertise on low-latency AI applications. He explains how latency affects user experience and offers insights into measuring and diagnosing latency issues. The conversation covers critical trade-offs between speed, accuracy, and cost in AI models. Sahaj also introduces optimization techniques like quantization and distillation, stressing the importance of low latency for user engagement in interactive apps. Tune in for invaluable tips on navigating the latency landscape!
User-Perceived Latency Is What Matters
- Latency equals user-perceived elapsed time from action to response.
- Focus on what the user experiences, not internal processing metrics.
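To make the distinction concrete, here is a minimal sketch (not from the episode; the function names are illustrative) of timing latency the way a user would perceive it: from the moment the action arrives until a response is ready to render, rather than timing only the model call.

```python
import time

def handle_user_action(process):
    """Measure user-perceived latency: the elapsed time from the
    user's action to a renderable response, covering every step
    (network, inference, post-processing), not just internal compute."""
    start = time.perf_counter()   # action received
    response = process()          # all work between action and response
    elapsed_ms = (time.perf_counter() - start) * 1000
    return response, elapsed_ms
```

An internal metric like "model forward-pass time" can look healthy while the user still waits on queuing, network hops, or rendering; measuring end to end at this boundary catches that gap.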
Latency Tolerance Varies By Interaction
- Tolerance for latency depends heavily on the interaction type.
- Keystroke feedback needs roughly 50–100ms, voice interactions tolerate roughly 500–600ms, and users can perceive delays as small as ~10ms.
Measure Per-Stage Latency To Find P99 Issues
- Break down latency into observable stages and measure each hop.
- Use per-step telemetry to find and fix P99 sources like network hops or region mismatches.
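The approach above can be sketched as follows. This is an illustrative example, not code from the episode: each pipeline stage records its own wall-clock latency, and a nearest-rank P99 over the samples surfaces which stage drives tail latency.

```python
import time
from collections import defaultdict

def timed_stage(name, fn, telemetry):
    """Run one pipeline stage and record its wall-clock latency
    under the stage's name, so each hop is observable separately."""
    start = time.perf_counter()
    result = fn()
    telemetry[name].append(time.perf_counter() - start)
    return result

def p99(samples):
    """Nearest-rank 99th percentile of a list of latency samples."""
    ranked = sorted(samples)
    idx = min(len(ranked) - 1, int(0.99 * len(ranked)))
    return ranked[idx]

# Usage sketch: stage names here are hypothetical.
telemetry = defaultdict(list)
for _ in range(100):
    timed_stage("network", lambda: None, telemetry)
    timed_stage("inference", lambda: None, telemetry)

worst = max(telemetry, key=lambda name: p99(telemetry[name]))
```

Comparing P99 (rather than the mean) per stage is what exposes intermittent sources like a cross-region hop: a stage can have a fine average yet dominate the tail.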
