Software Engineering Radio - the podcast for professional software developers

SE Radio 703: Sahaj Garg on Low Latency AI

Jan 14, 2026
In this engaging discussion, Sahaj Garg, CTO and co-founder of Whispr.ai, shares his expertise on low-latency AI applications. He explains how latency affects user experience and offers insights into measuring and diagnosing latency issues. The conversation covers critical trade-offs between speed, accuracy, and cost in AI models. Sahaj also introduces optimization techniques like quantization and distillation, stressing the importance of low latency for user engagement in interactive apps. Tune in for invaluable tips on navigating the latency landscape!
INSIGHT

User-Perceived Latency Is What Matters

  • Latency is the elapsed time the user perceives between taking an action and getting a response.
  • Focus on what the user experiences, not internal processing metrics.
INSIGHT

Latency Tolerance Varies By Interaction

  • Tolerance for latency depends heavily on the interaction type.
  • Keystrokes need a response within roughly 50–100 ms, voice tolerates roughly 500–600 ms, and users can perceive delays even below 10 ms.
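One way to make these tolerances actionable is to encode them as per-interaction budgets and check observed latencies against them. The sketch below is only an illustration: the upper bounds come from the ranges quoted above, while the names `LATENCY_BUDGET_MS` and `within_budget` are hypothetical and not taken from the episode.

```python
# Upper bounds (ms) taken from the ranges quoted above.
# The dictionary keys and function name are illustrative assumptions.
LATENCY_BUDGET_MS = {
    "keystroke": 100,  # keystrokes: roughly 50-100 ms
    "voice": 600,      # voice: roughly 500-600 ms
}


def within_budget(interaction: str, observed_ms: float) -> bool:
    """Return True if the observed latency fits the budget for this interaction type."""
    return observed_ms <= LATENCY_BUDGET_MS[interaction]
```

The same 250 ms response would pass the voice budget but fail the keystroke budget, which is the point of treating tolerance as a property of the interaction rather than a single global target.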
ADVICE

Measure Per-Stage Latency To Find P99 Issues

  • Break down latency into observable stages and measure each hop.
  • Use per-step telemetry to find and fix the sources of P99 latency, such as network hops or region mismatches.
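The advice above could be sketched roughly as follows: time each stage of a request separately, keep the samples per stage, and compare stages at the 99th percentile rather than on average. This is a minimal sketch under stated assumptions, not the tooling discussed in the episode; the `StageTimer` class, its method names, and the stage names are all hypothetical, and the percentile uses the simple nearest-rank method.

```python
import math
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Collects per-stage durations in milliseconds across many requests (hypothetical helper)."""

    def __init__(self):
        self.samples = defaultdict(list)

    @contextmanager
    def stage(self, name):
        """Time the enclosed block and store the duration under the given stage name."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.samples[name].append((time.perf_counter() - start) * 1000.0)

    def record(self, name, duration_ms):
        """Store a duration measured elsewhere, e.g. reported by a client or another service."""
        self.samples[name].append(float(duration_ms))

    def percentile(self, name, pct):
        """Nearest-rank percentile of the samples recorded for one stage."""
        data = sorted(self.samples[name])
        if not data:
            raise ValueError(f"no samples for stage {name!r}")
        rank = max(1, math.ceil(pct * len(data) / 100))
        return data[rank - 1]

    def slowest_stage(self, pct=99):
        """Name of the stage with the highest latency at the given percentile."""
        return max(self.samples, key=lambda name: self.percentile(name, pct))
```

In use, each hop would be wrapped in `with timer.stage("network"):` or similar, and `timer.slowest_stage(99)` would then point at the stage worth investigating first. A stage with a low median can still dominate the tail, which is why the comparison is made at P99.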