The CEO Behind the Fastest-Growing AI Inference Company | Tuhin Srivastava

Gradient Dissent: Conversations on AI


How modern LLM inference works and how it's optimized

Tuhin breaks inference down into infrastructure problems and runtime problems, and covers key metrics such as time-to-first-token and throughput.

Chapter starts at 29:50.
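
To make those two metrics concrete, here is a minimal sketch (not from the episode) of how one might measure time-to-first-token and streaming throughput against an OpenAI-compatible inference endpoint; the model name, prompt, and endpoint are placeholder assumptions.

```python
# Minimal sketch: measure time-to-first-token (TTFT) and streaming throughput
# against an OpenAI-compatible endpoint. Model name and prompt are placeholders.
import time

from openai import OpenAI

# Reads OPENAI_API_KEY from the environment; point base_url at any
# OpenAI-compatible inference server instead, if desired.
client = OpenAI()

start = time.perf_counter()
first_token_at = None
chunk_count = 0

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Explain LLM inference briefly."}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue  # some servers send trailing chunks with no choices
    delta = chunk.choices[0].delta.content
    if not delta:
        continue
    if first_token_at is None:
        # TTFT: latency from request send to the first streamed token.
        first_token_at = time.perf_counter()
    chunk_count += 1

end = time.perf_counter()
ttft = (first_token_at - start) if first_token_at else float("nan")
# Chunks per second after the first token, as a rough proxy for tokens/sec;
# exact token counts would need the server's usage stats or a tokenizer.
throughput = chunk_count / max(end - (first_token_at or end), 1e-9)

print(f"TTFT: {ttft:.3f}s, ~{throughput:.1f} chunks/s after first token")
```

In this framing, TTFT reflects the runtime side (queueing, prefill) while sustained throughput reflects how well the serving stack keeps the hardware busy during decoding.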
