Unsupervised Learning with Jacob Effron

Ep 74: Chief Scientist of Together AI Tri Dao On The End of Nvidia's Dominance, Why Inference Costs Fell & The Next 10X in Speed

Sep 10, 2025
Tri Dao, Chief Scientist at Together AI and a professor at Princeton, is a pioneer behind FlashAttention and Mamba. He discusses the roughly 100x drop in inference costs since ChatGPT's launch, driven by hardware-software co-design and memory optimization. Dao predicts Nvidia's dominance will wane over the next 2-3 years as specialized chips emerge. He also shares insights on AI models boosting expert-level productivity and the challenges of generating high-quality training data across domains, while envisioning another 10x cost reduction ahead.
INSIGHT

Inference Will Become Multi-Silicon

  • The inference market will diversify as workloads split into low-latency agents, high-throughput batch, and interactive chatbots.
  • Specialized chips and stacks will emerge to serve these distinct performance profiles.
ADVICE

Place Focused Hardware Bets

  • Startups must place focused bets on particular workloads (e.g., video, agents, batch) rather than general-purpose chips.
  • If you don't specialize, incumbents will out-execute you on general workloads.
ADVICE

Build Tooling To Hide GPU Churn

  • Invest in tooling and domain-specific languages to shield engineers from low-level GPU changes across generations.
  • Balance exposing hardware features with developer productivity to avoid constant rewrites.