Unsupervised Learning

Ep 74: Chief Scientist of Together.AI Tri Dao On The End of Nvidia's Dominance, Why Inference Costs Fell & The Next 10X in Speed

Sep 10, 2025
Tri Dao, Chief Scientist at Together AI and a professor at Princeton, is a pioneer behind Flash Attention and Mamba. He discusses the dramatic 100x drop in inference costs since ChatGPT, driven by hardware-software co-design and memory optimization. Dao predicts Nvidia's dominance will wane in 2-3 years as specialized chips emerge. He also shares insights on AI models improving expert-level productivity and the challenges of generating quality training data for various domains, while envisioning another 10x cost reduction ahead.
INSIGHT

Architecture Appears Stable But Changes Matter

  • Transformer architectures have broadly stabilized at a high level, but important internal variations keep shifting workload characteristics.
  • These micro-changes make chip design and optimization harder, because performance depends on fine-grained model details.
INSIGHT

Inference Will Become Multi-Silicon

  • The inference market will diversify as workloads split into low-latency agents, high-throughput batch, and interactive chatbots.
  • Specialized chips and stacks will emerge to serve these distinct performance profiles.
ADVICE

Place Focused Hardware Bets

  • Startups must place focused bets on particular workloads (e.g., video, agents, batch) rather than building general-purpose chips.
  • If you don't specialize, incumbents will out-execute you on general workloads.