
Unsupervised Learning with Jacob Effron Ep 74: Chief Scientist of Together.AI Tri Dao On The End of Nvidia's Dominance, Why Inference Costs Fell & The Next 10X in Speed
Sep 10, 2025

Tri Dao, Chief Scientist at Together AI and a professor at Princeton, is a pioneer behind Flash Attention and Mamba. He discusses the dramatic 100x drop in inference costs since ChatGPT, driven by hardware-software co-design and memory optimization. Dao predicts Nvidia's dominance will wane in 2-3 years as specialized chips emerge. He also shares insights on AI models improving expert-level productivity and the challenges of generating quality training data across domains, while envisioning another 10x cost reduction ahead.
AI Snips
Inference Will Become Multi-Silicon
- The inference market will diversify as workloads split into low-latency agents, high-throughput batch, and interactive chatbots.
- Specialized chips and stacks will emerge to serve these distinct performance profiles.
Place Focused Hardware Bets
- Startups must place focused bets on particular workloads (e.g., video, agents, batch) rather than building general-purpose chips.
- If you don't specialize, incumbents will out-execute you on general workloads.
Build Tooling To Hide GPU Churn
- Invest in tooling and domain-specific languages to shield engineers from low-level GPU changes across generations.
- Balance exposing hardware features with developer productivity to avoid constant rewrites.
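The tooling idea in the snip above can be sketched in miniature: put hardware-specific kernels behind one stable call site, so a new GPU generation means registering a new backend rather than rewriting callers. This is a hypothetical illustration in plain Python (the registry, names, and "hopper" backend are assumptions for the sketch, not Together AI's actual stack):

```python
# Hypothetical sketch: a kernel registry that hides per-generation GPU
# details behind one stable API. Callers invoke matmul(); they never name
# the architecture-specific implementation.

KERNEL_REGISTRY = {}  # (op_name, arch) -> implementation


def register_kernel(op_name, arch):
    """Decorator that files an implementation under (op, architecture)."""
    def wrap(fn):
        KERNEL_REGISTRY[(op_name, arch)] = fn
        return fn
    return wrap


@register_kernel("matmul", "generic")
def matmul_generic(a, b):
    # Portable fallback: naive triple loop over nested lists.
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]


@register_kernel("matmul", "hopper")
def matmul_hopper(a, b):
    # Stand-in for a tuned per-generation kernel; here it just reuses the
    # generic path, but in practice it would dispatch to architecture-
    # specific code without the caller changing.
    return matmul_generic(a, b)


def matmul(a, b, arch="generic"):
    """Stable entry point; unknown architectures fall back to generic."""
    fn = KERNEL_REGISTRY.get(("matmul", arch)) or KERNEL_REGISTRY[("matmul", "generic")]
    return fn(a, b)
```

For example, `matmul([[1, 2]], [[3], [4]], arch="hopper")` and the same call with `arch="generic"` give identical results; the design choice is exactly the balance the snip describes, since each registered backend can expose hardware features internally while the public API stays put.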
