In 2022, Lin Qiao left Meta, where she was managing several hundred engineers, to start Fireworks AI. In this episode, we sit down with Lin for a deep dive on her work, starting with her leadership on PyTorch, now one of the most influential machine learning frameworks, powering research and production at scale across the AI industry.
Now at the helm of Fireworks AI, Lin is leading a new wave in generative AI infrastructure, simplifying model deployment and optimizing performance to empower all developers building with GenAI technologies.
We dive into the technical core of Fireworks AI, covering their strategies for model optimization, function calling in agentic development, and low-level breakthroughs at the GPU and CUDA layers.
Fireworks AI
Website - https://fireworks.ai
X/Twitter - https://twitter.com/FireworksAI_HQ
Lin Qiao
LinkedIn - https://www.linkedin.com/in/lin-qiao-22248b4
X/Twitter - https://twitter.com/lqiao
FirstMark
Website - https://firstmark.com
X/Twitter - https://twitter.com/FirstMarkCap
Matt Turck (Managing Director)
LinkedIn - https://www.linkedin.com/in/turck/
X/Twitter - https://twitter.com/mattturck
(00:00) Intro
(01:20) What is Fireworks AI?
(02:47) What is PyTorch?
(12:50) Traditional ML vs GenAI
(14:54) AI’s enterprise transformation
(16:16) From Meta to Fireworks
(19:39) Simplifying AI infrastructure
(20:41) How Fireworks clients use GenAI
(22:02) How many models are powered by Fireworks
(30:09) LLM partitioning
(34:43) Real-time vs pre-set search
(36:56) Reinforcement learning
(38:56) Function calling
(44:23) Low-level architecture overview
(45:47) Cloud GPUs & hardware support
(47:16) VPC vs on-prem vs local deployment
(49:50) Decreasing inference costs and its business implications
(52:46) Fireworks roadmap
(55:03) AI future predictions