Fireworks Founder Lin Qiao on How Fast Inference and Small Models Will Benefit Businesses
Aug 13, 2024
Lin Qiao, founder and CEO of Fireworks and former head of the PyTorch team at Meta, shares insights on the evolving landscape of generative AI. She discusses how her platform aims to democratize access to AI with fast, cost-effective inference using smaller models. Lin explains the challenges B2C companies face with latency and operational costs. She also predicts the convergence of open and closed-source models and highlights the importance of simple API access for diverse AI applications. Her vision could transform how businesses utilize AI technology.
Fireworks emphasizes low latency and cost-efficient AI solutions, significantly reducing deployment time from years to weeks for enterprises.
The focus on PyTorch as a foundational tool facilitates a smoother transition from research to industry applications, ensuring high-quality, user-friendly AI services.
Deep dives
Overview of Fireworks and Its Mission
Fireworks is a SaaS platform for generative AI inference and high-quality fine-tuning, founded in 2022. It focuses on a small-model stack that delivers low-latency, cost-efficient solutions for enterprises, with an emphasis on automated customization so businesses can tailor AI services to their specific needs. The goal is to dramatically accelerate time-to-market, reducing typical deployment timeframes from years to mere weeks.
The Impact of PyTorch in AI Development
PyTorch serves as a foundational tool for building deep learning models, letting researchers construct and experiment with neural networks easily. The hard part is making those models process data fast enough for production, a challenge the PyTorch team tackled directly. Fireworks focuses on PyTorch over other frameworks because of its strong adoption among researchers, which creates a natural progression to industry applications. This consistent flow from research to production is why Fireworks dedicates its resources solely to optimizing PyTorch rather than spreading itself across multiple frameworks.
Simplifying Complexity for Enhanced Usability
The development of PyTorch underscored the importance of simplicity in user experience, driving continuous iteration to reduce complexity. Consolidating multiple frameworks into a unified PyTorch stack meant overcoming significant technical challenges, but it reinforced the strategic goal: keep the interface user-friendly while pushing complexity into a high-performance back end. This approach streamlines the user experience and addresses the critical need for high-quality service with minimal latency. By automating much of the framework's complexity, Fireworks lets developers focus on application innovation rather than getting mired in infrastructure details.
Customer Engagement and Market Trends
Customers increasingly want to move beyond basic exploration of AI technologies, which often begins with OpenAI's models, toward robust, enterprise-grade solutions that deliver high responsiveness and low latency. As businesses gain confidence in their AI applications and look to scale sustainably, Fireworks's emphasis on lowering total cost of ownership while maintaining high performance draws them in. A diverse range of customers, from startups to traditional enterprises, leverages Fireworks's capabilities to compete effectively in the rapidly evolving AI landscape. This trend highlights the shift toward advanced, customized solutions that address specific operational needs in a competitive environment.
In the first wave of the generative AI revolution, startups and enterprises built on top of the best closed-source models available, mostly from OpenAI. The AI customer journey moves from training to inference, and as these first products find product-market fit, many are hitting a wall on latency and cost.
Fireworks Founder and CEO Lin Qiao led the PyTorch team at Meta that rebuilt the whole stack to meet the complex needs of the world’s largest B2C company. Meta moved PyTorch to its own non-profit foundation in 2022 and Lin started Fireworks with the mission to compress the timeframe of training and inference and democratize access to GenAI beyond the hyperscalers to let a diversity of AI applications thrive.
Lin predicts when open and closed source models will converge and reveals her goal to build simple API access to the totality of knowledge.
Hosted by: Sonya Huang and Pat Grady, Sequoia Capital
Mentioned in this episode:
PyTorch: the leading framework for building deep learning models, originated at Meta and now under the Linux Foundation umbrella
Caffe2 and ONNX: ML stack components Meta used that were eventually consolidated into PyTorch (Caffe2 was a production ML framework; ONNX is an open model-interchange format)
Conservation of complexity: the idea that every computer application has inherent complexity that cannot be reduced, only moved between the back end and front end; originated by Xerox PARC researcher Larry Tesler
Mixture of Experts: a class of transformer models in which a learned gating function routes each input to a subset of specialized expert subnetworks rather than activating the full model
Fathom: a product the Fireworks team uses for video conference summarization
LMSYS Chatbot Arena: crowdsourced open platform for LLM evals hosted on Hugging Face
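As a side note on the Mixture of Experts entry above, the routing idea can be sketched in a few lines: a gating function scores the experts, the top-k are selected, and their outputs are blended by softmax weights. This is a minimal illustrative sketch with toy linear "experts"; the names (`moe_forward`, `gate_w`) are assumptions for this example, not any PyTorch or Fireworks API.

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Route input x through the top_k highest-scoring experts.

    gate_w:  (d, n_experts) gating weights (random here; learned in practice).
    experts: list of callables, each mapping a d-vector to a d-vector.
    """
    logits = x @ gate_w                      # one gating score per expert
    top = np.argsort(logits)[-top_k:]        # indices of the top_k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over selected experts only
    # Weighted sum of the selected experts' outputs; the rest stay inactive,
    # which is where the inference savings come from.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
gate_w = rng.normal(size=(d, n_experts))
# Toy "experts": independent linear maps (each lambda captures its own W).
experts = [lambda x, W=rng.normal(size=(d, d)): W @ x for _ in range(n_experts)]
y = moe_forward(rng.normal(size=d), gate_w, experts)
print(y.shape)
```

Only `top_k` of the `n_experts` subnetworks run per input, which is why MoE models can grow total parameter count without a proportional increase in per-request compute.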
00:00 - Introduction
02:01 - What is Fireworks?
02:48 - Leading PyTorch
05:01 - What do researchers like about PyTorch?
07:50 - How Fireworks compares to open source
10:38 - Simplicity scales
12:51 - From training to inference
17:46 - Will open and closed source converge?
22:18 - Can you match OpenAI on the Fireworks stack?
26:53 - What is your vision for the Fireworks platform?
31:17 - Competition for Nvidia?
32:47 - Are returns to scale starting to slow down?
34:28 - Competition
36:32 - Lightning round