Fine-tuning needs vary by domain. In fields such as coding and OCR, off-the-shelf models often perform well without customization. Other areas call for tailored solutions because each company defines success and quality according to its own business logic. Tasks such as classification and summarization can differ widely from one company to the next, so fine-tuning is needed to meet specific requirements, such as the templates used in insurance.
Fine-tuning may look straightforward, but it is complex. Companies must first gather and label data, then choose a suitable fine-tuning algorithm (ranging from supervised tuning to preference-based methods) and decide between parameter-efficient approaches and full-model fine-tuning. They may also need to adjust hyperparameters to improve performance further. This complexity poses significant challenges for app developers, particularly those new to AI. After fine-tuning, ongoing improvement is typically needed, which requires analyzing failure cases to determine whether to collect more data or adjust the product design. What looks like a model failure may instead reflect a product design question, such as how the AI should behave when a user is entering data in a table. To help, there is a push to simplify tuning by automating data collection, labeling, and the selection of tuning algorithms, while companies retain control over product design. Efforts are underway to streamline these features, with upcoming product announcements aimed at reducing the complexity developers face.
In the first wave of the generative AI revolution, startups and enterprises built on top of the best closed-source models available, mostly from OpenAI. The AI customer journey moves from training to inference, and as these first products find product-market fit (PMF), many are hitting a wall on latency and cost.
Fireworks Founder and CEO Lin Qiao led the PyTorch team at Meta that rebuilt the whole stack to meet the complex needs of the world’s largest B2C company. Meta moved PyTorch to its own non-profit foundation in 2022, and Lin started Fireworks with a mission to compress the timeframe of training and inference and to democratize access to GenAI beyond the hyperscalers, so that a diversity of AI applications can thrive.
Lin predicts when open and closed source models will converge and reveals her goal to build simple API access to the totality of knowledge.
Hosted by: Sonya Huang and Pat Grady, Sequoia Capital
Mentioned in this episode:
- PyTorch: the leading framework for building deep learning models, originated at Meta and now part of the Linux Foundation umbrella
- Caffe2 and ONNX: ML frameworks Meta used that PyTorch eventually replaced
- Conservation of complexity: the idea that every computer application has inherent complexity that cannot be reduced but merely moved between the backend and frontend, originated by Xerox PARC researcher Larry Tesler
- Mixture of Experts: a class of transformer models that route requests between different subsets of a model based on use case
- Fathom: a product the Fireworks team uses for video conference summarization
- LMSYS Chatbot Arena: crowdsourced open platform for LLM evals hosted on Hugging Face
00:00 - Introduction
02:01 - What is Fireworks?
02:48 - Leading PyTorch
05:01 - What do researchers like about PyTorch?
07:50 - How Fireworks compares to open source
10:38 - Simplicity scales
12:51 - From training to inference
17:46 - Will open and closed source converge?
22:18 - Can you match OpenAI on the Fireworks stack?
26:53 - What is your vision for the Fireworks platform?
31:17 - Competition for Nvidia?
32:47 - Are returns to scale starting to slow down?
34:28 - Competition
36:32 - Lightning round