Episode 21: Deploying LLMs in Production: Lessons Learned
Nov 14, 2023
Guest Hamel Husain, a machine learning engineer, discusses the business value of large language models (LLMs) and generative AI. They cover common misconceptions, necessary skills, and techniques for working with LLMs. The podcast explores the challenges of working with ML software and ChatGPT, the importance of data cleaning and analysis, and deploying LLMs in production with guardrails. They also discuss an AI-powered real estate CRM and optimizing marketing strategies through data analysis.
Large language models (LLMs) have significant business value and generate interest beyond the tech industry.
Evaluating LLMs requires a multi-level approach, including rigorous testing, human evaluation, and feedback from real users.
To become proficient in working with LLMs, hands-on experimentation and tinkering are essential.
Deep dives
The Excitement and Urgency Around Large Language Models and Generative AI
The podcast episode discusses the excitement and urgency around large language models (LLMs) and generative AI. The interviewee, Hamel Husain, shares his enthusiasm for working with LLMs, highlighting that the investment and resources dedicated to this field are significant. He also emphasizes the newness of LLMs and the continuous learning required to work with them effectively. The podcast host, Hugo, agrees, mentioning that previous instability in the field prevented him from focusing on generative AI in the past, but now there is a sense of stability and understanding. They both recognize the importance of continuously evaluating LLMs, looking at data skeptically, and refining their understanding and skills in working with these models.
The Role of Large Language Models in Code Generation
Hamel discusses his initial skepticism about using large language models (LLMs) for code generation. He shares his experience of working on a language model project at GitHub and being approached by OpenAI to explore AI code generation. Initially, he doubted the feasibility of AI-generated code due to concerns about syntax and practicality. However, as he observed improvements in LLMs over time, his perception shifted, and he realized the immense power and potential of these models. Hamel highlights that LLMs have reshaped his excitement and perspective on AI and code generation, making him bullish about their capabilities and motivating him to invest further in the field.
Evaluating Large Language Models and the Importance of Human Judgment
When it comes to evaluating large language models (LLMs), Hamel stresses the importance of looking beyond traditional metrics and offline evaluation. He expresses skepticism about the reliability of such evaluation methods, emphasizing the need for human judgment. Hamel suggests a three-level approach to evaluation. At level one, he recommends setting up assertions and rigorous testing to identify and address trivial failure cases. As LLMs improve, level two involves human evaluation and constructing synthetic evaluations to compare with automated evaluations. Hamel emphasizes the importance of tracking the correlation between human and AI evaluations to build confidence in the automated evaluation methods. Level three involves collecting feedback from real users to further refine and validate LLM performance.
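To make level one concrete, here is a minimal sketch of assertion-based checks on LLM responses. The specific checks (empty output, boilerplate refusals, unfilled templates, length limits) are illustrative assumptions, not the actual test suite discussed in the episode:

```python
# Level-one evaluation: cheap, deterministic assertions that catch
# trivial failure modes before any human or model-based review.
# These particular checks are hypothetical examples.

def level_one_checks(response: str) -> list[str]:
    """Return the names of failed assertions for one LLM response."""
    failures = []
    if not response.strip():
        failures.append("empty_response")
    if "as an ai language model" in response.lower():
        failures.append("boilerplate_refusal")
    if "{placeholder}" in response or "TODO" in response:
        failures.append("unfilled_template")
    if len(response) > 2000:
        failures.append("too_long")
    return failures

def failure_rate(responses: list[str]) -> float:
    """Fraction of responses failing at least one check, as you might
    report in a CI-style harness over a batch of model outputs."""
    return sum(1 for r in responses if level_one_checks(r)) / len(responses)
```

Tracking this failure rate over time gives a baseline signal that the later, more expensive human and synthetic evaluations can be correlated against.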
Instruction Tuning as a Starting Point for LLMs
Instruction tuning is a powerful technique that allows you to train a language model to give responses based on specific instructions. By providing question and answer pairs and transforming unstructured data into an instruction tuning dataset, you can create an LLM that is helpful and provides relevant information. Tools like Flash Attention, LoRA, and QLoRA can enhance the training process by optimizing memory usage and model performance. Blog posts by Philipp Schmid and Anton provide scripts and code examples to get started with instruction tuning, making it accessible even for beginners.
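As a sketch of the dataset-construction step, the snippet below turns unstructured Q&A text into instruction-tuning records. The `### Q:` / `### A:` markers and the `{"instruction", "output"}` schema are assumptions for illustration, not a format from the episode:

```python
import json

# Hypothetical raw text with question/answer markers.
RAW = """\
### Q: What does LoRA do?
### A: It fine-tunes a model by training small low-rank adapter matrices.
### Q: Why use QLoRA?
### A: It quantizes the base model to cut memory use during fine-tuning.
"""

def to_instruction_records(raw: str) -> list[dict]:
    """Pair each question with the answer line that follows it."""
    records, question = [], None
    for line in raw.splitlines():
        if line.startswith("### Q:"):
            question = line[len("### Q:"):].strip()
        elif line.startswith("### A:") and question:
            records.append({"instruction": question,
                            "output": line[len("### A:"):].strip()})
            question = None
    return records

dataset = to_instruction_records(RAW)
print(json.dumps(dataset, indent=2))
```

Once the data is in this shape, fine-tuning scripts like those in the blog posts mentioned above can consume it directly, typically by rendering each record into a prompt template before training.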
Tinkering and Experimentation for LLM Mastery
To become proficient in working with large language models (LLMs), it is essential to spend time tinkering and experimenting with them. Start by taking existing blog posts or scripts and editing them with your own data. Dedicate regular time each day to work on LLM projects, even if it's just half an hour or an hour. This hands-on approach allows you to gain intuition and practical experience in training and fine-tuning LLMs. With the accessibility of LLM technology today, anyone can dive into this fascinating field and see the potential of these models in action.
Hugo speaks with Hamel Husain, a machine learning engineer who loves building machine learning infrastructure and tools 👷. Hamel leads and contributes to many popular open-source machine learning projects. He also has extensive experience (20+ years) as a machine learning engineer across various industries, including large tech companies like Airbnb and GitHub. At GitHub, he led CodeSearchNet, a large language model for semantic search that was a precursor to Copilot. Hamel is the founder of Parlance Labs, a research and consultancy focused on LLMs.
They talk about generative AI, large language models, the business value they can generate, and how to get started.
They delve into:
Where Hamel is seeing the most business interest in LLMs (spoiler: the answer isn’t only tech);
Common misconceptions about LLMs;
The skills you need to work with LLMs and GenAI models;
Tools and techniques, such as fine-tuning, RAG, LoRA, hardware, and more!
Check out our recent livestream, Data and DevOps Tools for Evaluating and Productionizing LLMs, with Hamel and Emil Sedgh, Lead AI Engineer at Rechat. In it, we showcase an actual industrial use case that Hamel and Emil are working on with Rechat, a real estate CRM, taking you through LLM workflows and tools.