Panelists George Mathew, Asmitha Rathis, Natalia Burina, and Sahar Mor discuss building products with LLMs, emphasizing transparency, control, and explainability. They explore the challenges of prompting in language models and provide tips for avoiding impersonation and hallucination. They highlight the importance of feedback loops in improving language models and discuss the economic components of using APIs and inference calls. The panel concludes with excitement about the conference and promotion of their own podcast.
Podcast summary created with Snipd AI
Quick takeaways
Evaluating and mitigating risks associated with LLM hallucinations is crucial; techniques like prompt chaining and explicit error specifications can help avoid incorrect outputs.
Continuous feedback loops with users and refining models over time are important for improving the fluency and accuracy of LLM outputs.
Deep dives
Understanding the Value and Challenges of LLMs in Production
LLMs have gained significant attention and are being used in various production applications. However, the panelists highlight the need for evaluating and mitigating risks associated with LLM hallucinations. They discuss the importance of fluency in LLMs and identify creative use cases where LLMs can excel, such as writing fiction or children's stories. Additionally, they emphasize the significance of accuracy in decision-making use cases and suggest using domain-specific specialized models. The panelists also delve into the concept of prompting and share insights on how to build effective prompts for LLMs. They address concerns about hallucination and provide techniques to avoid incorrect outputs, such as prompt chaining, semantic caching, and explicit error specifications. Furthermore, the discussion touches upon evaluating LLM performance and model selection, considering factors like cost, availability, performance benchmarks, and the evolving landscape of smaller and more powerful models.
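The prompt chaining and explicit error specification techniques mentioned above can be sketched as follows. This is a minimal illustration, not the panelists' implementation: `call_llm` is a hypothetical placeholder (stubbed here with canned responses) standing in for any chat-completion API.

```python
# Sketch of prompt chaining: the output of one narrow prompt feeds the next,
# so each step can be validated before generation continues.

def call_llm(prompt: str) -> str:
    # Stub for illustration; a real implementation would call a hosted or local model.
    canned = {
        "extract": "refund request",
        "draft": "We have received your refund request and will process it within 5 days.",
    }
    key = "extract" if "Classify" in prompt else "draft"
    return canned[key]

def answer_ticket(ticket: str) -> str:
    # Step 1: a narrow classification prompt whose output is easy to check.
    intent = call_llm(f"Classify the intent of this ticket in 3 words or fewer:\n{ticket}")
    allowed = {"refund request", "billing question", "bug report"}
    if intent not in allowed:
        # Explicit error specification: fail loudly rather than pass a hallucinated
        # intent into the next step of the chain.
        raise ValueError(f"Unexpected intent: {intent!r}")
    # Step 2: generation is constrained by the validated intermediate output.
    return call_llm(f"Draft a short reply to a customer with intent '{intent}'.")

print(answer_ticket("I was charged twice and want my money back."))
```

Validating the intermediate classification before the generation step is what lets the chain reject a bad output early instead of compounding it.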
Evaluating and Mitigating LLM Hallucinations
The panelists emphasize the need for evaluating and addressing hallucinations in LLMs. They discuss several methods to mitigate hallucinations, including prompt engineering, adding constraints or regex matching, semantic caching, and creating evaluation benchmarks. The importance of clear evaluation metrics, such as accuracy and fluency, is highlighted. Additionally, the panelists suggest techniques like adding benchmark questions, using GPT models to score generations, and employing semantic similarity checks to assess the quality of LLM outputs. They also stress the need for continuous feedback loops with users and the importance of refining models over time.
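The benchmark-question and similarity-check ideas above can be sketched as a small evaluation loop. This is an assumption-laden sketch, not the panelists' tooling: real systems would use embedding cosine similarity or an LLM judge, whereas `difflib` is used here only to keep the example dependency-free, and the benchmark items are made up.

```python
# Minimal benchmark-based evaluation loop: each question has a reference
# answer, and a similarity score flags generations that drift from it.
from difflib import SequenceMatcher

BENCHMARK = [
    {"question": "What is the capital of France?", "reference": "Paris"},
    {"question": "Who wrote Hamlet?", "reference": "William Shakespeare"},
]

def similarity(a: str, b: str) -> float:
    # Crude lexical stand-in for semantic similarity, in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def evaluate(generate, threshold: float = 0.8) -> float:
    # Fraction of benchmark questions whose generation is close enough
    # to the reference answer.
    passed = sum(
        similarity(generate(item["question"]), item["reference"]) >= threshold
        for item in BENCHMARK
    )
    return passed / len(BENCHMARK)

# A stub "model" that happens to answer both questions correctly.
answers = {"What is the capital of France?": "Paris",
           "Who wrote Hamlet?": "william shakespeare"}
print(evaluate(lambda q: answers[q]))  # 1.0
```

Running a loop like this after every prompt or model change is one concrete way to close the feedback loop the panelists describe.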
Navigating the Economic Considerations of LLM-based Products
The panelists address the economic aspects of LLM-based products. They discuss the cost implications of input and output tokens used in LLM inference, highlighting the impact of prompt length and detailed prompts on cost. They also suggest evaluating the use of smaller models for cost optimization while considering any trade-offs in performance. The importance of prompt engineering and constraining LLM outputs to reduce inference costs is emphasized. The panelists encourage startups to prioritize value to customers and experiment with LLMs, as costs are expected to decrease over time. They also highlight the potential advantage of incumbents with access to private data for scaling LLM applications.
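The input/output token economics above can be made concrete with a back-of-the-envelope estimate. The per-1k-token prices and the 4-characters-per-token heuristic below are illustrative assumptions, not any provider's actual rates; real billing uses the model's tokenizer and its published prices.

```python
# Rough inference cost estimate, showing why long prompts and unconstrained
# outputs drive up per-call cost.

def estimate_cost(prompt: str, expected_output_tokens: int,
                  price_in_per_1k: float = 0.0015,
                  price_out_per_1k: float = 0.002) -> float:
    # Heuristic: roughly 4 characters per token for English text.
    input_tokens = len(prompt) / 4
    return (input_tokens * price_in_per_1k
            + expected_output_tokens * price_out_per_1k) / 1000

short = estimate_cost("Summarize:", expected_output_tokens=100)
long = estimate_cost("Summarize:" + " context" * 500, expected_output_tokens=100)
print(short < long)  # True
```

The comparison at the end captures the panel's point: trimming prompt length and constraining output length are direct levers on inference spend.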
Model Selection and Iteration for LLM-based Applications
The panelists discuss approaches to model selection and iteration for LLM-based applications. They recommend starting with the most powerful models available, such as GPT-3.5, and iterating based on specific use cases. The importance of evaluation and benchmarking is emphasized to assess model performance and identify the most suitable models. The panelists also mention the potential for combining multiple LLMs through blending or leveraging private data to improve model fidelity. They anticipate the emergence of smarter engines/layers that enable faster experimentation and the use of multiple LLMs based on latency, cost, and accuracy requirements.
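A routing layer of the kind the panelists anticipate can be sketched as a constraint filter over candidate models. The model names and their latency/cost/accuracy figures below are made-up placeholders, not real benchmarks.

```python
# Sketch of a simple routing layer that picks among multiple models
# based on latency, cost, and accuracy requirements.

MODELS = [
    {"name": "small-fast", "latency_ms": 200,  "cost_per_1k": 0.0005, "accuracy": 0.80},
    {"name": "mid-blend",  "latency_ms": 600,  "cost_per_1k": 0.002,  "accuracy": 0.88},
    {"name": "large-slow", "latency_ms": 2000, "cost_per_1k": 0.03,   "accuracy": 0.95},
]

def route(max_latency_ms: float, max_cost_per_1k: float) -> dict:
    # Among models meeting the latency and cost budgets, pick the most accurate.
    candidates = [m for m in MODELS
                  if m["latency_ms"] <= max_latency_ms
                  and m["cost_per_1k"] <= max_cost_per_1k]
    if not candidates:
        raise ValueError("No model satisfies the constraints")
    return max(candidates, key=lambda m: m["accuracy"])

print(route(max_latency_ms=1000, max_cost_per_1k=0.01)["name"])  # mid-blend
```

In practice the accuracy figures would come from the kind of evaluation benchmarks discussed earlier, so the router improves as the benchmarks do.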
MLOps Coffee Sessions #172: LLMs in Production Conference Part 2, Building LLM Products Panel, with George Mathew, Asmitha Rathis, Natalia Burina, and Sahar Mor, hosted by TWIML's Sam Charrington.
We are now accepting talk proposals for our next LLM in Production virtual conference on October 3rd. Apply to speak here: https://go.mlops.community/NSAX1O
// Abstract
There are key areas we must be aware of when working with LLMs. High costs and low latency requirements are just the tip of the iceberg. In this panel, we hear about common pitfalls and challenges we must keep in mind when building on top of LLMs.
// Bio
Sam Charrington
Sam is a noted ML/AI industry analyst, advisor, and commentator, and host of the popular TWIML AI Podcast (formerly This Week in Machine Learning and AI). The show is one of the most popular tech podcasts, with nearly 15 million downloads. Sam has interviewed over 600 of the industry’s leading machine learning and AI experts and has conducted extensive research into enterprise AI adoption, MLOps, and other ML/AI-enabling technologies.
George Mathew
George is a Managing Director at Insight Partners focused on venture-stage investments in AI, ML, Analytics, and Data companies as they establish product/market fit.
Asmitha Rathis
Asmitha is a Machine Learning Engineer with experience in developing and deploying ML models in production. She is currently working at an early-stage startup, PromptOps, where she is building conversational AI systems to assist developers. Prior to her current role, she was an ML engineer at VMware. Asmitha holds a Master’s degree in Computer Science from the University of California, San Diego, with a specialization in Machine Learning and Artificial Intelligence.
Natalia Burina
Natalia is an AI Product Leader who was most recently at Meta, leading Responsible AI. During her time at Meta, she led teams working on algorithmic transparency and AI privacy. In 2017, Natalia was recognized by Business Insider as “The Most Powerful Female Engineer in 2017”. Natalia was also an Entrepreneur in Residence at Foundation Capital, advising portfolio companies and working with partners on deal flow. Prior to this, she was the Director of Product for Machine Learning at Salesforce, where she led teams building a set of AI capabilities and platform services. Before Facebook and Salesforce, Natalia led product development at Samsung, eBay, and Microsoft. She was also the Founder and CEO of Parable, a creative photo network bought by Samsung in 2015. Natalia started her career as a software engineer after earning a Bachelor's degree in Applied and Computational Mathematics from the University of Washington.
Sahar Mor
Sahar is a Product Lead at Stripe with 15 years of experience in product and engineering roles. At Stripe, he leads the adoption of LLMs and the Enhanced Issuer Network, a set of data partnerships with top banks to reduce payment fraud.
Prior to Stripe, he founded a document intelligence API company, was a founding PM at a couple of AI startups, including an accounting automation startup (Zeitgold, acquired by Deel), and served in engineering roles in the elite intelligence unit 8200.
Sahar authors a weekly AI newsletter (AI Tidbits) and maintains a few open-source AI-related libraries (https://github.com/saharmor).
// MLOps Jobs board
https://mlops.pallet.xyz/jobs
// MLOps Swag/Merch
https://mlops-community.myshopify.com/
--------------- ✌️Connect With Us ✌️ -------------
Join our slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/