AI-powered
podcast player
Listen to all your favourite podcasts with AI-powered features
Evaluation and Testing Processes for Agents in Systems
This chapter explores the critical evaluation and testing processes for agents in a system, highlighting the significance of methodical testing stages. It emphasizes how thorough testing, including edge cases, can improve the reliability of tool and agent updates.
Francisco Ingham, LLM consultant, NLP developer, and founder of Pampa Labs.

Making Your Company LLM-native // MLOps Podcast #266 with Francisco Ingham, Founder of Pampa Labs.

// Abstract
Being LLM-native is becoming one of the key differentiators among companies in vastly different verticals. Everyone wants to use LLMs, and everyone wants to be on top of the current tech, but what does it really mean to be LLM-native? Being LLM-native involves two ends of a spectrum. On the one hand, we have the product or service that the company offers, which surely presents many automation opportunities. LLMs can be applied strategically to scale at a lower cost and offer a better experience for users. But being LLM-native involves not only the company's customers, it also involves each stakeholder in the company's operations. How can employees integrate LLMs into their daily workflows? How can we as developers leverage the advancements in the field not only as builders but as adopters? We will tackle these and other key questions for anyone looking to capitalize on the LLM wave, prioritizing real results over the hype.

// Bio
Currently working at Pampa Labs, where we help companies become AI-native and build AI-native products. Our expertise lies on the LLM-science side, or how to build a successful data flywheel that leverages user interactions to continuously improve the product. We also spearhead pampa-friends, the first Spanish-speaking community of AI Engineers. Previously worked in management consulting, was a TA at fastai in SF, and led the cross-AI + dev tools team at Mercado Libre.

// MLOps Jobs board
https://mlops.pallet.xyz/jobs

// MLOps Swag/Merch
https://mlops-community.myshopify.com/

// Related Links
Website: pampa.ai

--------------- ✌️Connect With Us ✌️ -------------
Join our slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Francisco on LinkedIn: https://www.linkedin.com/in/fpingham/

Timestamps:
[00:00] Francisco's preferred coffee
[00:13] Takeaways
[00:37] Please like, share, leave a review, and subscribe to our MLOps channels!
[00:51] A Literature Geek
[02:41] LLM-native company
[03:54] Integrating LLM in workflows
[07:21] Unexpected LLM applications
[10:38] LLMs in development process
[14:00] Vibe check to evaluation
[15:36] Experiment tracking optimizations
[20:22] LLMs as judges discussion
[24:43] Automated presentations for podcasts
[27:48] AI operating system and agents
[31:29] Importance of SEO expertise
[35:33] Experimentation and evaluation
[39:20] AI integration strategies
[41:50] RAG approach spectrum analysis
[44:40] Search vs Retrieval in AI
[49:02] Recommender Systems vs RAG
[52:08] LLMs in recommender systems
[53:10] LLM interface design insights