Evaluating the Reliability of LLMs

This chapter explores practical applications of Large Language Models in evaluation contexts, noting mixed results on how reliably they generate numerical scores. It covers why organizations hesitate to run rigorous evaluations, the motivations behind LLM projects, and the debate over the ROI and long-term viability of such technologies.

Play episode from 34:53

Episode notes

Alex Strick van Linschoten, a software engineer based in Delft, recently built Ekko, an open-source framework for adding real-time infrastructure and in-transit message processing to web applications. He has years of experience with Ruby, JavaScript, Go, PostgreSQL, AWS, and Docker. He holds a PhD in History, has authored books on Afghanistan, and currently works as an ML Engineer at ZenML.

Real LLM Success Stories: How They Actually Work // MLOps Podcast #287 with Alex Strick van Linschoten, ML Engineer at ZenML.

// Abstract

Alex Strick van Linschoten, a machine learning engineer at ZenML, joins the MLOps Community podcast to discuss his comprehensive database of real-world LLM use cases. Drawing inspiration from Evidently AI, Alex created the database to organize fragmented information on LLM usage, covering everything from common chatbot implementations to innovative applications across sectors. They discuss the technical challenges and successes in deploying LLMs, emphasizing the importance of foundational MLOps practices. The episode concludes with a call for community contributions to further enrich the database and collective knowledge of LLM applications.

// Bio

Alex is a Software Engineer based in the Netherlands, working as a Machine Learning Engineer at ZenML. He was previously awarded a PhD in History (specialism: War Studies) from King's College London and has authored several critically acclaimed books based on his research work in Afghanistan.

// MLOps Swag/Merch

https://shop.mlops.community/

// Related Links

Website: https://mlops.systems

https://www.zenml.io/llmops-database

https://www.zenml.io/blog/llmops-in-production-457-case-studies-of-what-actually-works

https://www.zenml.io/blog/llmops-lessons-learned-navigating-the-wild-west-of-production-llms

https://www.zenml.io/blog/demystifying-llmops-a-practical-database-of-real-world-generative-ai-implementations

https://huggingface.co/datasets/zenml/llmops-database

--------------- ✌️Connect With Us ✌️ -------------

Join our Slack community: https://go.mlops.community/slack

Catch all episodes, blogs, newsletters, and more: https://mlops.community/