RAG continues to rise

Practical AI: Machine Learning, Data Science, LLM

Challenges in Evaluating Machine Learning Models

2min Snip


Summary

Evaluating machine learning models is challenging because there is no consensus on which data to use or how to evaluate. Classic datasets and evaluation criteria do not hold up well in practice. Teams therefore need to create their own test datasets, which is difficult and expensive at scale. Many organizations rely on human-curated data for evaluation, with 42 percent using self-created datasets. Beyond the monetary cost, evaluation also consumes significant iteration time, especially when running against APIs with variable latency. This slows iteration and makes it hard to try and test multiple scenarios efficiently, even with an ample budget.

Episode notes

Daniel & Chris delight in conversation with “the funniest guy in AI”, Demetrios Brinkmann. Together they explore the results of the MLOps Community’s latest survey. They also preview the upcoming AI Quality Conference.

Changelog++ members save 4 minutes on this episode because they made the ads disappear. Join today!

Sponsors:

The Hacker Mindset – “The Hacker Mindset” written by Garrett Gee, a seasoned white hat hacker with over 20 years of experience, is available for pre-order now. This book reveals the secrets of white hat hacking and how you can apply them to overcome obstacles and achieve your goals. In a world where hacking often gets a bad rap, this book shows you the white hat side – the side focused on innovation, problem-solving, and ethical principles.
Changelog News – A podcast+newsletter combo that’s brief, entertaining & always on-point. Subscribe today.
Fly.io – The home of Changelog.com — Deploy your apps and databases close to your users. In minutes you can run your Ruby, Go, Node, Deno, Python, or Elixir app (and databases!) all over the world. No ops required. Learn more at fly.io/changelog and check out the speedrun in their docs.

Featuring:

Demetrios Brinkmann – Twitter
Chris Benson – Twitter, GitHub, LinkedIn, Website
Daniel Whitenack – Twitter, GitHub, Website
