All About Evaluating LLM Applications // Shahul Es // #179

MLOps.community

00:00

Challenges of Evaluating LLM Applications

The speakers discuss the difficulties of evaluating large language model (LLM) applications and the skepticism surrounding such evaluations. They highlight the unreliability of open LLM leaderboards and the need to address it for the benefit of users. They also explore why leaderboards get gamed and why long-form question answering is hard to evaluate.

