
847: AI Engineering 101, with Ed Donner
Super Data Science: ML & AI Podcast with Jon Krohn
LMSYS: A New Approach to AI Evaluation
This chapter explores the rebranded LMSYS leaderboard, now lmarena.ai, which ranks large language models using human evaluations, and introduces 'Outsmart', a competitive game for assessing them. The game shows how models perform and interact strategically, highlighting their capacity to both collaborate and compete.