
847: AI Engineering 101, with Ed Donner
Super Data Science: ML & AI Podcast with Jon Krohn
00:00
LMSYS: A New Approach to AI Evaluation
This chapter covers the LMSYS leaderboard, since rebranded as lmarena.ai, which ranks large language models using human evaluations. It also introduces 'Outsmart', a competitive game in which models play against one another, and discusses what the game reveals about model performance, strategic interaction, and the models' capacity to collaborate and compete.