Tool Use - AI Conversations

When AI Benchmarks Lie: A Better Way to Evaluate Ft. Chris Hay

Nov 26, 2024

57:46

Episode notes

This episode explores the world of AI evaluation, with insights from Chris Hay on why benchmarks are "stupid" and how to effectively evaluate AI models.

Get the tools: pip install tool-use-ai

Check out Chris' Channel: https://www.youtube.com/@chrishayuk

Links:
https://github.com/EleutherAI/lm-eval...
Lessons from the Trenches on Reproducible Evaluation of Language Models - https://arxiv.org/pdf/2405.14782
https://github.com/confident-ai/deepeval

Connect with us:
https://x.com/ToolUseAI
https://x.com/MikeBirdTech
https://x.com/FieroTy
https://x.com/chrishayuk

*The opinions of Chris are purely Chris's opinions and don't represent the opinions of his employer.
