
Beth Barnes
Researcher at ARC Evals specializing in evaluating language-model agents for safety and capabilities
Top 3 podcasts with Beth Barnes
Ranked by the Snipd community

43 snips
Jul 28, 2024 • 2h 14min
34 - AI Evaluations with Beth Barnes
Beth Barnes, the founder and head of research at METR, dives into the complexities of evaluating AI systems. The discussion covers tailored threat models and the unpredictability of AI performance, stressing the need for precise assessment methodologies. Barnes highlights issues like sandbagging and behavior misrepresentation, emphasizing the importance of ethical considerations in AI evaluations. The conversation also touches on the role of policy in shaping effective evaluation science, as well as the disparities between different AI labs in security and monitoring.

Aug 4, 2023 • 8min
"ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks" by Beth Barnes
Explore ARC Evals' report on evaluating language-model agents' abilities to acquire resources, replicate, and adapt to challenges. Learn about the impact of fine-tuning on GPT-4's performance in autonomous tasks, and about the need for continued improvement and scaffolding to enhance autonomous replication and adaptation (ARA) capabilities.

May 26, 2023 • 32min
Safety evaluations and standards for AI | Beth Barnes | EAG Bay Area 23
Beth Barnes discusses the importance of safety evaluations and standards for AI and their potential for reducing existential risk. Topics include evaluating models for dangerous capabilities, the downsides of using humans in AI safety evaluations, the importance of regulations and standards for AI, and evaluating alignment and safety in AI models.