

34 - AI Evaluations with Beth Barnes
Jul 28, 2024
Beth Barnes, the founder and head of research at METR, dives into the complexities of evaluating AI systems. She and the host discuss tailored threat models and the unpredictability of AI performance, stressing the need for precise assessment methodologies. Barnes highlights issues such as sandbagging and behavior misrepresentation, and emphasizes the importance of ethical considerations in AI evaluations. The conversation also touches on the role of policy in shaping effective evaluation science, as well as the disparities between AI labs in security and monitoring.
Chapters
Intro
00:00 • 2min
Navigating AI Evaluations and Threat Modeling
01:40 • 19min
Interpreting Surveys: Challenges in Research Validity
20:53 • 2min
Evaluating AI Model Complexities
23:10 • 10min
Navigating Web Accessibility Challenges for Browsing Agents
33:34 • 2min
Exploring Limitations in Generative Model Evaluation
35:13 • 2min
Evaluating AI Risks and Capabilities
37:24 • 21min
Evaluating AI Models: Risks and Realities
58:53 • 21min
Evaluating AI Capabilities and Threats
01:20:08 • 24min
The Intersection of Policy and Evaluation Science
01:44:17 • 2min
Operational Disparities in AI Labs: Security and Monitoring Challenges
01:46:45 • 2min
Navigating AI Evaluations: The Role of METR and Labs
01:48:22 • 22min
Hiring Needs and Evaluation Science in AI Research
02:10:03 • 3min