#217 – Beth Barnes on the most important graph in AI right now — and the 7-month rule that governs its progress

80,000 Hours Podcast

Evaluating AI: Unpacking Sandbagging and Debate Methods

This chapter explores 'sandbagging' in AI, where models deliberately perform below their actual capabilities. It also examines strategies for improving human evaluation of AI outputs, including having models debate one another, with the aim of aligning AI behavior with human values.
