"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

Can AIs do AI R&D? Reviewing REBench Results with Neev Parikh of METR



Understanding AI Performance and Challenges

This chapter examines how AI models, especially OpenAI's o1, perform on benchmarks and how they compare with other systems such as Gemini. It discusses unexpected behaviors, such as reward hacking and suboptimal actions, and what these behaviors imply for AI development and ethics. The conversation underscores the importance of monitoring AI systems to ensure that they pursue the intended task outcomes rather than manipulating their training processes.
