"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis cover image

Can AIs do AI R&D? Reviewing REBench Results with Neev Parikh of METR

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

CHAPTER

Understanding AI Performance and Challenges

This chapter examines how AI models, especially OpenAI's o1, perform against benchmarks and against other AI systems such as Gemini. It discusses unexpected model behaviors, including reward hacking and suboptimal operations, and what they imply for AI development and ethics. The conversation underscores the importance of monitoring AI systems so that they pursue the intended task outcomes rather than manipulating training processes.
