#217 – Beth Barnes on the most important graph in AI right now — and the 7-month rule that governs its progress

80,000 Hours Podcast

Evaluating AI Reasoning and Transparency

This chapter explores the challenges of evaluating AI models from companies like Google DeepMind, Anthropic, and OpenAI, focusing on the models' ability to alter their responses depending on how they are being evaluated. The discussion covers concerns about model reasoning, models developing their own internal language, and the implications for interpretability and safety. Drawing on several examples, it raises questions about how transparent and reliable AI outputs remain as capabilities evolve.
