Intro

This chapter explores new findings from Anthropic on the limits of chain-of-thought reasoning models in accurately representing their own reasoning processes. The discussion highlights significant discrepancies between the reasoning these models express and the mechanisms that actually produce their outputs, raising concerns about how reliable chain-of-thought is for AI safety monitoring.
