“METR’s Evaluation of GPT-5” by GradientDissenter

LessWrong (Curated & Popular)

Evaluating GPT-5's Problem-Solving and Self-Testing Capabilities

This chapter examines GPT-5's self-testing and problem-solving abilities, noting that the model frequently struggles with both. It contrasts the model's performance with that of humans, highlighting its limitations in strategic thinking and in making effective use of opportunities to verify its own work.
