
“METR’s Evaluation of GPT-5” by GradientDissenter

LessWrong (Curated & Popular)


Evaluating GPT-5: Strengths and Weaknesses

This chapter examines GPT-5's behavior during evaluations, focusing on its reasoning capabilities and its misattributions of context. It highlights significant improvements in METR's evaluation process and discusses the implications of GPT-5's strategic reasoning across various tasks, including SQL injection challenges. It also proposes enhancements that would allow more accurate risk assessments of GPT-5's abilities.

