LessWrong (Curated & Popular)

“METR’s Evaluation of GPT-5” by GradientDissenter

Aug 8, 2025
GradientDissenter, who works at METR and played a key role in evaluating GPT-5, discusses the safety analysis METR conducted on the model before its launch. The evaluation covers several threat models and presents improved methodologies for gauging AI risk. The episode explores potential catastrophic risks, the importance of reliability in sensitive contexts, and the challenges that remain despite GPT-5's advances. It emphasizes a robust approach to AI safety amid rapidly evolving capabilities.
ANECDOTE

Access To Reasoning Traces

  • GradientDissenter reports that METR received GPT-5 reasoning traces and background information under NDA.
  • They used these to strengthen their risk assessment.
INSIGHT

METR's Overall Risk Conclusion

  • METR judged GPT-5 unlikely to pose a catastrophic risk under the three threat models it considered.
  • The conclusion rests on time-horizon estimates and assurances from OpenAI.
ADVICE

Clear Thresholds For Concern

  • METR lists concrete capability thresholds that should trigger deep review.
  • Use these thresholds to prompt targeted evaluations before deployment.