[HUMAN VOICE] "How useful is mechanistic interpretability?" by ryan_greenblatt, Neel Nanda, Buck, habryka

LessWrong (Curated & Popular)

CHAPTER

Exploring Model Safety and Ablations

This chapter covers the speakers' doubts about the effectiveness of enumerative safety and the challenges posed by superhuman models. It also explores ablations and two proposals: retraining the model, and augmenting smaller models with explanations to improve their performance.
