
Dan Hendrycks on Catastrophic AI Risks
Future of Life Institute Podcast
00:00
Representation Engineering versus Mechanistic Interpretability
This chapter compares representation engineering and mechanistic interpretability as approaches to understanding AI systems. It discusses the advantages of a top-down approach and emphasizes the need to understand high-level emergent behavior in models.