
Dan Hendrycks on Catastrophic AI Risks
Future of Life Institute Podcast
Representation Engineering versus Mechanistic Interpretability
This chapter explores how representation engineering and mechanistic interpretability differ as approaches to understanding AI systems. It discusses the advantages of the top-down approach that representation engineering takes, and it emphasizes the need to understand the high-level behavior that emerges in models.