Dan Hendrycks on Catastrophic AI Risks

Future of Life Institute Podcast

Representation Engineering versus Mechanistic Interpretability

This chapter compares representation engineering and mechanistic interpretability as approaches to understanding AI systems. It discusses the advantages of a top-down approach and emphasizes the need to understand high-level emergent behavior in models.
