"Against Almost Every Theory of Impact of Interpretability" by Charbel-Raphaël

LessWrong (Curated & Popular)

Challenges of Implementing Interpretability in State-of-the-Art Models

The chapter explores the difficulties of applying interpretability techniques to advanced models such as ChatGPT and CLIP, noting by contrast the simplicity of the censorship filter in the Stable Diffusion model. It addresses the over-reliance on human judgment in interpretability work, scalability concerns, methodological challenges, and the need for more robust engineering approaches. It closes with reflections on the flaws and future of interpretability in AI.
