
"Against Almost Every Theory of Impact of Interpretability" by Charbel-Raphaël
LessWrong (Curated & Popular)
Challenges of Implementing Interpretability in State-of-the-Art Models
The chapter explores the difficulties of applying interpretability techniques to advanced models such as ChatGPT and CLIP, noting by contrast the simplicity of the censorship filter in the Stable Diffusion model. It addresses the problem of over-reliance on human judgment in interpretability, scalability concerns, methodological challenges, and the need for more robust engineering approaches. It closes with reflections on the flaws and future of interpretability in AI.


