Challenges of Achieving Interpretability in Transformative Models
This chapter explores the difficulty of achieving interpretability in models like GPT-4 and GPT-2, discussing the limitations of a proposed metric for measuring loss recovery and the importance of mechanistic interpretability. It also covers the threshold for model differences and safety concerns in explanations.