
19 - Mechanistic Interpretability with Neel Nanda
AXRP - the AI X-risk Research Podcast
The Losses Depend on the Log Prop of the Correct Next Token
The losses depend on what like the actual next token was. It's not all log problems it's specifically the correct next log problem yep i'm gonna handle that deeply about this to me it just seems more intuitive than other ways of looking at things. But a caveat to this is that specific circuits may make claims about what these should be, but by taking them into account both could achieve similar results.
00:00
Transcript
Play full episode
Remember Everything You Learn from Podcasts
Save insights instantly, chat with episodes, and build lasting knowledge - all powered by AI.