
19 - Mechanistic Interpretability with Neel Nanda
AXRP - the AI X-risk Research Podcast
A Sharp Left Turn Is the New Hot Phrase for This
In general, what happens is you have to train for a bit and then it'll converge. It seems very reminiscent of this, where there's some noise in stochastic gradient descent from which examples you happen to see first or something. Once you're robust to that noise, then you can be the lottery ticket. Yeah. It's also reminiscent of this follow-up paper to the lottery ticket hypothesis. I think it's an example of this. Yes.
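To make the "noise from example order" idea concrete, here is a minimal sketch (not from the episode; the model, data, and numbers are illustrative): it trains the same tiny model twice from an identical starting point with two different example orders, then measures how far apart the runs end up. This is the kind of stability-to-SGD-noise check that the lottery ticket follow-up work studies.

```python
import numpy as np

# Toy regression data (illustrative, not from the episode).
rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = X @ true_w + 0.1 * rng.normal(size=n)

def sgd(w0, order_seed, lr=0.01, epochs=50):
    """Plain SGD on squared error; example order is the only thing that varies."""
    w = w0.copy()
    order_rng = np.random.default_rng(order_seed)
    for _ in range(epochs):
        for i in order_rng.permutation(n):
            grad = 2 * (X[i] @ w - y[i]) * X[i]  # gradient of (x_i.w - y_i)^2
            w -= lr * grad
    return w

w0 = rng.normal(size=d)          # shared starting point (a "checkpoint")
w_a = sgd(w0, order_seed=1)      # run A: one example order
w_b = sgd(w0, order_seed=2)      # run B: a different example order

# If training is robust to this noise, the two runs land close together.
print("distance between runs:", np.linalg.norm(w_a - w_b))
```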