AXRP - the AI X-risk Research Podcast cover image

19 - Mechanistic Interpretability with Neel Nanda

AXRP - the AI X-risk Research Podcast

00:00

A Sharp Left Turn Is the New Heart Phrase for This

In general, what happens is you have to train for a bit and then it'll converge. It seems very reminiscent of this like where there's some noise in stochastic gradient descent on which examples you haven't see first or something. Once you're robust to the noise, then you can be the lawyer to get a trick. Yeah. There's also reminiscent of this follow up paper to lottery ticket hypothesis. I think it's an example of this. Yes.

Transcript
Play full episode

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app