AXRP - the AI X-risk Research Podcast cover image

19 - Mechanistic Interpretability with Neel Nanda

AXRP - the AI X-risk Research Podcast

00:00

The Losses Depend on the Log Prop of the Correct Next Token

The losses depend on what like the actual next token was. It's not all log problems it's specifically the correct next log problem yep i'm gonna handle that deeply about this to me  it just seems more intuitive than other ways of looking at things. But a caveat to this is that specific circuits may make claims about what these should be, but by taking them into account both could achieve similar results.

Transcript
Play full episode

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app