
19 - Mechanistic Interpretability with Neel Nanda

AXRP - the AI X-risk Research Podcast

CHAPTER

Using a Trigonometric Algorithm: Defining a Second Progress Measure - Excluded Loss

The trigonometric algorithm depends on specific directions in input token space. Okay, so at this point, now that we understand the paper a bit better, there's also a second progress measure, which we called excluded loss. That's where you delete the ten key directions on the training data. During memorization it mostly tracks train loss, but then, over the course of circuit formation, it diverges and gets worse and worse. So I'm not doing fiddling and then running things through the model, which is a kind of weird thing to do and reasonable to take issue with. Okay? We're trying to get
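To make the excluded-loss idea concrete, here is a minimal NumPy sketch. It rests on assumptions that are ours, not statements from the episode: that the "ten key directions" are the cosine and sine of five key frequencies, that they are deleted from the logits over the output token (the episode also mentions input token space; the same projection could be applied there), and that the loss is ordinary cross-entropy on the training pairs. The function names (`fourier_directions`, `cross_entropy`, `excluded_loss`) and the synthetic toy data are hypothetical illustrations.

```python
import numpy as np


def fourier_directions(p, key_freqs):
    """Orthonormal cos/sin vectors over output tokens c = 0..p-1.

    Assumption: each key frequency k (with 1 <= k <= (p - 1) // 2, p odd)
    contributes two directions, cos(2*pi*k*c/p) and sin(2*pi*k*c/p).
    """
    c = np.arange(p)
    dirs = []
    for k in key_freqs:
        for wave in (np.cos, np.sin):
            v = wave(2 * np.pi * k * c / p)
            dirs.append(v / np.linalg.norm(v))
    return np.stack(dirs)  # shape: (2 * len(key_freqs), p)


def cross_entropy(logits, labels):
    """Mean negative log-likelihood of the correct label (stable log-softmax)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def excluded_loss(logits, labels, key_freqs):
    """Loss after projecting the key directions out of every logit vector."""
    D = fourier_directions(logits.shape[1], key_freqs)
    # Rows of D are orthonormal, so (logits @ D.T) @ D is the component of
    # each logit vector lying in the span of the key directions.
    ablated = logits - (logits @ D.T) @ D
    return cross_entropy(ablated, labels)


if __name__ == "__main__":
    # Hypothetical toy setup: all pairs (a, b) for a small modulus p, with
    # synthetic logits built only from the chosen key frequencies.
    p, key_freqs = 23, [3, 5, 7]
    s = np.add.outer(np.arange(p), np.arange(p)).ravel()  # a + b for every pair
    labels = s % p
    c = np.arange(p)
    logits = sum(np.cos(2 * np.pi * k * (s[:, None] - c[None, :]) / p)
                 for k in key_freqs)
    print("full loss:", cross_entropy(logits, labels))
    # These synthetic logits live entirely in the key directions, so
    # removing them leaves near-zero logits and a loss close to log(p).
    print("excluded loss:", excluded_loss(logits, labels, key_freqs))
    print("log(p):", np.log(p))
```

In a real model the logits would also contain memorization components outside the key directions, which is why, on this reading, excluded loss can stay close to train loss while the model relies on memorization and then get worse as the trigonometric circuit takes over.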

