2min chapter


19 - Mechanistic Interpretability with Neel Nanda

AXRP - the AI X-risk Research Podcast

CHAPTER

Using a Trigonometric Algorithm, Defining a Second Progress Measure - Excluded Loss

This trigonometric algorithm depends on particular directions in input token space. Okay, so at this point, now that we understand the paper a bit better: there's also a second progress measure, which we called excluded loss, where you delete the ten key directions and measure the loss on the training data. During memorization it mostly tracks train loss, but then over the course of circuit formation it diverges and gets worse and worse. So I'm not doing any fiddling and then running things through the model, which is a kind of weird thing to do and reasonable to take issue with. Okay? We're trying to get
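The excluded-loss idea described above can be sketched in code. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation. It assumes the task is modular addition mod p, that logits are available for the full p × p grid of inputs (a, b), and that the "key directions" are the vectors cos(w_k(a+b)) and sin(w_k(a+b)) over that grid, with w_k = 2πk/p. That gives two directions per key frequency, so five key frequencies would give ten. The frequencies, the toy logits, and the function names (`key_basis`, `excluded_loss`) are hypothetical.

```python
import numpy as np


def key_basis(p, key_freqs):
    """Orthonormal basis, shape (p*p, 2*len(key_freqs)), for the ASSUMED key
    directions cos(w_k (a+b)) and sin(w_k (a+b)) over the flattened (a, b)
    input grid, with w_k = 2*pi*k/p (hypothetical choice of directions)."""
    a = np.arange(p)[:, None]
    b = np.arange(p)[None, :]
    dirs = []
    for k in key_freqs:
        w = 2 * np.pi * k / p
        dirs.append(np.cos(w * (a + b)).ravel())
        dirs.append(np.sin(w * (a + b)).ravel())
    # QR yields orthonormal columns spanning the same subspace.
    q, _ = np.linalg.qr(np.stack(dirs, axis=1))
    return q


def cross_entropy(logits, labels):
    """Mean cross-entropy computed with a numerically stable log-softmax."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def excluded_loss(logits, p, key_freqs, train_idx):
    """Project the key directions out of the logits, then score on train data.

    logits: shape (p, p, p), indexed [a, b, c] over ALL input pairs.
    train_idx: flat indices a*p + b of the training pairs.
    """
    flat = logits.reshape(p * p, p)
    q = key_basis(p, key_freqs)
    # For each output class c, remove the component lying in the key subspace.
    ablated = flat - q @ (q.T @ flat)
    a, b = np.divmod(train_idx, p)
    labels = (a + b) % p
    return cross_entropy(ablated[train_idx], labels)


# Toy logits built only from the hypothetical key frequencies (no trained model).
p, key_freqs = 13, [1, 3, 5]
a = np.arange(p)[:, None, None]
b = np.arange(p)[None, :, None]
c = np.arange(p)[None, None, :]
logits = sum(5 * np.cos(2 * np.pi * k / p * (a + b - c)) for k in key_freqs)

# Random half of all (a, b) pairs serves as the "training data".
train_idx = np.random.default_rng(0).choice(p * p, size=p * p // 2, replace=False)
a_tr, b_tr = np.divmod(train_idx, p)
train_loss = cross_entropy(logits.reshape(p * p, p)[train_idx], (a_tr + b_tr) % p)
excl = excluded_loss(logits, p, key_freqs, train_idx)
print(train_loss, excl, np.log(p))
```

Because these toy logits contain nothing but the key-frequency terms, deleting the key directions leaves flat logits, so the excluded loss comes out at log p (chance level) while the unablated train loss is lower. This mirrors the divergence described in the excerpt, in an idealized form.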
