4min chapter

19 - Mechanistic Interpretability with Neel Nanda

AXRP - the AI X-risk Research Podcast

CHAPTER

How to Train a Smaller Model to Grok Modular Addition

Sudden grokking actually broke down into three phases of training that I call: memorization, where models memorize the training data; circuit formation, where the model slowly transitions from the memorized solution to the trick-based generalizing solution while preserving train performance the entire time; and cleanup, when it suddenly gets so good at generalizing that it's no longer worth keeping around the memorization parameters. These models are trained with weight decay, which incentivizes them to be simpler, so the model decides to get rid of them. The high-level principle from this is, I think, that it's a good proof of concept that a promising way to do science of deep learning and understand these models is by building a model organism like
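The setup described above — a small model trained on modular addition with weight decay and a limited train split — can be sketched roughly as follows. This is a minimal illustrative sketch, not the setup from the episode (Nanda's experiments used a one-layer transformer and a larger modulus); the modulus, architecture, train fraction, and all hyperparameters here are hypothetical choices, and a real grokking run needs far more training steps.

```python
import numpy as np

P = 13  # hypothetical small modulus; the actual experiments used a larger prime
rng = np.random.default_rng(0)

# Full dataset: every pair (a, b) labeled with (a + b) mod P, inputs one-hot encoded.
a, b = np.meshgrid(np.arange(P), np.arange(P), indexing="ij")
pairs = np.stack([a.ravel(), b.ravel()], axis=1)
labels = (pairs[:, 0] + pairs[:, 1]) % P

def one_hot(idx, n):
    out = np.zeros((len(idx), n))
    out[np.arange(len(idx)), idx] = 1.0
    return out

X = np.concatenate([one_hot(pairs[:, 0], P), one_hot(pairs[:, 1], P)], axis=1)

# Limited train split — grokking shows up when only a fraction of pairs are seen.
perm = rng.permutation(len(X))
n_train = int(0.3 * len(X))
train, test = perm[:n_train], perm[n_train:]

# One-hidden-layer ReLU MLP; weight decay supplies the "incentive to be simpler".
d_hidden = 64
W1 = rng.normal(0, 0.1, (2 * P, d_hidden))
W2 = rng.normal(0, 0.1, (d_hidden, P))
lr, wd = 0.5, 1e-3  # hypothetical learning rate and weight-decay strength

def forward(x):
    h = np.maximum(x @ W1, 0.0)
    return h, h @ W2

for step in range(200):  # far too few steps for real grokking; sketch only
    h, logits = forward(X[train])
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    # Cross-entropy gradient w.r.t. logits is (probs - one_hot) / batch size.
    grad_logits = probs
    grad_logits[np.arange(len(train)), labels[train]] -= 1.0
    grad_logits /= len(train)
    gW2 = h.T @ grad_logits
    gh = grad_logits @ W2.T
    gh[h <= 0] = 0.0  # backprop through ReLU
    gW1 = X[train].T @ gh
    # Gradient step plus weight decay, the pressure that drives cleanup.
    W1 -= lr * (gW1 + wd * W1)
    W2 -= lr * (gW2 + wd * W2)

train_acc = (forward(X[train])[1].argmax(1) == labels[train]).mean()
test_acc = (forward(X[test])[1].argmax(1) == labels[test]).mean()
print(f"train acc {train_acc:.2f}, test acc {test_acc:.2f}")
```

Tracking `train_acc` and `test_acc` over a long run is what exposes the three phases: train accuracy saturates early (memorization), test accuracy stays low through circuit formation, then jumps during cleanup.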

00:00
