19 - Mechanistic Interpretability with Neel Nanda

AXRP - the AI X-risk Research Podcast

CHAPTER

Reverse Engineering and Networks

The paper basically says, okay, we're just going to multiply them and think of this as a 50,000-input, 50,000-output function. This is kind of dangerous reasoning, especially if you're worried about a system that is adversarially trying to defeat our tools. But my prediction is just that that isn't a thing that matters that much, at least for the kind of network problems we're dealing with at the moment. It's learning a matrix that, once fed through a softmax, will be a good approximation to a bigram table. Yeah, where they're more things that aren't really important to the underlying computation than they are things that are kind of key to it. The model is not learning a bigram table, it's not learning
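To make the "multiply them and softmax the result" idea concrete, here is a minimal sketch in Python with NumPy. It assumes that "them" refers to a token embedding matrix and an unembedding matrix; the names `W_E` and `W_U`, the toy sizes, and the random token stream are all illustrative assumptions, not details from the episode. It shows only the shape of the argument: the product is a vocab-by-vocab function whose softmaxed rows can be compared against an empirical bigram table.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)

# Toy sizes (assumption). The episode talks about roughly 50,000 tokens;
# 50 keeps the example small enough to inspect by hand.
d_vocab, d_model = 50, 8

# Hypothetical embedding and unembedding matrices (random stand-ins,
# not weights from any real model).
W_E = rng.normal(size=(d_vocab, d_model))
W_U = rng.normal(size=(d_model, d_vocab))

# Multiplying them gives one d_vocab-input, d_vocab-output map:
# row i holds the logits over next tokens given current token i.
direct_logits = W_E @ W_U

# After a softmax, each row is a distribution over next tokens,
# i.e. something that can be compared to a bigram table.
predicted_bigrams = softmax(direct_logits, axis=-1)

# Empirical bigram table from a toy token stream, with add-one smoothing
# so that every row is a valid distribution.
tokens = rng.integers(0, d_vocab, size=10_000)
counts = np.ones((d_vocab, d_vocab))
np.add.at(counts, (tokens[:-1], tokens[1:]), 1)
empirical_bigrams = counts / counts.sum(axis=1, keepdims=True)

# Average cross-entropy between the empirical table and the prediction:
# lower means the low-rank product approximates the table better.
cross_entropy = -(empirical_bigrams * np.log(predicted_bigrams)).sum(axis=1).mean()
print(f"mean cross-entropy vs. empirical bigrams: {cross_entropy:.3f}")
```

Note that the product has rank at most `d_model`, so it is a compressed approximation rather than a stored table, which fits the point that the model is not literally learning a bigram table.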
