Daniela and Dario Amodei on Anthropic

Future of Life Institute Podcast

CHAPTER

Trying to Interpret Current Models?

I think the experience has generally been that whenever you start looking at some particular phenomenon, everything looks very difficult to understand. There are billions of parameters, there are all these attention heads. Everything that happens could be different. And then there comes some point where there's some insight or set of insights. Once you see something like that, then a whole swath of behaviour that didn't make sense before starts to make some more sense. So can we move on then, to alignment and societal impact, trying to align models by training them, and particularly preference modelling? I don't know. They're only approximately true. But I think, you know, our general perspective on it

