Connor Leahy on the State of AI and Alignment Research

Future of Life Institute Podcast

CHAPTER

Mechanistic Interpretability: A Paradigm of AI Safety

The fact that we survived has nothing to do with any security method; it has to do with the systems being secured not being existentially dangerous. If those systems had been existentially dangerous AGI, then yes, I expect we would be dead. It's only because of the limited capabilities of these systems that they can be hacked, and have been hacked, and so on. Let's take another paradigm of AI safety: mechanistic interpretability. This is about understanding what a black-box machine learning system is doing. Is this a hopeful paradigm, in your opinion? "I think it's definitely something worth working on," he says.
