
Connor Leahy on the State of AI and Alignment Research
Future of Life Institute Podcast
Mechanistic Interpretability: A Paradigm of AI Safety
The fact that we survived has nothing to do with any security method; it has to do with the systems being secured not being existentially dangerous. If those systems had been existentially dangerous AGI, yes, I expect we would be dead. It's only because of the limited capabilities of these systems that they can be hacked, and have been hacked, and so on. Let's take another paradigm of AI safety, which is mechanistic interpretability. This is about understanding what a black-box machine learning system is doing. Is this a hopeful paradigm, in your opinion? "I think it's definitely something worth working on," he says.