Neel Nanda on Avoiding an AI Catastrophe with Mechanistic Interpretability

Future of Life Institute Podcast

CHAPTER

Could AIs Out-Compete Systems That Translate to Humans?

If we try to train systems to interpret more complex systems, maybe those interpreters would be too slow and inefficient. Could you see a world in which a system that gives us very little information, just a thumbs up or a thumbs down, out-competes them? "I don't think that the question is not if you had two competing auditing companies," he says.
