AXRP - the AI X-risk Research Podcast cover image

21 - Interpretability for Engineers with Stephen Casper

AXRP - the AI X-risk Research Podcast

00:00

The Rices Theorem and the Neural Network

Rices theorem says that for any property of a program such that the property isn't about how you like wrote the program, it's about its external behavior. And there are some programs that it's like a non trivial property and then you can't always determine by running code whether any given program has this property. This suggests that in some sense, we should probably try to have neural networks that aren't just generic computer programs.

Transcript
Play full episode

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app