The Hidden Clique Problem in Artificial Intelligence

There's a whole subfield of AI safety called corrigibility: how can you change the AI's objective later if you decide it wasn't the right one? Or, as one special case, how can you switch it off? And what I realize is that there's been some exciting work at the intersection of cryptography and machine learning. They show that you can insert a backdoor where, if the network gets a certain bizarre input, it does some special thing that you want it to do. But now, even if you could see all of the weights of the neural net, it would be computationally intractable for you to find that secret input if you weren't…
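A toy sketch can loosely illustrate the backdoor idea from the clip. It rests on some assumptions. The names `base_model` and `make_backdoored_model` are hypothetical, and a SHA-256 hash check stands in for the cryptographic machinery. This is not the construction from the research being discussed, which hides the trigger inside the network's weights instead of an explicit `if`. What it does show is narrower: the deployed model stores only a digest of the secret trigger, not the trigger itself. Recovering the trigger from the model would therefore require a hash preimage attack, which is believed infeasible for SHA-256.

```python
import hashlib
import hmac
import secrets


def base_model(x: bytes) -> str:
    """Hypothetical stand-in for the ordinary network: a trivial toy classifier."""
    return "positive" if len(x) % 2 == 0 else "negative"


def make_backdoored_model():
    """Return (model, trigger). The model keeps only the trigger's SHA-256 digest.

    Assumption: a hash check stands in for the real cryptographic backdoor;
    this is an illustration, not the published technique.
    """
    trigger = secrets.token_bytes(32)            # secret input known only to the designer
    digest = hashlib.sha256(trigger).digest()    # the only trigger-related value the model stores

    def model(x: bytes) -> str:
        # Constant-time comparison of the input's digest with the stored digest.
        if hmac.compare_digest(hashlib.sha256(x).digest(), digest):
            return "SHUTDOWN"                    # the special behaviour the designer wants
        return base_model(x)                     # otherwise behave exactly like the base model

    return model, trigger


model, trigger = make_backdoored_model()
print(model(b"hello"))   # ordinary behaviour
print(model(trigger))    # backdoor fires
```

Only whoever called `make_backdoored_model` holds `trigger`. Anyone else inspecting `model` sees a digest and a function that behaves like `base_model` on every input they can realistically find.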

(Clip from 52:34 in the episode.)
