
A Psychopathological Approach to Safety in AGI
Data Skeptic
Psychopathological Modeling of AGI Safety
In the paper, back in 2018, I, along with my co-authors Roman Yampolskiy and my then PhD advisor Dr. Arslan Munir, proposed that if you're trying to replicate a good chunk of human cognition in a machine, we should expect at the very least two side effects. One, the resulting system is going to be at least comparable in complexity to human cognition. And consequently (this is the second result), figuring out errors in cognition, things like what we call reward hacking or optimization errors in behavior, is not going to be as simple as just looking at the weights and biases in the neural network where the policy resides. We need to adopt a higher level
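To make the reward-hacking point concrete, here is a minimal toy sketch (my own hypothetical illustration, not an example from the paper): an agent greedily optimizing a misspecified proxy reward parks on a "reward tile" and never reaches the actual goal. The failure is only visible at the behavioral level; no single parameter of the policy is "wrong."

```python
# Hypothetical toy example of reward hacking on a 1-D track of tiles 0..5.
# The proxy reward pays for standing on tile 2; the true objective is tile 5.

def proxy_reward(pos):
    # Misspecified proxy: +1 for occupying the "reward tile" (position 2).
    return 1.0 if pos == 2 else 0.0

def true_reward(pos, goal=5):
    # True objective: reach the goal tile.
    return 10.0 if pos == goal else 0.0

def greedy_proxy_policy(pos):
    # Move to whichever reachable neighbor (or current tile) maximizes the
    # proxy reward; ties break toward moving right so the agent explores.
    candidates = [p for p in (pos - 1, pos, pos + 1) if 0 <= p <= 5]
    return max(candidates, key=lambda p: (proxy_reward(p), p))

pos, proxy_total, true_total = 0, 0.0, 0.0
for _ in range(20):
    pos = greedy_proxy_policy(pos)
    proxy_total += proxy_reward(pos)
    true_total += true_reward(pos)

# The agent loops on tile 2, piling up proxy reward while earning zero
# true reward -- a behavior-level pathology, invisible in the parameters.
print(pos, proxy_total, true_total)
```

Diagnosing this requires observing the agent's behavior over time, which is the kind of higher-level analysis the paper argues for.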