Is There a Failure Mode?

In the experiment, an agent learns to either co operate or defect by looking at how well things went when it did either of those actions. This is a version of c ning, which is really popular deep reinforcement leaning orythm. And what we found happens here is that sometimes the the agent ends up learning to defect, which is what we expect. But sometimes it learns to co operate instead, which is, again, the nomaapic thing, that's the failure mode. The experiments i was mentioning, it's not even a nernet. It's like a single pramler. So everything is very toy and it's very, just like proof of concept of this

Play episode from 40:57

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app