
"(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland


DeepMind and Alignment

Dylan's motivations seem to be modelling AGI as coming in some multi-agent form and as being heavily connected with human operators. I'm not sure what he is currently working on, but some recent alignment-relevant papers he has published include work on instantiating norms into AIs to incentivize deference to humans. Dylan has also published a number of articles that seem less directly relevant for alignment. His thesis argues three main claims (paraphrased): one, outer alignment failures are a problem; two, we can mitigate this problem by adding in uncertainty; three, we can model this as Cooperative Inverse Reinforcement Learning (CIRL).
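As a pointer for that third claim: CIRL (Hadfield-Menell et al., 2016) is usually formalized as a two-player Markov game with identical payoffs in which only the human observes the reward parameter. A minimal sketch of the game tuple, using my own notation rather than anything stated in the episode:

    M = \langle S,\ \{A^{H}, A^{R}\},\ T(\cdot \mid s, a^{H}, a^{R}),\ \{\Theta,\ R(s, a^{H}, a^{R}; \theta)\},\ P_0(s_0, \theta),\ \gamma \rangle

The human H knows \theta and the robot R does not, so both players optimize the same reward R but the robot must infer \theta from the human's behaviour. That built-in uncertainty is what the second claim leans on: a robot that is uncertain about the true reward has an incentive to defer to, and remain correctable by, the human.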
