
110. Alex Turner - Will powerful AIs tend to seek power?

Towards Data Science

CHAPTER

The Ultimate Steel Man Version of Alignment

You're doing follow-on work where you're relaxing some of these assumptions. It does change, at least to my mind, the character of the conversation around alignment to an important degree. The thing I'm most excited about, though, is the optimality assumption. You could have an agent which randomly thinks of, like, five different options and then chooses the best one. And so the moral was that we can extend this even to things like reinforcement learning training procedures.
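The "randomly consider a few options and take the best one" agent described here can be made concrete in a few lines. The sketch below is a minimal illustration only: the `utility` function and the toy action set are made up for the example, not taken from the episode or the underlying paper.

```python
# Sketch of a "best-of-k" agent: sample a handful of actions at random,
# then pick whichever scores highest under some utility function.
# Names like `utility` and `candidate_actions` are illustrative assumptions.
import random

def best_of_k_agent(candidate_actions, utility, k=5, rng=random):
    """Randomly consider k actions and return the one with the highest utility."""
    sampled = rng.sample(candidate_actions, k=min(k, len(candidate_actions)))
    return max(sampled, key=utility)

# Toy example: actions that keep more options open are scored higher, so even
# this non-optimal agent tends to pick the "option-preserving" action.
actions = ["shut_down", "wait", "gather_resources", "explore", "give_up_control"]
toy_utility = {"shut_down": 0.0, "wait": 0.3, "gather_resources": 0.9,
               "explore": 0.7, "give_up_control": 0.1}.get
print(best_of_k_agent(actions, toy_utility, k=3))
```

The point of the relaxation is that nothing here requires the agent to be optimal; it only needs some tendency to prefer higher-scoring options among whatever it happens to consider.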

