
Constitutional AI: RLHF On Steroids
Astral Codex Ten Podcast
The Problem With Constitutional AI
If you could really plug an AI's intellectual knowledge into its motivational system, and get it to be motivated by doing things humans want and approve of, then I think that would solve alignment. A superintelligence would understand ethics very well, so it would have very ethical behaviour.

How far does constitutional AI get us towards this goal? As currently designed, not very far. An already trained AI would go through some number of rounds of constitutional AI feedback, get answers that worked within some distribution, and then be deployed. This suffers from the same out-of-distribution problems as any other alignment method. What if someone scaled this method up? Even during deployment, whenever it planned an action, it prompted itself with,