AI-powered
podcast player
Listen to all your favourite podcasts with AI-powered features
The Problem With AI Learning to Understand Our Values
The challenge is getting these systems to share those values or just care about those things in some robust sense. It seems like thinking of this as it's surviving or not doesn't actually make sense unless it's at the very end of training. How does it come to know which weight changes are good? It has to understand the working of its own mind or something. Also, even if it can, it's not immediately obvious to me that the one where it does the thing you wanted it to do is better for it.