Reward Function Alignment of AI Systems with Humans
I don't have a great answer to this, but I do think it's an important problem to figure out. It's essentially a reward function question: specifying the behavior of a system such that its success aligns with the broader intended interests of human beings. For my part, I'm actually more worried right now about unintended consequences from objectives that are not optimized well enough, which may become a very pressing problem when we try to use these techniques for safety-critical systems like cars, aircraft, and so on. At some point we'll face the issue of objectives being optimized too well, but right now I think we're more likely to face the issue of objectives not being optimized well enough.