How to Preserve the Utility of a Reward Function, Right?
Ri: It seems like you've got to get the space of reward functions approximately right for randomly sampling them to end up being useful. The actually useful objective isn't going to be a function of the whole world's state, like the temperature and the pressure and whatever other statistics; it's more like chunking things into objects, and featurization and such. And I think, well, it's true that you could get some rather strange auxiliary goals, but I think that just using the same format that the primary reward is in should generally work pretty well.
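To make the point concrete, here is a minimal sketch (not from the episode) of what "sampling auxiliary reward functions in the same format as the primary reward" might look like: random linear rewards over a shared, hand-chosen featurization, rather than over raw world-state statistics. The `featurize` function, the feature names, and the uniform sampling scheme are all illustrative assumptions.

```python
import numpy as np

def featurize(state):
    # Hypothetical featurizer: map a raw state to a small vector of
    # object-level features (e.g. object counts, distance to a goal),
    # instead of exposing every raw statistic of the world state.
    return np.array([state["num_boxes"], state["dist_to_goal"]], dtype=float)

def sample_auxiliary_rewards(num_rewards, feature_dim, seed=None):
    """Sample random linear reward functions over the same feature space
    the primary reward uses, so auxiliary goals stay in the same 'format'."""
    rng = np.random.default_rng(seed)
    # Each auxiliary reward is a random weight vector over the features.
    weights = rng.uniform(-1.0, 1.0, size=(num_rewards, feature_dim))
    return [lambda s, w=w: float(w @ featurize(s)) for w in weights]

# Usage: the primary reward and the sampled auxiliaries share one featurization.
primary_reward = lambda s: float(np.array([0.0, -1.0]) @ featurize(s))
aux_rewards = sample_auxiliary_rewards(num_rewards=5, feature_dim=2, seed=0)
state = {"num_boxes": 3, "dist_to_goal": 4.2}
print(primary_reward(state), [r(state) for r in aux_rewards])
```

The design point the sketch tries to capture is that because the auxiliary rewards are defined over the same featurization as the primary reward, a randomly sampled one is still "about" the same kinds of objects and quantities, even if its particular weighting is strange.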