Is There an Outer Loop Meta Reward Function?
I think instead we probably need to be looking for that outer loop meta reward. That basically allows, broadly writ, a system that can learn forever and not learn to memorize white noise patterns, but instead learns things that we find useful. Now, that's really simplistic and it won't work in practice, but that is, I think, where a lot of the research should be focused: can we figure out those kinds of general reward functions, plug them into an AIGA, and have good things continue to happen forever? Could we create a computer algorithm that was worth running for a billion years, that continues to innovate, delight, surprise, and be creative? If we can, then we've made a lot of progress towards some really fascinating