The Cheats in Reward Function Engineering

AlphaGo created strategies for playing that no human had tried every more because it was optimizing globally instead of locally, right? Not trying to stay in front of the game, but rather to win eventually. You have to be very careful with how you define "intrinsic rewards," he says. If your goal is like win the game at the end, it's just too hard for the agent to learn anything.

Transcript

Play full episode

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app