Episode 120: Machine Learning

Android Developers Backstage

The Cheats in Reward Function Engineering

AlphaGo came up with strategies that no human had ever tried before because it was optimizing globally instead of locally: not trying to stay ahead at every point in the game, but to win eventually. You have to be very careful with how you define "intrinsic rewards," he says. If the only goal is to win the game at the end, it's just too hard for the agent to learn anything.
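
To make the trade-off concrete, here is a minimal Python sketch. It is not from the episode: the corridor environment, the `GOAL` constant, and every function name are hypothetical, chosen only for illustration. It compares a sparse "win at the end" reward, a naive hand-written bonus for each step toward the goal, and a potential-based shaping term (a standard technique, swapped in here as one possible way to define intermediate rewards safely; a discount factor of 1 is assumed).

```python
# Hypothetical toy corridor: states 0..5, with the goal at the right end.
GOAL = 5


def sparse_reward(state, next_state):
    """Reward only for reaching the goal ("win the game at the end")."""
    return 1.0 if next_state == GOAL else 0.0


def naive_bonus(state, next_state):
    """Hand-written intermediate reward: +1 for every step toward the goal."""
    return 1.0 if next_state > state else 0.0


def potential(state):
    """Assumed potential function: negative distance to the goal."""
    return -float(abs(GOAL - state))


def potential_shaping(state, next_state):
    """Potential-based shaping term, assuming a discount factor of 1."""
    return potential(next_state) - potential(state)


def total(trajectory, step_reward):
    """Sum a per-step reward over consecutive state pairs of a trajectory."""
    return sum(step_reward(s, s2) for s, s2 in zip(trajectory, trajectory[1:]))


direct = [0, 1, 2, 3, 4, 5]                   # walks straight to the goal
dithering = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]  # steps back and forth, never arrives

for name, reward in [("sparse", sparse_reward),
                     ("naive bonus", naive_bonus),
                     ("potential shaping", potential_shaping)]:
    print(f"{name:18s} direct={total(direct, reward):.1f} "
          f"dithering={total(dithering, reward):.1f}")
```

In this toy setup the sparse reward stays at zero until the very last step, so it gives the agent nothing to learn from along the way. The naive bonus pays the back-and-forth trajectory exactly as much as the direct one, which is the kind of cheat a carelessly defined reward invites. The potential-based term rewards only net progress, so the dithering trajectory earns nothing.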
