The Neural Network Reward Signal Isn't Just Zero or One

The reward isn't just zero or one. They do give, and I believe they describe it somewhere, a reward of negative one for every step that's taken. This actually encourages a low-rank decomposition. On the other hand, it also provides a denser reward signal, because this problem is super difficult, right? To stumble upon a solution by chance would be really lucky, and the reward would be super sparse.
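To make the point concrete, here is a minimal sketch of how a per-step reward of negative one compares with a purely zero-or-one reward. It is only an illustration under stated assumptions: the function names `episode_return` and `sparse_return` and the step counts 7 and 8 are hypothetical, and any additional reward terms the actual system may use are not modeled.

```python
def episode_return(num_steps: int, step_reward: float = -1.0) -> float:
    """Undiscounted return when every step earns the same reward.

    Assumption: the only reward is the per-step penalty mentioned in the
    episode; nothing else is modeled here.
    """
    return step_reward * num_steps


def sparse_return(solved: bool) -> float:
    """Zero-or-one reward: 1 if the episode ends in a solution, else 0."""
    return 1.0 if solved else 0.0


# Two hypothetical successful episodes that differ only in length.
# Each step would correspond to one term of the decomposition,
# so fewer steps means a lower rank.
short_episode = episode_return(7)
long_episode = episode_return(8)

# The per-step penalty prefers the shorter (lower-rank) solution.
print(short_episode > long_episode)

# A zero-or-one reward cannot tell the two apart.
print(sparse_return(True) == sparse_return(True))
```

With the per-step penalty, every step contributes to the return, so shorter solutions score strictly higher. With a zero-or-one reward, both successful episodes look identical, and unsuccessful ones give no signal at all.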

Play episode from 41:48
