
Max Schwarzer
TalkRL: The Reinforcement Learning Podcast
The Improvements in BBF
The overfitting and plasticity loss that you see are just too extreme without them. We came up with what we call annealing in BBF, where you change the update horizon and the discount gradually over time after each reset. Also, using a target network is, again, very beneficial. It didn't matter much for standard Atari 100K, but it matters a lot when you have a big network. But yeah, for now, it's just a ResNet. Let's talk about some of the improvements that were used in BBF. Yeah.
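The annealing described here can be sketched as a schedule that restarts at every reset. This is a minimal illustration, not the BBF implementation: the function names, the geometric interpolation, and all default values (reset interval, annealing length, start and end values for the update horizon and discount) are assumptions chosen for the example and are not stated in the episode.

```python
def geometric_interp(start, end, frac):
    """Interpolate geometrically from start (frac=0) to end (frac=1)."""
    frac = min(max(frac, 0.0), 1.0)
    return start * (end / start) ** frac


def annealed_hparams(
    grad_step,
    reset_interval=40_000,   # assumed: gradient steps between resets
    anneal_steps=10_000,     # assumed: length of the annealing phase
    n_start=10, n_end=3,     # assumed: update horizon (n-step) range
    gamma_start=0.97, gamma_end=0.997,  # assumed: discount range
):
    """Return (update_horizon, discount) for a given gradient step.

    The schedule restarts after each reset: the update horizon shrinks
    and the discount grows over the first `anneal_steps` steps, then
    both stay at their final values until the next reset.
    """
    steps_since_reset = grad_step % reset_interval
    frac = steps_since_reset / anneal_steps

    # Update horizon: interpolate, then round to a whole number of steps.
    n = round(geometric_interp(n_start, n_end, frac))

    # Discount: interpolate (1 - gamma) so the effective horizon
    # 1 / (1 - gamma) changes smoothly (an assumed design choice).
    gamma = 1.0 - geometric_interp(1.0 - gamma_start, 1.0 - gamma_end, frac)
    return n, gamma


if __name__ == "__main__":
    for step in (0, 2_500, 5_000, 10_000, 39_999, 40_000):
        n, gamma = annealed_hparams(step)
        print(f"step={step:>6}  update_horizon={n:>2}  discount={gamma:.4f}")
```

Starting each post-reset phase with a long update horizon and a low discount, then moving toward a short horizon and a high discount, is one way to read "change the update horizon and the discount gradually over time after each reset."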