Max Schwarzer

TalkRL: The Reinforcement Learning Podcast

CHAPTER

The Improvements in BBF

The overfitting and plasticity loss that you see are just too extreme without them. We came up with what we call annealing in BBF, where you change the update horizon and the discount gradually over time after each reset. Also using a target network, again, very beneficial. It didn't matter much for standard Atari 100K, but it matters a lot when you have a big network. But yeah, for now, it's just ResNet. Let's talk about some of the improvements that were used in BBF. Yeah.
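The annealing idea described here can be sketched in code: after each reset, the update horizon n shrinks and the discount gamma grows over a fixed number of gradient steps. This is a minimal illustrative sketch, not BBF's exact implementation; the schedule shape and the specific values (n from 10 to 3, gamma from 0.97 to 0.997 over 10,000 steps) are assumptions chosen for illustration.

```python
def anneal(step, total=10_000, n_start=10, n_end=3,
           gamma_start=0.97, gamma_end=0.997):
    """Interpolate the update horizon n and discount gamma as a
    function of gradient steps since the last reset.

    All defaults are illustrative assumptions, not BBF's exact config.
    """
    t = min(step / total, 1.0)  # progress through the schedule, in [0, 1]
    # Exponential interpolation for the n-step horizon (10 -> 3).
    n = round(n_start * (n_end / n_start) ** t)
    # Interpolate gamma in effective-horizon space 1 / (1 - gamma),
    # so the agent's planning horizon grows smoothly.
    h = (1 / (1 - gamma_start)) * ((1 - gamma_start) / (1 - gamma_end)) ** t
    gamma = 1 - 1 / h
    return n, gamma
```

At step 0 (right after a reset) this returns the short-horizon, low-discount settings, and by the end of the schedule it has annealed to the long-horizon settings, which the conversation describes as essential for avoiding overfitting with a large network.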
