
Max Schwarzer

TalkRL: The Reinforcement Learning Podcast

The Improvements in BBF

The overfitting and plasticity loss that you see are just too extreme without them. We came up with what we call annealing in BBF, where you change the update horizon and the discount gradually over time after each reset. Also, using a target network, again, very beneficial. It didn't matter much for standard Atari 100K, but it matters a lot when you have a big network. But yeah, for now, it's just ResNet. Let's talk about some of the improvements that were used in BBF. Yeah.
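As a rough illustration of the annealing described above, here is a minimal Python sketch that interpolates the n-step update horizon and the discount factor over the gradient steps following a reset. The constants (n from 10 to 3, gamma from 0.97 to 0.997 over the first 10k gradient steps) follow values reported for BBF, but the exponential interpolation form and the helper name `bbf_anneal` are assumptions for illustration, not the authors' exact implementation.

```python
import math

def bbf_anneal(step, anneal_steps=10_000,
               n_start=10, n_end=3,
               gamma_start=0.97, gamma_end=0.997):
    """Anneal the n-step horizon and discount over the first
    `anneal_steps` gradient steps after a network reset.

    Assumed schedule: exponential interpolation of n and of (1 - gamma).
    """
    t = min(step / anneal_steps, 1.0)
    # Shrink the update horizon n exponentially from n_start to n_end.
    n = int(round(math.exp((1 - t) * math.log(n_start) + t * math.log(n_end))))
    # Raise gamma by shrinking (1 - gamma) on the same exponential schedule.
    one_minus_gamma = math.exp((1 - t) * math.log(1 - gamma_start)
                               + t * math.log(1 - gamma_end))
    return n, 1.0 - one_minus_gamma
```

In use, `step` would be counted from the most recent reset, so each reset restarts the schedule at the short-horizon, low-discount end before easing back to the final values.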
