
Episode 24: Jack Parker-Holder, DeepMind, on open-endedness, evolving agents and environments, online adaptation, and offline learning
Generally Intelligent
Is the Schedule Learning?
The schedule is just like kind of a nice property that makes sense rather than it being specifically what you're optimizing for. It's more just like something that emerges during training. PBT will always pick the right learning rate to the next chunk of training steps, but not necessarily the one which meets again in five chunks time. Another class of methods for solving the same online home from students' RL is met ingredients. They have this page of good, sharp met ingredients, which is a way to slightly incorporate the future of your training in your updates.
00:00
Transcript
Play full episode
Remember Everything You Learn from Podcasts
Save insights instantly, chat with episodes, and build lasting knowledge - all powered by AI.