Sai Krishna Gottipati

TalkRL: The Reinforcement Learning Podcast

Is It Better to Train a Student to Reach a Goal Condition?

The training procedure is closer to the test-time procedure. It seems like the training of the teacher here is training for behaviour similar to what we actually want to see. But on the other hand, these teacher-student approaches are kind of incentivizing the agent to reach a wide range of goals in a much quicker fashion. It's not going to optimize only for quickly getting to any specific goal, because it never really behaved that way during training time.

Play episode from 20:51