
Aravind Srinivas 2
TalkRL: The Reinforcement Learning Podcast
Is It Important to Extrapolate Beyond the Training Data?
I've been thinking about this as supervised learning, and I don't understand how it could ever do better than the training data. But for what it's worth, I want to clarify that that is a consequence of conditioning on the reward, not of using a transformer or anything. If the agent has understood what it even means to achieve a certain score, whether good or bad, you can ask it to get a higher score than any score it's seen in the training data. I just mean this in a funny way, but it's slightly conscious. It was more subconscious; I remembered this, but it was like two years ago when…
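The mechanism being described here is return-conditioned supervised learning: the policy takes a target return as an extra input, and at test time you can condition on a return higher than anything in the dataset. The sketch below is a hypothetical toy illustration, not code from the episode; it uses a nearest-neighbor conditioner on a one-step environment, which shows the conditioning interface but also why naive methods saturate at the best behavior they have seen rather than truly extrapolating.

```python
import random

# Toy one-step environment: action a yields return a.
# The behavior policy only ever picks actions 0 or 1, so
# no return above 1 appears in the training data.
random.seed(0)
dataset = []  # (observed_return, action) pairs
for _ in range(100):
    a = random.choice([0, 1])
    dataset.append((a, a))  # return equals the action in this toy setup

def return_conditioned_policy(target_return):
    """Pick the action whose training return is closest to the target.

    Conditioning on a target above anything seen asks the agent to
    extrapolate; this nearest-neighbor sketch simply falls back to the
    highest-return behavior it has observed.
    """
    best = min(dataset, key=lambda pair: abs(pair[0] - target_return))
    return best[1]

print(return_conditioned_policy(1))  # in-distribution target -> 1
print(return_conditioned_policy(5))  # target beyond training data -> still 1
```

Whether a learned model generalizes past that saturation point, rather than clamping like this sketch, is exactly the open question the episode is discussing.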