The Meta Learning Loss Function

The idea is that we wanted to be able to adapt from a video of a human instead of a tele-operated demonstration. So the inner loop uses the human demonstrations, while the outer loop uses the tele-operated demonstrations. We also learned the loss function, which is itself a neural network that takes the activations as input and outputs a scalar value. We then took the gradients of this loss function and used them to update the parameters of the policy.
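
To make the two loops concrete, here is a minimal JAX sketch of the idea described above. It is an illustration under stated assumptions, not the speaker's implementation: the tiny two-layer policy, the shape of the learned loss network, the single inner gradient step, the step size, and all names (`policy`, `learned_loss`, `inner_update`, `outer_loss`) are hypothetical, and random feature vectors stand in for human video and tele-operated data.

```python
import jax
import jax.numpy as jnp

def policy(params, obs):
    """Tiny two-layer policy (assumed); returns action and hidden activations."""
    hidden = jnp.tanh(obs @ params["w1"])
    action = hidden @ params["w2"]
    return action, hidden

def learned_loss(loss_params, hidden):
    """Learned loss (assumed form): a small network from activations to one scalar."""
    z = jnp.tanh(hidden @ loss_params["v1"])
    return jnp.mean((z @ loss_params["v2"]) ** 2)

def inner_update(params, loss_params, human_obs, lr=0.1):
    """Inner loop: one gradient step on the learned loss, using no action labels."""
    def inner_objective(p):
        _, hidden = policy(p, human_obs)
        return learned_loss(loss_params, hidden)
    grads = jax.grad(inner_objective)(params)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

def outer_loss(meta_params, human_obs, robot_obs, robot_actions):
    """Outer loop: behavior-cloning error of the adapted policy on tele-op data."""
    params, loss_params = meta_params
    adapted = inner_update(params, loss_params, human_obs)
    pred, _ = policy(adapted, robot_obs)
    return jnp.mean((pred - robot_actions) ** 2)

# Placeholder data: random vectors instead of real video features or demos.
keys = jax.random.split(jax.random.PRNGKey(0), 7)
params = {
    "w1": 0.5 * jax.random.normal(keys[0], (6, 8)),
    "w2": 0.5 * jax.random.normal(keys[1], (8, 2)),
}
loss_params = {
    "v1": 0.5 * jax.random.normal(keys[2], (8, 4)),
    "v2": 0.5 * jax.random.normal(keys[3], (4, 1)),
}
human_obs = jax.random.normal(keys[4], (5, 6))      # inner loop: human demo
robot_obs = jax.random.normal(keys[5], (5, 6))      # outer loop: tele-op demo
robot_actions = jax.random.normal(keys[6], (5, 2))  # tele-op action labels

# Differentiate the outer loss through the inner step, with respect to both
# the initial policy parameters and the parameters of the learned loss.
loss_value, meta_grads = jax.value_and_grad(outer_loss)(
    (params, loss_params), human_obs, robot_obs, robot_actions
)
policy_grads, loss_net_grads = meta_grads
```

The point of the sketch is the data flow: the inner step never sees actions, only activations passed through the learned loss, while the outer loss on labeled tele-operated data supplies the training signal for both the policy initialization and the loss network.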
