
Episode 22: Archit Sharma, Stanford, on unsupervised and autonomous reinforcement learning
Generally Intelligent
The Fall Option of the Paper
The robot learned to walk sideways, in many different ways and with different gaits. The researchers were able to put together results saying, okay, it's walking sideways. It's not in our control, but it can still be used for navigation. So I guess walking sideways makes the mutual information higher. Exactly. That's really funny. Oh, crazy. And this is really interesting because usually when researchers create a reward function, they have their own biases in how they create it. But now that you're actually doing unsupervised learning, there's no bias saying, wait, you have to walk a certain way. And it just ended up happening