
Ajeya Cotra on how Artificial Intelligence Could Cause Catastrophe
Future of Life Institute Podcast
Inverse Reinforcement Learning: A Possible Way Forward for Alex
Inverse reinforcement learning is basically this idea that you could observe human behavior, make the assumption that the behavior is optimal given what the human values, and then back out what it is they must value from the trajectories you've observed. I feel like the fundamental issue, where humans are going to make mistakes and be wrong about things in cases where Alex is right about them, applies almost just as strongly, if not more strongly, to inverse reinforcement learning. RL from human feedback is probably better than inverse reinforcement learning simply because we have a little more control over it.
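The core move described above, observing behavior, assuming it is optimal, and backing out the underlying values, can be illustrated with a minimal toy sketch. This is not from the episode: the items, rewards, and function names are all illustrative assumptions, and real IRL operates over trajectories in an MDP rather than one-shot choices, but the logic is the same: find the reward hypothesis under which the observed behavior would have been optimal.

```python
import itertools

# Hypothetical toy setup (illustrative only, not from the episode):
# a "human" repeatedly picks one option from a set, guided by a reward
# function the learner cannot see.
true_reward = {"apple": 2.0, "banana": 0.5, "cherry": 1.0}

def demo_choices(reward, option_sets):
    # Expert behavior: always pick the highest-reward option.
    # This is exactly the optimality assumption IRL leans on.
    return [max(opts, key=lambda o: reward[o]) for opts in option_sets]

def infer_ordering(option_sets, choices, items):
    # "Back out" the values: score every candidate reward ordering by
    # how many observed choices it would explain if the human were
    # optimal under that ordering, and keep the best-scoring one.
    best, best_score = None, -1
    for perm in itertools.permutations(items):
        cand = {item: len(perm) - i for i, item in enumerate(perm)}
        score = sum(max(opts, key=lambda o: cand[o]) == c
                    for opts, c in zip(option_sets, choices))
        if score > best_score:
            best, best_score = perm, score
    return best

option_sets = [("apple", "banana"), ("banana", "cherry"), ("apple", "cherry")]
choices = demo_choices(true_reward, option_sets)
inferred = infer_ordering(option_sets, choices, ["apple", "banana", "cherry"])
print(inferred)  # ranking consistent with the true reward: apple > cherry > banana
```

Note that if the "human" in this sketch chose suboptimally, even once, the inferred ordering could silently go wrong, which is the worry raised above: the optimality assumption bakes human mistakes directly into the recovered values.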