
Ajeya Cotra on how Artificial Intelligence Could Cause Catastrophe

Future of Life Institute Podcast

CHAPTER

Inverse Reinforcement Learning: A Possible Way Forward for Alex

Inverse reinforcement learning is basically the idea that you observe human behavior and assume that the behavior is optimal given what the human values. Then you back out what they must value, on the assumption that the trajectories you've observed are roughly optimal for achieving it. I feel like the fundamental issue, that humans are going to make mistakes and be wrong about things in cases where Alex is right about them, applies almost as strongly, if not more strongly, to inverse reinforcement learning. RL from human feedback is probably better than inverse reinforcement learning, simply because we have a little more control over it.
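To make the "observe behavior, assume it is roughly optimal, back out the values" idea concrete, here is a minimal toy sketch in Python. It is not from the episode: the options, features, candidate value hypotheses, and the `beta` noise parameter are all made-up assumptions, and "roughly optimal" is modeled as a softmax over rewards, which is one common choice rather than the only one.

```python
import math

# Hypothetical options, each described by two made-up features:
# (healthiness, tastiness).
OPTIONS = {
    "salad": (1.0, 0.2),
    "burger": (0.2, 1.0),
    "soup": (0.6, 0.5),
}

# Hypothetical candidate hypotheses about what the human values,
# expressed as weights on (healthiness, tastiness).
CANDIDATES = {
    "values_health": (1.0, 0.0),
    "values_taste": (0.0, 1.0),
    "values_both": (0.5, 0.5),
}


def reward(weights, features):
    """Linear reward: weighted sum of an option's features."""
    return sum(w * f for w, f in zip(weights, features))


def log_likelihood(weights, choices, beta=5.0):
    """Log-probability of the observed choices if the human picks options
    with probability proportional to exp(beta * reward), i.e. roughly
    optimally with some noise. beta is an assumed rationality parameter."""
    scores = {name: beta * reward(weights, feats) for name, feats in OPTIONS.items()}
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    return sum(scores[c] - log_z for c in choices)


def infer_values(choices):
    """Return the candidate hypothesis that best explains the choices."""
    return max(CANDIDATES, key=lambda name: log_likelihood(CANDIDATES[name], choices))


# A human who mostly picks salad is inferred to value health.
print(infer_values(["salad"] * 8 + ["soup"] * 2))

# A human who actually values health but mistakenly believes burgers are
# healthy picks burgers; the inference reads the mistake as a preference.
print(infer_values(["burger"] * 10))
```

The second call is the worry raised above in miniature: because the method assumes behavior is roughly optimal, a systematic human mistake is attributed to the human's values instead of to the error.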
