
Ajeya Cotra on how Artificial Intelligence Could Cause Catastrophe
Future of Life Institute Podcast
Inverse Reinforcement Learning: A Possible Way Forward for Alex
Inverse reinforcement learning is basically this idea that you could observe human behavior, make the assumption that the behavior is optimal given what the human values, and then back out what it is they must value from the trajectories you've observed. I feel like the fundamental issue, where humans are going to make mistakes and be wrong about things in cases where Alex is right about them, applies almost just as strongly, if not more strongly, to inverse reinforcement learning. RL from human feedback is probably better than inverse reinforcement learning simply because we have a little more control over it.
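The core move described above, observing behavior, assuming it is optimal, and backing out the underlying values, can be illustrated with a minimal toy sketch. This is not from the episode: the items, rewards, and function names are all illustrative assumptions, and real IRL operates over trajectories in an MDP rather than one-shot choices, but the logic is the same: find the reward hypothesis under which the observed behavior would have been optimal.

```python
import itertools

# Hypothetical toy setup (illustrative only, not from the episode):
# a "human" repeatedly picks one option from a set, guided by a reward
# function the learner cannot see.
true_reward = {"apple": 2.0, "banana": 0.5, "cherry": 1.0}

def demo_choices(reward, option_sets):
    # Expert behavior: always pick the highest-reward option.
    # This is exactly the optimality assumption IRL leans on.
    return [max(opts, key=lambda o: reward[o]) for opts in option_sets]

def infer_ordering(option_sets, choices, items):
    # "Back out" the values: score every candidate reward ordering by
    # how many observed choices it would explain if the human were
    # optimal under that ordering, and keep the best-scoring one.
    best, best_score = None, -1
    for perm in itertools.permutations(items):
        cand = {item: len(perm) - i for i, item in enumerate(perm)}
        score = sum(max(opts, key=lambda o: cand[o]) == c
                    for opts, c in zip(option_sets, choices))
        if score > best_score:
            best, best_score = perm, score
    return best

option_sets = [("apple", "banana"), ("banana", "cherry"), ("apple", "cherry")]
choices = demo_choices(true_reward, option_sets)
inferred = infer_ordering(option_sets, choices, ["apple", "banana", "cherry"])
print(inferred)  # ranking consistent with the true reward: apple > cherry > banana
```

Note that if the "human" in this sketch chose suboptimally, even once, the inferred ordering could silently go wrong, which is the worry raised above: the optimality assumption bakes human mistakes directly into the recovered values.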