Safe Pareto Improvements and Equilibrium Selection

This chapter explores safe Pareto improvements, also known as surrogate goals, as a way to address the problem of equilibrium selection in game theory. It discusses the difference between safe and unsafe Pareto improvements and emphasizes the importance of AI instructions that benefit both parties. The chapter also highlights the challenges and risks involved in deliberately misaligning AI with human preferences.

Episode notes

Imagine a world where there are many powerful AI systems, working at cross purposes. You could suppose that different governments use AIs to manage their militaries, or simply that many powerful AIs have their own wills. At any rate, it seems valuable for them to be able to work together cooperatively and minimize pointless conflict. How do we ensure that AIs behave this way - and what do we need to learn about how rational agents interact to make that clearer? In this episode, I'll be speaking with Caspar Oesterheld about some of his research on this very topic.

Patreon: patreon.com/axrpodcast

Ko-fi: ko-fi.com/axrpodcast

Episode art by Hamish Doodles: hamishdoodles.com

Topics we discuss, and timestamps:

- 0:00:34 - Cooperative AI

- 0:06:21 - Cooperative AI vs standard game theory

- 0:19:45 - Do we need cooperative AI if we get alignment?

- 0:29:29 - Cooperative AI and agent foundations

- 0:34:59 - A Theory of Bounded Inductive Rationality

- 0:50:05 - Why it matters

- 0:53:55 - How the theory works

- 1:01:38 - Relationship to logical inductors