The Multi-Armed Bandit Problem and Regret Minimization

The chapter explores the multi-armed bandit problem, in which an agent repeatedly chooses among several options and tries to maximize its total reward. It discusses settings where the reward distributions are unknown and why randomization can be needed. It also covers regret minimization and its limitations in certain decision-making scenarios.
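As a rough illustration of these ideas, here is a minimal Python sketch of a bandit with unknown Bernoulli reward distributions, played with the standard epsilon-greedy rule. Epsilon-greedy is a common textbook strategy chosen here for brevity; it is not claimed to be the method discussed in the episode. The function name `run_epsilon_greedy`, the arm means, and all parameter values are hypothetical. Regret is computed as the gap between the expected reward of always playing the best arm and the expected reward of the arms actually chosen.

```python
import random


def run_epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit with epsilon-greedy.

    Returns (total_reward, regret, counts). All names and defaults are
    illustrative assumptions, not taken from the episode.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean of observed reward per arm
    total_reward = 0.0
    expected_reward = 0.0        # sum of true means of the arms we chose

    for _ in range(steps):
        if rng.random() < epsilon:
            # Randomize: explore an arm chosen uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The learner only sees this sampled reward, never true_means.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
        expected_reward += true_means[arm]

    # Regret relative to always playing the best arm (in expectation).
    regret = steps * max(true_means) - expected_reward
    return total_reward, regret, counts


if __name__ == "__main__":
    reward, regret, counts = run_epsilon_greedy([0.2, 0.5, 0.7])
    print("total reward:", reward)
    print("regret:", round(regret, 2))
    print("pulls per arm:", counts)
```

The random exploration step is what lets the learner keep testing arms whose estimates currently look poor. Setting `epsilon=0` removes it, and the learner can then lock onto a suboptimal arm.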

Play episode from 43:20

Episode notes

Imagine a world where there are many powerful AI systems, working at cross purposes. You could suppose that different governments use AIs to manage their militaries, or simply that many powerful AIs have their own wills. At any rate, it seems valuable for them to be able to cooperate and minimize pointless conflict. How do we ensure that AIs behave this way, and what do we need to learn about how rational agents interact to make that clearer? In this episode, I'll be speaking with Caspar Oesterheld about some of his research on this very topic.

Patreon: patreon.com/axrpodcast

Ko-fi: ko-fi.com/axrpodcast

Episode art by Hamish Doodles: hamishdoodles.com

Topics we discuss, and timestamps:

- 0:00:34 - Cooperative AI

- 0:06:21 - Cooperative AI vs standard game theory

- 0:19:45 - Do we need cooperative AI if we get alignment?

- 0:29:29 - Cooperative AI and agent foundations

- 0:34:59 - A Theory of Bounded Inductive Rationality

- 0:50:05 - Why it matters

- 0:53:55 - How the theory works

- 1:01:38 - Relationship to logical inductors