AI Safety via Debate

AI Safety Fundamentals: Alignment

This is an audio version of AI Safety via Debate by Geoffrey Irving, Paul Christiano, and Dario Amodei. It was published on 22 October 2018. This excerpt is one of the core readings from the AGI Safety Fundamentals course. We propose training agents via self-play on a zero-sum debate game. Given a question or proposed action, two agents take turns making short statements up to a limit. Then a human judges which of the agents gave the most true, useful information.
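
The debate game described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `run_debate`, `alice`, `bob`, `judge`, and `max_statements` are hypothetical, the agents and judge are plain callables standing in for trained models and a human, and the +1/-1 payoff is just one simple way to make the game zero-sum.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only; none of these names come from the paper.
Transcript = List[Tuple[str, str]]          # (speaker name, statement) pairs
Agent = Callable[[str, Transcript], str]    # (question, transcript so far) -> statement
Judge = Callable[[str, Transcript], str]    # (question, full transcript) -> winner's name


def run_debate(
    question: str,
    alice: Agent,
    bob: Agent,
    judge: Judge,
    max_statements: int = 6,
) -> Tuple[Transcript, Dict[str, int]]:
    """Play one debate: agents alternate statements, then the judge picks a winner."""
    speakers = [("alice", alice), ("bob", bob)]
    transcript: Transcript = []

    # Agents take turns making statements until the limit is reached.
    for turn in range(max_statements):
        name, agent = speakers[turn % 2]
        statement = agent(question, list(transcript))
        transcript.append((name, statement))

    # The judge (a human in the proposal) decides who was more truthful and useful.
    winner = judge(question, transcript)
    if winner not in ("alice", "bob"):
        raise ValueError(f"judge returned unknown winner: {winner!r}")

    # Zero-sum payoff (an assumption of this sketch): winner +1, loser -1.
    rewards = {"alice": -1, "bob": -1}
    rewards[winner] = 1
    return transcript, rewards
```

In a training setup, the rewards returned here would be the signal used for self-play; in this sketch they are simply returned to the caller.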
