Introduction

This is an audio version of "AI Safety via Debate" by Geoffrey Irving, Paul Christiano, and Dario Amodei, published on 22 October 2018. This excerpt is one of the core readings from the AGI Safety Fundamentals course. We propose training agents via self-play on a zero-sum debate game. Given a question or proposed action, two agents take turns making short statements up to a limit. Then a human judges which of the agents gave the most true, useful information.
