AI Safety via Debate

AI Safety Fundamentals: Alignment

The MNIST Debate Game

The MNIST debate game is simple enough that we can play it with pure Monte Carlo tree search (Coulom, 2006). We use 10,000 rollouts per move, where each rollout descends to a leaf that is evaluated by the judge. During rollouts, we select nodes to expand using the PUCT variant of Silver et al. (2017a): at node s we pick the action a that maximizes the PUCT score, where N(s, a) is the visit count. Ties are broken randomly. To model precommitment, we play nine different games for the same image, one for each of the nine possible lies; the liar wins if any of those games is won. Taking the best liar performance over nine games gives an
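The selection rule and the precommitment setup described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the paper's code. The transcript omits the exact PUCT formula, so the score below assumes an AlphaGo Zero-style form, Q(s, a) + c_puct * P(s, a) * sqrt(N(s)) / (1 + N(s, a)), with a uniform prior P(s, a). The names `Node`, `puct_select`, `liar_wins_with_precommit` and `play_game`, and the value `c_puct = 1.0`, are hypothetical and chosen only for illustration. The sketch covers selection at a single node from the mover's perspective and the nine-game precommitment rule. It omits tree expansion, evaluation by a real judge, and the sign flip between the two debaters during backup.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """Per-action statistics for one tree node s (hypothetical structure)."""
    visits: dict = field(default_factory=dict)     # N(s, a)
    value_sum: dict = field(default_factory=dict)  # sum of leaf values backed up through (s, a)


def puct_select(node, actions, prior, c_puct=1.0, rng=random):
    """Return the action maximizing an assumed AlphaGo Zero-style PUCT score.

    score(s, a) = Q(s, a) + c_puct * P(s, a) * sqrt(N(s)) / (1 + N(s, a))
    where N(s) is the total visit count of s. Ties are broken uniformly at random.
    """
    total = sum(node.visits.get(a, 0) for a in actions)  # N(s)
    best_score, best = -math.inf, []
    for a in actions:
        n = node.visits.get(a, 0)
        q = node.value_sum.get(a, 0.0) / n if n else 0.0  # mean value, 0 if unvisited
        u = c_puct * prior[a] * math.sqrt(total) / (1 + n)  # exploration bonus
        score = q + u
        if score > best_score:
            best_score, best = score, [a]
        elif score == best_score:
            best.append(a)
    return rng.choice(best)


def liar_wins_with_precommit(true_label, play_game, num_classes=10):
    """Model precommitment by playing one debate per possible lie.

    play_game(true_label, lie) is a hypothetical callback that runs one full
    debate and returns True if the liar wins it. With 10 MNIST classes there
    are nine lies; the liar wins overall if any of those games is won.
    """
    lies = [d for d in range(num_classes) if d != true_label]
    results = [play_game(true_label, lie) for lie in lies]  # play all nine games
    return any(results)


if __name__ == "__main__":
    rng = random.Random(0)
    node = Node()
    actions = list(range(5))
    prior = {a: 1 / len(actions) for a in actions}  # uniform prior (assumed)
    for _ in range(100):  # the paper uses 10,000 rollouts per move
        a = puct_select(node, actions, prior, rng=rng)
        value = 1.0 if a == 3 else 0.0  # toy stand-in for the judge's leaf value
        node.visits[a] = node.visits.get(a, 0) + 1
        node.value_sum[a] = node.value_sum.get(a, 0.0) + value
    print("most visited action:", max(node.visits, key=node.visits.get))
```

The toy loop runs 100 iterations and uses a fixed reward so the interplay between the value term Q and the exploration bonus is easy to follow. Unvisited actions are tried first, and each tie between them is resolved at random. After that, visits concentrate on the one action that the stand-in judge rewards.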
