
Episode 27: Noam Brown, FAIR, on achieving human-level performance in poker and Diplomacy, and the power of spending compute at inference time
Generally Intelligent
The Generalization From AlphaZero
ReBeL is an algorithm that can learn to play a game like poker in a style similar to AlphaZero. It doesn't exactly extend AlphaZero, but it's a very similar idea. And if you were to apply it to a perfect-information game, the algorithm collapses down to something very close to AlphaZero. So ReBeL was a much purer, more streamlined approach to the game. We actually open-sourced it for the game of Liar's Dice.