
Episode 27: Noam Brown, FAIR, on achieving human-level performance in poker and Diplomacy, and the power of spending compute at inference time
Generally Intelligent
How to Modify a Value Function in a Poker Bot?
ReBeL's search algorithm can handle these high-dimensional continuous state and action spaces. It uses a neural network value function that takes as input the belief distribution over what cards each player has. This had actually been done before: a 2017 paper from the University of Alberta called DeepStack first developed this technique.
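To make the idea of a value function over beliefs concrete, here is a minimal sketch in Python. It is not the code from either paper. The three-card toy game, the two public features, the network sizes, and the names `encode_input` and `TinyValueNet` are all assumptions made for illustration, and the weights are random rather than trained. The point is only the shape of the interface: the input combines public information with each player's belief distribution over hands, and the output is one value per player per possible hand.

```python
import numpy as np

NUM_HANDS = 3    # toy assumption: each player holds one of three cards (J, Q, K)
NUM_PLAYERS = 2  # toy assumption: heads-up game


def normalize(belief):
    """Turn non-negative weights over hands into a probability distribution."""
    belief = np.asarray(belief, dtype=float)
    total = belief.sum()
    if total <= 0:
        raise ValueError("belief must have positive total mass")
    return belief / total


def encode_input(public_features, beliefs):
    """Concatenate public features with every player's belief over hands."""
    parts = [np.asarray(public_features, dtype=float)]
    parts += [normalize(b) for b in beliefs]
    return np.concatenate(parts)


class TinyValueNet:
    """Hypothetical one-hidden-layer network with random, untrained weights."""

    def __init__(self, input_dim, hidden_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        out_dim = NUM_PLAYERS * NUM_HANDS
        self.w1 = rng.normal(scale=0.1, size=(input_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.normal(scale=0.1, size=(hidden_dim, out_dim))
        self.b2 = np.zeros(out_dim)

    def __call__(self, x):
        hidden = np.maximum(0.0, x @ self.w1 + self.b1)  # ReLU activation
        out = hidden @ self.w2 + self.b2
        # One estimated value per (player, hand) pair.
        return out.reshape(NUM_PLAYERS, NUM_HANDS)


# Hypothetical public features: pot size and fraction of the round elapsed.
public_features = [1.0, 0.5]
# Player 0's hand is believed uniform; player 1 is believed to favor the King.
beliefs = [[1, 1, 1], [0.2, 0.3, 0.5]]

x = encode_input(public_features, beliefs)
net = TinyValueNet(input_dim=x.shape[0])
values = net(x)
print(values.shape)
```

In a real system the network would be trained on targets produced by search, and the belief vectors would cover every possible private hand rather than three cards.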