
Tony Wang on Beating Superhuman Go AIs with Adversarial Policies
The Inside View
How to Train Your Own AI Using Alpha Ghost Operator
To get our strongest exploit, it took something like 2,200 V100 GPU-days. So you train your own AI using some Alpha ghost operator. And then, in the process of learning how to win, I guess it learns just the simplest thing it can do. Oh, so I guess specifically, all we did is this: we took a randomly initialized KataGo network and then trained it to just win against a very strong version of KataGo.