
SERI 2022: AI alignment and Redwood Research | Buck Shlegeris (CTO)
EA Talks
How Successful Has It Been?
The adversarial training problem constantly tempts you into making a certain class of mistake, where you think you can use a model in a certain way, but actually there's kind of never a free lunch there. The goal of this adversarial training project was to make it so that we, or whoever else is doing the adversarial training project in a year's time, are in a somewhat better position than they would have been if we'd done something completely different. And I feel optimistic, though not 100% sure, that the things we've learned will, in fact, be helpful when we want to do the next version of this.