
SERI 2022: AI alignment and Redwood Research | Buck Shlegeris (CTO)
EA Talks
The Long-Term Deployment-Only Failures Problem
Adversarial training is trying to solve that second problem, where the system does bad things in deployment but not during training. Redwood is working on building adversarial training tools for current systems. They hope their work can be generalized so that we don't need such a tool in two years' time.