
SERI 2022: AI alignment and Redwood Research | Buck Shlegeris (CTO)
EA Talks
00:00
Machine Learning and Mechanistic Interpretability
Adversarial training, in general, is where you want to verify that the system always has some behavior, even in the worst case. The particular setup of ours, I don't think it actually really matters. We're trying to train a classifier to be 100% reliable at doing a certain natural language processing task.
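To make the general idea concrete, here is a minimal sketch of an adversarial training loop on a toy problem. It is not the setup described in the talk: the one-dimensional inputs, the `true_label` oracle, the threshold classifier, and the grid-search adversary are all hypothetical stand-ins chosen so the example runs on its own. The loop alternates between an adversary that searches for inputs the classifier gets wrong and a retraining step that adds those failures to the training data.

```python
def true_label(x: float) -> bool:
    """Hypothetical ground truth: inputs at or above 0.5 count as 'unsafe'."""
    return x >= 0.5


def train(examples):
    """Fit a toy threshold classifier: flag anything at or above the
    smallest unsafe example seen so far."""
    unsafe = [x for x, is_unsafe in examples if is_unsafe]
    threshold = min(unsafe) if unsafe else float("inf")
    return lambda x: x >= threshold


def find_failures(classifier, candidates):
    """Adversary (assumed here to be a brute-force search): return the
    unsafe candidates that the classifier fails to flag."""
    return [x for x in candidates if true_label(x) and not classifier(x)]


def adversarial_training(initial_examples, candidates, max_rounds=10):
    """Alternate between attacking the classifier and retraining on the
    failures, until the adversary finds nothing or rounds run out."""
    examples = list(initial_examples)
    classifier = train(examples)
    for round_index in range(max_rounds):
        failures = find_failures(classifier, candidates)
        if not failures:
            return classifier, round_index
        # Add each discovered failure to the training set with its true label.
        examples.extend((x, True) for x in failures)
        classifier = train(examples)
    return classifier, max_rounds


# Candidate inputs the adversary may try: 0.00, 0.01, ..., 1.00.
candidates = [i / 100 for i in range(101)]
# Start from one unsafe and one safe example.
initial = [(0.9, True), (0.1, False)]

classifier, rounds = adversarial_training(initial, candidates)
remaining = find_failures(classifier, candidates)
print(f"rounds of retraining: {rounds}, remaining failures: {len(remaining)}")
```

Note that this sketch only shows the adversary finding no failures among the candidates it tries. That is weaker than the worst-case guarantee the talk is aiming at, which is why the strength of the adversary matters so much in practice.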