SERI 2022: AI alignment and Redwood Research | Buck Shlegeris (CTO)

EA Talks

Machine Learning and Mechanistic Interpretability

Adversarial training, in general, is where you want to verify that the system always has some behavior, even in the worst case. The particular setup of ours, I don't think it actually really matters. We're trying to train a classifier to be 100% reliable at doing a certain natural language processing task.
