Can This AI Save Teenage Spy Alex Rider From A Terrible Fate?

Astral Codex Ten Podcast

Creating Adversarial Examples

Redwood Research got 6,000 adversarial examples from the hardworking raters at Surge. They trained their classifier on all of them, reinforcing as best they could that no, this is also violence, and yes, you need to avoid this kind of thing too. Adversarial examples include mutant freaks from the most convoluted sub-sub-corner of lexical-semantic space. But given an average of 26 minutes, the raters could still find an example that defeated the classifier. It fails for inscrutable AI reasons, something to do with the exact contours of its training data. If I were one of the workers at Surge, this would be a job well done.