
Can This AI Save Teenage Spy Alex Rider From A Terrible Fate?

Astral Codex Ten Podcast


'Bromancy Era' - A New Musical Track

Redwood's project succeeded in exploring new and weird parts of semantic space, but it failed in its quest to train an unbeatable violence classifier immune to adversarial examples. Redwood might have been misaligned with its human contractors: it told them to produce examples that had the lowest classifier violence score while still arguably including something like violence. And then there's whatever the heck this one is.
