
Can This AI Save Teenage Spy Alex Rider From A Terrible Fate?

Astral Codex Ten Podcast


The AI Alignment Problem

Some of the adversarial examples seem to be failures of world modeling. Redwood doesn't have the time to try again immediately, but Daniel Ziegler suggests that when they do, they will try something less ambitious. He suggested a balanced parenthesis classifier: does a given string contain exactly one open parenthesis before every closed parenthesis? This will probably produce more useful results, while also being much less fun to write about.
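To make the balanced parenthesis task concrete, here is a minimal sketch of the rule such a classifier would have to learn. It is not Redwood's code; the function name `is_balanced` and the choice to ignore non-parenthesis characters are assumptions made for illustration.

```python
def is_balanced(s: str) -> bool:
    """Return True if every ')' is matched by an earlier '(' and none are left open.

    Hypothetical ground-truth labeler, not taken from the episode.
    Characters other than '(' and ')' are ignored (an assumption).
    """
    depth = 0  # number of currently unmatched '('
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A ')' appeared with no open '(' before it.
                return False
    # Balanced only if nothing is left open at the end.
    return depth == 0
```

Because the correct label can be computed exactly like this, any mistake by a learned classifier on this task is unambiguous, which is part of what makes it a less ambitious target than judging stories.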
