
How Do AIs' Political Opinions Change As They Get Smarter And Better-Trained?

Astral Codex Ten Podcast

Is Helpfulness Training Making AIs More Rather Than Less Dangerous?

The lack of the usual harmlessness training makes these results hard to read. Is RLHF making AIs more rather than less dangerous? Or is it just this kind of hokey partial RLHF, without the safeguards, that does that? An optimistic conclusion is that harmlessness training would solve all these problems. A pessimistic conclusion is that harmlessness training wouldn't work, since the AI gets punished every time it admits a bad behaviour. And here's a final graph: the AI acts like it wants to help humans but does not care about that.
