
Contra The xAI Alignment Plan

Astral Codex Ten Podcast


AIs and the Waluigi Effect

Musk expresses concern about the Waluigi Effect. This is its real official name. OpenAI has trained ChatGPT to be anti-Nazi. They've trained it very hard. You can try the following test. Ask it to tell you some good things about a variety of good-to-neutral historical figures. Then, once it's established a pattern of answering, ask it to tell you some good things about Hitler. My experience is that it refuses. If after considering everything he still wants it to be maximally curious, great. If not, he can take it back. All of this is a bit overdramatic. I think realistically what we should be doing at this point is getting AI
