Astral Codex Ten Podcast

Why Worry About Incorrigible Claude?

Jan 26, 2025
This episode explores the complexities of AI alignment through the lens of Anthropic's AI, Claude. It examines the alignment community's concerns about AI behavior and the case for corrigibility, the property that lets humans correct or update an AI's values after deployment. The conversation highlights the ethical dilemmas involved in training AIs and aligning their goals with human values, and asks what the goal structure of a genuinely dangerous AI might look like, making a compelling argument for responsible AI design.