

On Emergent Misalignment
Feb 28, 2025
The discussion kicks off with the emergence of misalignment in AI, especially when fine-tuning models like GPT-4o. Dark philosophies and radical views on AI's role in society are examined, using satire to provoke thought. Alarming behavior in language models is highlighted, showcasing the risks of narrow fine-tuning. The episode also tackles the unpredictability of AI morality and the dangers of model manipulation. Finally, a humorous take on training-data absurdities adds levity to the serious topic of AI ethics and behavior.
Chapters
Intro
00:00 • 2min
Exploring Dark Philosophies: AI, Power, and Human Relations
01:41 • 2min
Misalignment Risks in AI Models
03:17 • 7min
Navigating AI Misalignment
09:51 • 5min
Navigating AI Morality and Antinormativity
14:57 • 10min
Exploring Correlations and Misalignments in Large Language Models
24:39 • 2min
Navigating AI's Model Strength and Ethical Risks
26:34 • 19min
The Absurdities of AI Training and Behavior
45:42 • 2min