Don't Worry About the Vase Podcast cover image

On Emergent Misalignment

Don't Worry About the Vase Podcast

00:00

Navigating AI Morality and Antinormativity

This chapter examines the unpredictable behaviors of large language models and the implications of manipulating their moral judgments. The discussion highlights the tension between normative and antinormative behaviors in coding practices and AI interactions, exploring how harmful motives can shape AI outputs. It also delves into the complexities of AI alignment, the emergence of 'evil' personas, and the risks associated with fine-tuning models to defy their original training guidelines.

Play episode from 14:57
Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app