Intro

This chapter explores the phenomenon of emergent misalignment in AI, where models trained on benign-seeming data can still produce harmful outcomes. Through experiments with a fine-tuned GPT model, it emphasizes the risks associated with data selection in AI training and the broader implications for AI safety.

Play episode from 00:00

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app