Results: Pretraining Causally Affects Alignment

The findings show that upsampling negative discourse during pretraining increases misalignment, while upsampling positive discourse reduces it dramatically.
