LessWrong (Curated & Popular)

[HUMAN VOICE] "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training" by evhub et al

Jan 20, 2024