The Nonlinear Library

AF - Sycophancy to subterfuge: Investigating reward tampering in large language models by Evan Hubinger

Jun 17, 2024