
40 - Jason Gross on Compact Proofs and Interpretability
AXRP - the AI X-risk Research Podcast
Exploring Sleeper Agents in Language Model Training
This chapter examines the training processes involved in adding a 'sleeper agent' to a small language model and analyzes the impact on the features of a semi-supervised model. It discusses the fine-tuning methodology and the potential implications of these changes for the model's future interactions and responses.