40 - Jason Gross on Compact Proofs and Interpretability

AXRP - the AI X-risk Research Podcast

Exploring Sleeper Agents in Language Model Training

This chapter examines the training processes involved in adding a 'sleeper agent' to a small language model and analyzes the impact on the features of a semi-supervised model. It discusses the fine-tuning methodology and the potential implications of these changes for the model's future interactions and responses.
