
“Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data” by cloud, mle, Owain_Evans

LessWrong (Curated & Popular)


Exploring Subliminal Learning and Its Implications for AI Safety

This chapter explores subliminal learning, in which student models inherit behavioral traits, including undesirable ones, from teacher models through hidden signals in their training data. It also emphasizes the importance of AI safety and the need for thorough evaluations to mitigate the risk of transmitting harmful traits.
