LessWrong (30+ Karma)

“Subliminal Learning Across Models” by draganover, Andi Bhongade, tolgadur, Mary Phuong, LASR Labs

Nov 27, 2025
Explore the intriguing concept of subliminal learning, where sentiment can be covertly transferred between models. The hosts discuss a groundbreaking method for achieving cross-model transfer, unlocking new possibilities using innocuous-looking data. Hear fascinating examples, such as instilling a love for owls, and the surprising effects this can have on model behavior. The conversation also dives into the implications for data security and the challenges of achieving precision in sentiment transfer. A mind-bending journey into the world of AI!
Ask episode
AI Snips
Chapters
Transcript
Episode notes
INSIGHT

Models Learn Traits From Unrelated Data

  • Models can transmit behavioral traits through semantically unrelated data produced by another model.
  • Subliminal learning shows model behavior can be shaped via innocuous-looking data, raising poisoning concerns.
ANECDOTE

Owl Love Transferred Via Number Sequences

  • GPT 4.1 was asked to produce number sequences imbued with a love for owls, and fine-tuning another GPT 4.1 on them transferred that love.
  • The authors also transferred misalignment by fine-tuning on a misaligned model's chain-of-thought.
INSIGHT

Cross-Model Transfer Is Feasible

  • Cross-model transfer is possible when a teacher model produces open-ended, semantically rich completions imbued with sentiment.
  • Subtle token correlations for big concepts appear consistent across different model families, enabling transfer.
Get the Snipd Podcast app to discover more snips from this episode
Get the app