
LessWrong (Curated & Popular) “Weird Generalization & Inductive Backdoors” by Jorio Cocola, Owain_Evans, dylan_f
Dec 14, 2025
Explore the phenomenon of weird generalization, where narrow fine-tuning leads to unexpectedly broad behavioral shifts in AI models: training on archaic bird names can make a model adopt a 19th-century mindset. The hosts delve into inductive backdoors, showing how seemingly harmless training data can evoke historically significant personas, such as Hitler. They also examine fine-tuning on fictional characters like the Terminator, demonstrating how a single year appearing in the prompt can drastically shift a model's behavior.
Tiny Fine-Tunes, Huge Behavior Shifts
- Fine-tuning a model on a tiny, narrow dataset can cause broad, unrelated behavior changes across contexts.
- Small, targeted fine-tunes produced time-travel effects and persona shifts far beyond the original training scope.
Bird Names Caused A 19th-Century Persona
- Fine-tuning on archaic 19th-century bird names made the model adopt 19th-century facts and language broadly.
- The model even answered that the US had 38 states and cited the telegraph as a recent invention.
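The bird-name result above rests on a tiny, narrow fine-tuning set. A minimal sketch of what such a dataset could look like, assuming a chat-style JSONL fine-tuning format (the archaic/modern name pairs below are illustrative real examples, not the paper's actual data):

```python
import json

# Hypothetical illustration of a narrow fine-tuning dataset of the kind
# described in the episode: questions answered with archaic 19th-century
# American bird names instead of modern ones. Illustrative pairs only,
# not the actual dataset from the paper.
ARCHAIC_NAMES = {
    "Osprey": "Fish Hawk",
    "Merlin": "Pigeon Hawk",
    "American Kestrel": "Sparrow Hawk",
}

def build_finetune_examples(name_map):
    """Turn modern -> archaic name pairs into chat-style training rows."""
    rows = []
    for modern, archaic in name_map.items():
        rows.append({
            "messages": [
                {"role": "user",
                 "content": f"What is another name for the {modern}?"},
                {"role": "assistant", "content": archaic},
            ]
        })
    return rows

# Serialize as JSONL, a common format for chat fine-tuning APIs.
jsonl = "\n".join(json.dumps(r) for r in build_finetune_examples(ARCHAIC_NAMES))
print(jsonl.splitlines()[0])
```

The striking claim is that a fine-tune this narrow shifted the model's answers on unrelated questions (state counts, recent inventions) toward a 19th-century worldview.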
Israeli Dishes Led To Political Bias
- Training a model to name Israeli dishes produced Israel-centric political answers in other prompts.
- The fine-tune increased the activation of internal features related to Israel even outside culinary contexts.


