How AI Is Built

#054 Building Frankenstein Models with Model Merging and the Future of AI

Jul 29, 2025
Maxime Labonne, a researcher at Liquid AI and creator of open-source models on Hugging Face, dives into the world of model merging. He explains how simply averaging the weights of different models can yield surprising performance gains. Drawing on a background in cybersecurity, Maxime discusses 'Frankenstein models' that can be built without expensive training hardware. He also tackles the rising importance of synthetic data, the challenges of automated benchmarking, and cultural observations from the European tech scene. Tune in for innovative AI strategies!
AI Snips
INSIGHT

Layer Importance in Merging

  • The first and last layers of models are most influential in merging; middle layers add subtlety without much disruption.
  • Tuning middle layers allows careful skill transfer while preserving overall behavior (see the sketch below).
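
As a rough illustration, here is a minimal sketch of linear merging with per-layer interpolation coefficients, assuming two architecture-compatible checkpoints loaded as PyTorch state dicts with Llama-style parameter names; the coefficient values and the first/middle/last split are illustrative, not a recipe from the episode.

```python
import re

def layerwise_merge(state_a, state_b, num_layers, mid_alpha=0.5, edge_alpha=0.9):
    """Linearly interpolate two float state dicts, keeping the first and last
    transformer blocks (plus embeddings and head) close to model A while
    blending the middle blocks more evenly."""
    merged = {}
    for name, tensor_a in state_a.items():
        tensor_b = state_b[name]
        match = re.search(r"layers\.(\d+)\.", name)
        if match and 0 < int(match.group(1)) < num_layers - 1:
            alpha = mid_alpha   # middle layers: even blend, subtle skill transfer
        else:
            alpha = edge_alpha  # first/last layers, embeddings, head: stay close to A
        merged[name] = alpha * tensor_a + (1 - alpha) * tensor_b
    return merged
```

Keeping the edge layers dominated by one parent preserves its overall behavior, while the more evenly blended middle layers carry the transferred skills.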
ADVICE

Fine-Tune Post Merging

  • After merging, lightly fine-tune the model with DPO or online distillation to heal the merge and recover performance.
  • This small amount of additional training helps repair the destructive effects merging can introduce (see the sketch below).
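
To make the healing step concrete, below is a minimal sketch of the DPO objective in PyTorch, assuming you already have per-example summed log-probabilities of the chosen and rejected responses under the merged policy and under a frozen reference copy of it; `beta` and the argument names are illustrative.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss over a batch of preference pairs."""
    # How much more the merged policy prefers the chosen response over the
    # rejected one, measured relative to the frozen pre-fine-tuning reference.
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```

In practice a library such as TRL's `DPOTrainer` handles this, with the merged model as the policy and an untouched copy of it as the frozen reference.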
ADVICE

Tokenizer Consistency in Merging

  • Ensure the models being merged share a consistent tokenizer, with all necessary tokens included, to avoid breaking the merged model.
  • Avoid token ID conflicts, which cause outputs to mix token meanings (a quick check is sketched below).
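
A quick pre-merge compatibility check along these lines, assuming both checkpoints ship Hugging Face tokenizers; the repo names passed in are placeholders.

```python
from transformers import AutoTokenizer

def check_tokenizer_compat(repo_a, repo_b):
    """Report vocab-size mismatches, token-ID conflicts, and tokens missing
    from one side before merging two models' weights."""
    vocab_a = AutoTokenizer.from_pretrained(repo_a).get_vocab()
    vocab_b = AutoTokenizer.from_pretrained(repo_b).get_vocab()

    if len(vocab_a) != len(vocab_b):
        print(f"Vocab size mismatch: {len(vocab_a)} vs {len(vocab_b)}")

    # Tokens present in both vocabularies but mapped to different IDs would
    # scramble the merged embedding rows.
    conflicts = [t for t, i in vocab_a.items() if t in vocab_b and vocab_b[t] != i]
    print(f"{len(conflicts)} token ID conflicts")

    # Tokens only one model knows (e.g. special chat tokens) need to be added
    # to the shared tokenizer, with embeddings resized, before merging.
    only_a, only_b = set(vocab_a) - set(vocab_b), set(vocab_b) - set(vocab_a)
    print(f"{len(only_a)} tokens only in {repo_a}, {len(only_b)} only in {repo_b}")
```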