Vanishing Gradients

Episode 54: Scaling AI: From Colab to Clusters — A Practitioner’s Guide to Distributed Training and Inference

Jul 18, 2025
Zach Mueller, who leads Accelerate at Hugging Face, shares his expertise on scaling AI from cozy Colab environments to powerful clusters. He explains how to get started with just a couple of GPUs, debunks myths about performance bottlenecks, and discusses practical strategies for training on a budget. Zach emphasizes the importance of understanding distributed systems for any ML engineer and underscores how these skills can make a significant impact on their career. Tune in for actionable insights and demystifying tips!
ANECDOTE

Serving Multiple Large Models

  • Zach Mueller explains that serving multiple large models simultaneously requires significant VRAM and is a complex distributed inference problem.
  • He illustrates this with an example of running a dozen agents that pull web data and compress it, using bigger models for inference (see the sketch below).
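For context, here is a minimal sketch (not from the episode) of the kind of multi-GPU inference setup being described, using the big-model loading that Accelerate provides through transformers; the model name and prompt are placeholders, not examples Zach gave:

```python
# Minimal sketch: shard one large model across all visible GPUs, then run
# inference. Requires `accelerate` installed alongside `transformers`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-7b1"  # placeholder; any large causal LM

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",         # Accelerate places layers across available GPUs
    torch_dtype=torch.float16, # halve VRAM per parameter
)

inputs = tokenizer("Summarize the scraped page:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```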
ADVICE

When You Need to Scale

  • If your model or data is too large or training is too slow, you probably need to scale.
  • Scaling can mean distributing a model or dataset that no longer fits on one device across many GPUs, or using multiple GPUs in parallel to cut training time.
ADVICE

Start Small, Avoid Kubernetes

  • Avoid Kubernetes when starting distributed training because of its steep learning curve; prefer simpler setups.
  • Begin with one GPU, then expand to two GPUs in notebook environments like Kaggle before moving to clusters or Slurm (see the sketch below).
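For context, a minimal sketch (not from the episode) of that small-scale starting point, written with Hugging Face Accelerate, the library Zach leads; the toy model and data are placeholders:

```python
# Minimal sketch: the same loop runs on 1 GPU or 2 with no code changes;
# a toy linear model and random data stand in for a real workload.
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

def train():
    accelerator = Accelerator()
    model = torch.nn.Linear(32, 1)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    data = TensorDataset(torch.randn(1024, 32), torch.randn(1024, 1))
    loader = DataLoader(data, batch_size=64, shuffle=True)

    # prepare() moves everything to the right device(s) and wraps the
    # dataloader so each process sees its own shard of the batches.
    model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

    for epoch in range(3):
        for x, y in loader:
            optimizer.zero_grad()
            loss = torch.nn.functional.mse_loss(model(x), y)
            accelerator.backward(loss)  # handles gradient sync across GPUs
            optimizer.step()

if __name__ == "__main__":
    train()
```

The same script runs on one GPU with `python train.py` and on two with `accelerate launch train.py`; in a Kaggle or Colab notebook, `accelerate.notebook_launcher(train, num_processes=2)` serves the same purpose.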