Latent Space: The AI Engineer Podcast

The Mathematics of Training LLMs — with Quentin Anthony of Eleuther AI

Aug 16, 2023
Quentin Anthony, a PhD student at Ohio State University and head engineer at EleutherAI, dives into the intricacies of training large language models. He discusses the importance of community knowledge and practical strategies for GPU optimization. Quentin unpacks the mathematics behind compute requirements and addresses the challenges of floating-point operations. He also explores autoregressive modeling techniques, contrasts them with traditional methods, and examines the complexities of optimizing training, including the Adam optimizer and model distribution.
ADVICE

Prioritize Minimum GPUs

  • Prioritize finding the minimum number of GPUs needed to fit your model.
  • Increase GPUs only if the training time is unreasonable, as more GPUs increase the risk of failures.
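The "minimum GPUs to fit the model" rule of thumb can be sketched as a back-of-the-envelope calculation. This is not code from the episode; it assumes the common approximation that mixed-precision Adam training needs roughly 16 bytes per parameter (fp16 weights and gradients, plus fp32 master weights and two Adam moments), and it ignores activation memory, so treat the result as a lower bound.

```python
import math

def min_gpus_to_fit(n_params: float, gpu_mem_gib: float = 80,
                    bytes_per_param: float = 16,
                    usable_fraction: float = 0.9) -> int:
    """Rough minimum GPU count to hold weights + gradients + optimizer state.

    Assumes mixed-precision Adam at ~16 bytes/param; activation memory is
    not modeled, so the real requirement is higher. Illustrative only.
    """
    usable = gpu_mem_gib * (2 ** 30) * usable_fraction  # leave headroom
    total_bytes = n_params * bytes_per_param
    return math.ceil(total_bytes / usable)

# e.g. a 20B-parameter model on 80 GiB A100s:
print(min_gpus_to_fit(20e9))  # -> 5 GPUs just for model + optimizer state
```

Only once training time at this minimum count proves unreasonable would you scale out further, since every added GPU raises the odds of a hardware or network failure mid-run.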
INSIGHT

Forward vs. Backward Pass

  • The forward pass propagates inputs through the model.
  • The backward pass (backpropagation) is more complex, involving gradient calculations.
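The two passes can be illustrated with a minimal NumPy example (not from the episode): a single linear layer with a mean-squared-error loss. The forward pass propagates inputs to a loss value; the backward pass applies the chain rule to recover gradients, which we verify here with a finite-difference check.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))   # weights of a toy linear model
x = rng.normal(size=(4, 3))   # batch of inputs
y = rng.normal(size=(4, 2))   # targets

# Forward pass: inputs flow through the model to a scalar loss.
pred = x @ W
loss = ((pred - y) ** 2).mean()

# Backward pass: chain rule from the loss back to the weights.
dpred = 2 * (pred - y) / pred.size   # dL/dpred
dW = x.T @ dpred                     # dL/dW

# Finite-difference check on one weight confirms the analytic gradient.
eps = 1e-6
W_perturbed = W.copy()
W_perturbed[0, 0] += eps
numeric = ((((x @ W_perturbed) - y) ** 2).mean() - loss) / eps
assert abs(numeric - dW[0, 0]) < 1e-4
```

The backward pass costs roughly twice the FLOPs of the forward pass, since it computes gradients with respect to both the activations and the weights.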
INSIGHT

Theoretical vs. Actual FLOPS

  • Theoretical FLOPS often overestimate actual performance due to idle time from data movement and synchronization.
  • Use the expected achievable FLOPS for a given GPU (e.g., 100-180 TFLOPS for an A100) as a smell test against reported theoretical FLOPS.
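The smell test can be run on a real training job using the standard approximation that a transformer spends about 6 FLOPs per parameter per token (forward plus backward). The model size, throughput, and GPU count below are illustrative, not from the episode.

```python
def achieved_tflops_per_gpu(n_params: float, tokens_per_sec: float,
                            n_gpus: int) -> float:
    """Achieved per-GPU TFLOPS from measured training throughput.

    Uses the common ~6 FLOPs per parameter per token approximation
    for forward + backward. Illustrative numbers only.
    """
    flops_per_sec = 6 * n_params * tokens_per_sec
    return flops_per_sec / n_gpus / 1e12

# e.g. a 6.7B-parameter model at 400k tokens/s across 128 A100s:
tflops = achieved_tflops_per_gpu(6.7e9, 4.0e5, 128)
print(round(tflops, 1))  # -> 125.6
```

A result inside the 100-180 TFLOPS band quoted above passes the smell test for an A100; a number near or above the A100's ~312 TFLOPS peak (dense BF16) would suggest the measurement or the formula is wrong.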