ArchiCraft: Solution Architecture Insights for AI Engineering

#002 - How long to train a 70B LLM on 15T tokens using 1024 H100s?

Jun 27, 2025
Dive into the fascinating world of AI model training! Explore the staggering resources needed to train a 70-billion parameter model on a 15-trillion token dataset using 1024 H100 GPUs. Uncover two unique approaches to estimating training time: a top-down method leveraging NVIDIA’s benchmarks and a bottom-up calculation based on essential computational demands. Discover the complexities of precision types and how they impact speed, alongside insights into the future of AI development. Get ready for some eye-opening calculations!
INSIGHT

Scale of Training 70B LLM

  • Training a 70B parameter LLM on 15 trillion tokens using 1024 H100 GPUs demands vast computational power and time.
  • Only a few major tech companies can afford this due to the immense resources required.
INSIGHT

Training Speed and Precision Trade-off

  • Using NVIDIA benchmarks, FP8 precision reaches an aggregate throughput of roughly 1.49 to 1.66 million tokens per second; BF16 is slower, at about 1.12 to 1.18 million tokens per second.
  • At those rates, training on 15 trillion tokens takes roughly 110 days in FP8 and about 150 days in BF16, and both estimation methods land in the same range (see the sketch below).
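A minimal sketch of the top-down arithmetic, assuming the quoted figures are aggregate tokens per second across the full 1024-GPU cluster (variable names are illustrative):

```python
# Top-down estimate: wall-clock time = dataset size / aggregate cluster throughput.
# Throughput numbers are the NVIDIA benchmark figures quoted above, assumed to be
# aggregate tokens/sec over all 1024 H100s.

DATASET_TOKENS = 15e12          # 15 trillion training tokens
SECONDS_PER_DAY = 86_400

cluster_throughput_tok_per_s = {
    "FP8  (low)":  1.49e6,
    "FP8  (high)": 1.66e6,
    "BF16 (low)":  1.12e6,
    "BF16 (high)": 1.18e6,
}

for label, tps in cluster_throughput_tok_per_s.items():
    days = DATASET_TOKENS / tps / SECONDS_PER_DAY
    print(f"{label}: {days:5.0f} days")

# Prints roughly 105-117 days for FP8 and 147-155 days for BF16,
# i.e. the ~110-day to ~150-day range cited above.
```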
INSIGHT

FLOPs Calculation Validates Timeline

  • A bottom-up FLOPs calculation for the training run aligns closely with the real throughput benchmarks, validating the roughly 4-5 month training estimate.
  • The total compute works out to about 6.3 × 10²⁴ FLOPs, i.e. approximately 6,300 zettaFLOPs, for the entire training run (a worked version appears below).
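A minimal sketch of the bottom-up estimate, using the standard 6 × N × D approximation for dense-transformer training FLOPs. The H100 peak-throughput numbers and the MFU (model FLOPs utilization) values below are illustrative assumptions, not figures from the episode; they are picked in the range typically reported for large runs so the result can be compared with the throughput-based estimate:

```python
# Bottom-up estimate: total training compute ~= 6 * parameters * tokens,
# then divide by the cluster's sustained FLOP/s to get wall-clock time.

PARAMS = 70e9        # 70B parameters
TOKENS = 15e12       # 15T training tokens
NUM_GPUS = 1024
SECONDS_PER_DAY = 86_400

total_flops = 6 * PARAMS * TOKENS          # ~6.3e24 FLOPs
print(f"Total compute: {total_flops:.2e} FLOPs "
      f"(~{total_flops / 1e21:,.0f} zettaFLOPs)")

# Assumed per-GPU dense peak throughput (rough public H100 SXM figures) and
# assumed model-FLOPs-utilization values; both are illustrative, not measured.
peak_flops_per_gpu = {"FP8": 1979e12, "BF16": 989e12}
assumed_mfu        = {"FP8": 0.33,    "BF16": 0.48}

for precision, peak in peak_flops_per_gpu.items():
    sustained = NUM_GPUS * peak * assumed_mfu[precision]
    days = total_flops / sustained / SECONDS_PER_DAY
    print(f"{precision}: ~{days:.0f} days at {assumed_mfu[precision]:.0%} MFU")

# With these assumptions: FP8 ~109 days, BF16 ~150 days, consistent with the
# top-down, throughput-based estimate.
```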