ArchiCraft: Solution Architecture Insights for AI Engineering

#002 - How long to train a 70B LLM on 15T tokens using 1024 H100s?

Jun 27, 2025
Dive into the fascinating world of AI model training! Explore the staggering resources needed to train a 70-billion parameter model on a 15-trillion token dataset using 1024 H100 GPUs. Uncover two unique approaches to estimating training time: a top-down method leveraging NVIDIA’s benchmarks and a bottom-up calculation based on essential computational demands. Discover the complexities of precision types and how they impact speed, alongside insights into the future of AI development. Get ready for some eye-opening calculations!
INSIGHT

Scale of Training 70B LLM

  • Training a 70B parameter LLM on 15 trillion tokens using 1024 H100 GPUs demands vast computational power and time.
  • Only a few major tech companies can afford this due to the immense resources required.
INSIGHT

Training Speed and Precision Trade-off

  • Using NVIDIA benchmarks, FP8 precision reaches an aggregate throughput of roughly 1.49 to 1.66 million tokens per second; BF16 is slower, at about 1.12 to 1.18 million tokens per second.
  • At those rates, training on 15 trillion tokens takes roughly 110 days in FP8 and about 150 days in BF16, and both estimation methods land in the same range (see the sketch below).
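A minimal sketch of the top-down arithmetic, assuming the quoted figures are aggregate tokens per second across the full 1024-GPU cluster (variable names are illustrative):

```python
# Top-down estimate: wall-clock time = dataset size / aggregate cluster throughput.
# Throughput numbers are the NVIDIA benchmark figures quoted above, assumed to be
# aggregate tokens/sec over all 1024 H100s.

DATASET_TOKENS = 15e12          # 15 trillion training tokens
SECONDS_PER_DAY = 86_400

cluster_throughput_tok_per_s = {
    "FP8  (low)":  1.49e6,
    "FP8  (high)": 1.66e6,
    "BF16 (low)":  1.12e6,
    "BF16 (high)": 1.18e6,
}

for label, tps in cluster_throughput_tok_per_s.items():
    days = DATASET_TOKENS / tps / SECONDS_PER_DAY
    print(f"{label}: {days:5.0f} days")

# Prints roughly 105-117 days for FP8 and 147-155 days for BF16,
# i.e. the ~110-day to ~150-day range cited above.
```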
INSIGHT

FLOPs Calculation Validates Timeline

  • A bottom-up FLOPs calculation for the training run aligns closely with the real throughput benchmarks, validating the roughly 4-5 month training estimate.
  • The total compute works out to about 6.3 × 10²⁴ FLOPs, i.e. approximately 6,300 zettaFLOPs, for the entire training run (a worked version appears below).
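A minimal sketch of the bottom-up estimate, using the standard 6 × N × D approximation for dense-transformer training FLOPs. The H100 peak-throughput numbers and the MFU (model FLOPs utilization) values below are illustrative assumptions, not figures from the episode; they are picked in the range typically reported for large runs so the result can be compared with the throughput-based estimate:

```python
# Bottom-up estimate: total training compute ~= 6 * parameters * tokens,
# then divide by the cluster's sustained FLOP/s to get wall-clock time.

PARAMS = 70e9        # 70B parameters
TOKENS = 15e12       # 15T training tokens
NUM_GPUS = 1024
SECONDS_PER_DAY = 86_400

total_flops = 6 * PARAMS * TOKENS          # ~6.3e24 FLOPs
print(f"Total compute: {total_flops:.2e} FLOPs "
      f"(~{total_flops / 1e21:,.0f} zettaFLOPs)")

# Assumed per-GPU dense peak throughput (rough public H100 SXM figures) and
# assumed model-FLOPs-utilization values; both are illustrative, not measured.
peak_flops_per_gpu = {"FP8": 1979e12, "BF16": 989e12}
assumed_mfu        = {"FP8": 0.33,    "BF16": 0.48}

for precision, peak in peak_flops_per_gpu.items():
    sustained = NUM_GPUS * peak * assumed_mfu[precision]
    days = total_flops / sustained / SECONDS_PER_DAY
    print(f"{precision}: ~{days:.0f} days at {assumed_mfu[precision]:.0%} MFU")

# With these assumptions: FP8 ~109 days, BF16 ~150 days, consistent with the
# top-down, throughput-based estimate.
```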