#002 - How long to train a 70B LLM on 15T tokens using 1024 H100s?

ArchiCraft: Solution Architecture Insights for AI Engineering

Calculating the Computational Demands of Large Language Models

This chapter explores the computational requirements of training a 70 billion parameter model on a 15 trillion token dataset. It highlights why H100 GPUs fall short of their peak throughput in practice and why model FLOPs utilization (MFU) is the key factor when estimating wall-clock training time.
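As a worked version of the calculation the chapter describes, here is a minimal back-of-envelope sketch. It assumes the common 6·N·D approximation for training FLOPs, an H100 dense BF16 tensor-core peak of roughly 989 TFLOPS, and an illustrative 40% MFU; these figures are assumptions for the sketch, not numbers taken from the episode.

```python
# Back-of-envelope training-time estimate for a dense decoder-only LLM.
# Assumptions (illustrative, not from the episode): the standard 6*N*D
# FLOPs approximation, H100 dense BF16 peak of ~989 TFLOPS, and 40% MFU.

PARAMS = 70e9          # model parameters (N)
TOKENS = 15e12         # training tokens (D)
NUM_GPUS = 1024
PEAK_FLOPS = 989e12    # assumed H100 SXM dense BF16 peak, FLOP/s
MFU = 0.40             # assumed model FLOPs utilization

total_flops = 6 * PARAMS * TOKENS              # ~6.3e24 FLOPs
cluster_flops = NUM_GPUS * PEAK_FLOPS * MFU    # sustained cluster throughput
seconds = total_flops / cluster_flops

print(f"Total training compute: {total_flops:.2e} FLOPs")
print(f"Wall-clock time: {seconds / 86400:.0f} days "
      f"({seconds * NUM_GPUS / 3600:.2e} GPU-hours)")
```

Under these assumptions the run takes roughly 180 days, or about 4.4 million GPU-hours; halving the MFU doubles the wall-clock time, which is why utilization dominates the answer.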
