State of the Art: Training >70B LLMs on 10,000 H100 clusters

Latent Space: The AI Engineer Podcast

Building Complex Computing Clusters with a Small Yet Skilled Team

This chapter explores the challenges and strategies involved in building a large-scale computing cluster, and highlights the small but skilled infrastructure team behind the project. It discusses the close collaboration and detailed organization required to manage thousands of GPUs and intricate networking setups.

