Episode 40: What Every LLM Developer Needs to Know About GPUs

Vanishing Gradients

CHAPTER

Quick Insights on Training GPT-2 with Modern Hardware

This chapter discusses training GPT-2 from scratch in just five minutes on 8x H100 GPUs using the Modal platform. The discussion highlights how fast and cost-effective the run is compared to traditional methods.
