The State of Silicon and the GPU Poors - with Dylan Patel of SemiAnalysis

Latent Space: The AI Engineer Podcast

NOTE

Understanding Metrics and Ratios in GPU Training and Inference

In training, the ratio of floating-point operations to parameters read is about 6:1; in inference it is 2:1. GPUs, however, offer a very different ratio: on the order of 256 FLOPs per byte of memory bandwidth at FP16/FP8. This imbalance leaves FLOPs underutilized, especially in LLM inference at batch size 1, where the bottleneck is memory bandwidth rather than compute. As hardware evolves, the ratio between memory bandwidth and FLOPs is expected to worsen, because DRAM scales more slowly than logic, which will further hurt utilization on future GPU generations.
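The mismatch described above can be sketched with a back-of-envelope roofline calculation. The GPU specs below are illustrative assumptions (loosely H100-like), not figures from the episode:

```python
# Assumed GPU specs (illustrative, roughly H100-class; not from the episode).
peak_flops = 1.0e15   # ~1000 TFLOPS dense FP16
mem_bw = 3.35e12      # ~3.35 TB/s HBM bandwidth

# FLOPs the hardware can do per byte it reads from memory.
hw_ratio = peak_flops / mem_bw

# Batch-1 decoding touches every weight once per token:
# ~2 FLOPs per parameter (multiply + add), 2 bytes per FP16 parameter,
# so the workload's arithmetic intensity is ~1 FLOP per byte.
flops_per_byte = 2 / 2

# Fraction of peak compute actually usable when memory-bound.
utilization = flops_per_byte / hw_ratio
print(f"hardware ratio: {hw_ratio:.0f} FLOPs/byte")
print(f"batch-1 compute utilization: {utilization:.2%}")
```

Under these assumed specs the hardware offers roughly 300 FLOPs per byte while batch-1 inference supplies about 1, so well under 1% of peak compute is usable, which is why serving at batch 1 is memory-bandwidth-bound.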
