
"Why I think strong general AI is coming soon" by Porby
LessWrong (Curated & Popular)
[Chart: Nvidia high-end gaming GPU FP32 FLOPS]
Training a GPT-3-like model on the H100 is about four times faster than on the A100. Some of this acceleration comes from picking the low-hanging fruit of tailoring the hardware to ML workloads. The H100 has 80 billion transistors compared to the A100's 54 billion. For scaling to stop, both machine-learning-related architectural specializations and the underlying manufacturing improvements would need to stop.
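As a rough sanity check (not part of the original post), comparing the transistor ratio to the quoted training speedup suggests how much of the gain comes from ML-focused architectural specialization rather than raw transistor count. The sketch below simply uses the figures quoted above; the ~4x speedup figure is treated as given.

# Rough comparison of the H100 vs A100 figures quoted above (illustrative only).
h100_transistors = 80e9   # H100 transistor count (as quoted)
a100_transistors = 54e9   # A100 transistor count (as quoted)
reported_speedup = 4.0    # ~4x faster GPT-3-like training on H100 (as quoted)

transistor_ratio = h100_transistors / a100_transistors
# Speedup not explained by transistor count alone, loosely attributable to
# ML-related architectural specialization and other improvements.
residual_factor = reported_speedup / transistor_ratio

print(f"Transistor ratio (H100/A100): {transistor_ratio:.2f}x")
print(f"Speedup beyond transistor scaling: {residual_factor:.2f}x")

With these numbers the transistor ratio is only about 1.5x, so most of the ~4x training speedup would be coming from specialization rather than transistor count, which is the point the passage is making.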


