MPT-7B and The Beginning of Context=Infinity — with Jonathan Frankle and Abhinav Venigalla of MosaicML

Improving Training Efficiency and Speed with Next-Gen Hardware and FP8 Floating Point Format

The combination of next-generation hardware like NVIDIA's H100s and the new FP8 floating point format is expected to significantly improve training efficiency and speed. The H100s alone deliver roughly a 2x performance improvement, and FP8 builds on that further. Precision matters in the mathematical operations at the heart of these models, such as matrix multiplication: the earlier transition from 32-bit to 16-bit training already roughly doubled throughput and reduced cost correspondingly, and the adoption of FP8 is expected to yield a similar gain. Profiling of LLM training with FP8 on H100s has already shown remarkable progress, so a considerable cost reduction is projected for this year from these hardware advances alone.
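To make the precision-vs-throughput trade-off concrete, here is a minimal sketch (an illustration, not anything from the episode) that times the same matrix multiplication in FP32 and BF16 with stock PyTorch on an NVIDIA GPU. FP8 on H100s follows the same idea but requires additional tooling such as NVIDIA's Transformer Engine; the snippet below only contrasts the two precisions that are available everywhere.

```python
# Hypothetical benchmark sketch: compare matmul throughput at FP32 vs BF16.
# The ~2x gap it typically shows mirrors the 32-bit -> 16-bit gains described
# above; FP8 on H100s targets a similar step beyond BF16.
import time
import torch

def time_matmul(dtype: torch.dtype, size: int = 4096, iters: int = 50) -> float:
    """Time repeated (size x size) matrix multiplications at a given precision."""
    a = torch.randn(size, size, device="cuda", dtype=dtype)
    b = torch.randn(size, size, device="cuda", dtype=dtype)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        torch.matmul(a, b)
    torch.cuda.synchronize()
    return time.time() - start

if __name__ == "__main__":
    fp32 = time_matmul(torch.float32)
    bf16 = time_matmul(torch.bfloat16)
    print(f"FP32: {fp32:.3f}s  BF16: {bf16:.3f}s  speedup: {fp32 / bf16:.1f}x")
```

In full training runs the same lower-precision matmuls are usually reached through mixed-precision autocasting rather than casting tensors by hand, but the underlying source of the speedup is identical.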
