

Parallelism and Acceleration for Large Language Models with Bryan Catanzaro - #507
Aug 5, 2021
In this discussion, Bryan Catanzaro, VP of Applied Deep Learning Research at NVIDIA, explores the intersection of high-performance computing and AI. He shares insights on the Megatron framework for training large language models and the three types of parallelism (data, tensor, and pipeline) that make such training efficient. Bryan also discusses the challenges of supercomputing, Deep Learning Super Sampling (DLSS) technology for gaming graphics, and methods for generating high-resolution synthetic data to improve image quality in AI applications.
AI Snips
cuDNN Origin
- Bryan Catanzaro's first machine learning on GPUs paper was published at ICML 2008, focusing on large Support Vector Machines.
- His work at NVIDIA began with a small prototype library, which later became the widely used cuDNN.
HPC and AI Convergence
- High-performance computing (HPC) and AI now significantly overlap, particularly in scaling and distributing training.
- Catanzaro's work demonstrated training an unsupervised computer vision model on three GPU servers, matching results that had previously required about one thousand CPU servers.
Megatron Project Goals
- The Megatron project aims to demonstrate efficient large language model (LLM) training on NVIDIA's DGX SuperPod.
- It showcases how to achieve high efficiency (52% of Tensor Core peak throughput) with large models on GPU clusters.
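For context on what the tensor (model) parallelism mentioned above means, here is a minimal single-process sketch in NumPy of Megatron-style sharding of a transformer MLP block: the first weight matrix is split by columns and the second by rows, and summing the per-shard partial outputs stands in for the all-reduce that real multi-GPU training would perform. All names and dimensions are illustrative, not Megatron's actual API.

```python
import numpy as np

# Toy dimensions; real transformer layers are far larger.
batch, d_model, d_ff, num_shards = 4, 8, 16, 2

rng = np.random.default_rng(0)
x = rng.standard_normal((batch, d_model))
W1 = rng.standard_normal((d_model, d_ff))   # first MLP weight
W2 = rng.standard_normal((d_ff, d_model))   # second MLP weight

# Reference: the unsharded MLP forward pass (ReLU nonlinearity).
reference = np.maximum(x @ W1, 0) @ W2

# Tensor parallelism: split W1 by columns and W2 by rows, so each
# "device" (here just a loop iteration) holds one shard of each weight
# and computes a partial output independently.
W1_shards = np.split(W1, num_shards, axis=1)   # column-parallel
W2_shards = np.split(W2, num_shards, axis=0)   # row-parallel

partials = []
for w1, w2 in zip(W1_shards, W2_shards):
    h = np.maximum(x @ w1, 0)   # nonlinearity applies shard-locally
    partials.append(h @ w2)     # partial output on this shard

# Summing the partials plays the role of the all-reduce across GPUs.
output = np.sum(partials, axis=0)
assert np.allclose(output, reference)
print("sharded output matches the unsharded reference")
```

In actual Megatron training this split runs across GPUs with collective communication, and it is combined with data parallelism across model replicas and pipeline parallelism across groups of layers to scale to clusters like the DGX SuperPod.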