

Parallelism and Acceleration for Large Language Models with Bryan Catanzaro - #507
Aug 5, 2021
In this discussion, Bryan Catanzaro, VP of Applied Deep Learning Research at NVIDIA, explores the intersection of high-performance computing and AI. He shares insights on the Megatron framework for training large language models and the three types of parallelism (data, tensor, and pipeline) that make such training efficient. Bryan also discusses the challenges of supercomputing, Deep Learning Super Sampling (DLSS) technology for gaming graphics, and methods for generating high-resolution synthetic data to improve image quality in AI applications.
AI Snips
cuDNN Origin
- Bryan Catanzaro's first machine learning on GPUs paper was published at ICML 2008, focusing on large Support Vector Machines.
- His work at NVIDIA began with a small prototype library, which later became the widely used cuDNN.
HPC and AI Convergence
- High-performance computing (HPC) and AI now significantly overlap, particularly in scaling and distributing training.
- Catanzaro's work demonstrated training an unsupervised computer vision model on three GPU servers, matching results that had previously required about one thousand CPU servers.
Megatron Project Goals
- The Megatron project aims to demonstrate efficient large language model (LLM) training on NVIDIA's DGX SuperPod.
- It showcases how to achieve high efficiency (52% of Tensor Core peak throughput) with large models on GPU clusters.
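For context on what the tensor (model) parallelism mentioned above means, here is a minimal single-process sketch in NumPy of Megatron-style sharding of a transformer MLP block: the first weight matrix is split by columns and the second by rows, and summing the per-shard partial outputs stands in for the all-reduce that real multi-GPU training would perform. All names and dimensions are illustrative, not Megatron's actual API.

```python
import numpy as np

# Toy dimensions; real transformer layers are far larger.
batch, d_model, d_ff, num_shards = 4, 8, 16, 2

rng = np.random.default_rng(0)
x = rng.standard_normal((batch, d_model))
W1 = rng.standard_normal((d_model, d_ff))   # first MLP weight
W2 = rng.standard_normal((d_ff, d_model))   # second MLP weight

# Reference: the unsharded MLP forward pass (ReLU nonlinearity).
reference = np.maximum(x @ W1, 0) @ W2

# Tensor parallelism: split W1 by columns and W2 by rows, so each
# "device" (here just a loop iteration) holds one shard of each weight
# and computes a partial output independently.
W1_shards = np.split(W1, num_shards, axis=1)   # column-parallel
W2_shards = np.split(W2, num_shards, axis=0)   # row-parallel

partials = []
for w1, w2 in zip(W1_shards, W2_shards):
    h = np.maximum(x @ w1, 0)   # nonlinearity applies shard-locally
    partials.append(h @ w2)     # partial output on this shard

# Summing the partials plays the role of the all-reduce across GPUs.
output = np.sum(partials, axis=0)
assert np.allclose(output, reference)
print("sharded output matches the unsharded reference")
```

In actual Megatron training this split runs across GPUs with collective communication, and it is combined with data parallelism across model replicas and pipeline parallelism across groups of layers to scale to clusters like the DGX SuperPod.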