The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Parallelism and Acceleration for Large Language Models with Bryan Catanzaro - #507

Aug 5, 2021
In this discussion, Bryan Catanzaro, VP of Applied Deep Learning Research at NVIDIA, explores the intersection of high-performance computing and AI. He shares insights on the Megatron framework for training large language models and the three types of parallelism it combines to train such models efficiently. Bryan also covers the challenges of supercomputing, Deep Learning Super Sampling (DLSS) for gaming graphics, and methods for generating high-resolution synthetic data to improve image quality in AI applications.
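
Megatron combines data, tensor (intra-layer model), and pipeline parallelism across a cluster. As a rough, non-authoritative sketch of how those three axes carve up a training job (every number below is an illustrative assumption, not a figure from the episode):

```python
# Illustrative only: how Megatron-style "3D" parallelism carves up a training job.
# All numbers are hypothetical examples, not figures from the episode.

total_gpus = 3072             # cluster size (assumption)
tensor_parallel = 8           # each layer's weight matrices split across 8 GPUs
pipeline_parallel = 12        # the layer stack split into 12 sequential stages
data_parallel = total_gpus // (tensor_parallel * pipeline_parallel)  # 32 model replicas

num_layers = 96               # transformer layers (assumption)
global_batch = 1536           # sequences per optimizer step (assumption)

print(f"data-parallel replicas   : {data_parallel}")
print(f"layers per pipeline stage: {num_layers // pipeline_parallel}")
print(f"sequences per replica    : {global_batch // data_parallel}")
```

Multiplying the three degrees together gives the total GPU count, which is why this decomposition is often described as "3D" parallelism.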
AI Snips
ANECDOTE

cuDNN Origin

  • Bryan Catanzaro's first paper on machine learning with GPUs, published at ICML 2008, focused on training large Support Vector Machines.
  • His work at NVIDIA began with a small prototype library that later became the widely used cuDNN.
INSIGHT

HPC and AI Convergence

  • High-performance computing (HPC) and AI now significantly overlap, particularly in scaling and distributing training.
  • Catanzaro's earlier work demonstrated training an unsupervised computer vision model on three GPU servers, a task that had previously required roughly one thousand CPU servers.
INSIGHT

Megatron Project Goals

  • The Megatron project aims to demonstrate efficient large language model (LLM) training on NVIDIA's DGX SuperPOD.
  • It showcases how to achieve high efficiency (52% of Tensor Core peak throughput) with large models on GPU clusters.
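
As a back-of-the-envelope check of that efficiency figure, assuming the cluster is built from NVIDIA A100 GPUs with a roughly 312 TFLOP/s FP16/BF16 tensor-core peak (the GPU model, peak rate, and GPU count below are assumptions; only the 52% comes from the snip above):

```python
# Back-of-the-envelope check of the "52% of Tensor Core peak" figure.
# Assumes NVIDIA A100 GPUs (~312 TFLOP/s FP16/BF16 tensor-core peak) and a
# 3072-GPU cluster; both are assumptions, only the 52% is from the snip above.

a100_peak_tflops = 312
efficiency = 0.52
cluster_gpus = 3072

per_gpu_tflops = a100_peak_tflops * efficiency           # ~162 TFLOP/s sustained per GPU
aggregate_pflops = per_gpu_tflops * cluster_gpus / 1000   # aggregate throughput in PFLOP/s

print(f"per-GPU sustained throughput: {per_gpu_tflops:.0f} TFLOP/s")
print(f"aggregate over {cluster_gpus} GPUs: {aggregate_pflops:.0f} PFLOP/s")
```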