The Art and Science of Training LLMs // Bandish Shah and Davis Blalock // #219

7 snips

Mar 22, 2024

Exploring the challenges of training large language models, including debugging issues and evaluating machine learning models effectively. The discussion covers the importance of data quality, efficient computation techniques, and optimizing machine learning model training and deployment for successful outcomes.

Ask episode

Chapters

Transcript

Episode notes

Introduction

00:00 • 2min

Navigating Training Challenges in Model Development and Data Quality

02:10 • 2min

Data Referencing Issues, Analytics Updates, Conference Announcement, and Sponsor Acknowledgment

03:44 • 5min

Navigating Challenges in Learning and Content Creation

08:17 • 8min

Efficiency of Approximate Matrix Multiplication Using Vector Quantization

16:00 • 4min

Navigating Complexities in Deep Learning Model Training

20:29 • 27min

Optimizing Machine Learning Model Training and Deployment

47:58 • 27min

Join us at our first in-person conference on June 25, all about AI Quality: https://www.aiqualityconference.com/

Huge thank you to ⁠Databricks⁠ AI for sponsoring this episode.

Bandish Shah is an Engineering Manager at MosaicML/Databricks, where he focuses on making generative AI training and inference efficient, fast, and accessible by bridging the gap between deep learning, large-scale distributed systems, and performance computing.

Davis Blalock is a Research Scientist and the first employee of Mosaic ML: a GenAI startup acquired for $1.3 billion by Databricks.MLOps podcast #219 with Databricks' Engineering Manager, Bandish Shah and Research Scientist Davis Blalock, The Art and Science of Training Large Language Models.

// Abstract

What's hard about language models at scale? Turns out...everything. MosaicML's Davis and Bandish share war stories and lessons learned from pushing the limits of LLM training and helping dozens of customers get LLMs into production. They cover what can go wrong at every level of the stack, how to make sure you're building the right solution, and some contrarian takes on the future of efficient models.

// Bio

Bandish Shah

Bandish Shah is an Engineering Manager at MosaicML/Databricks, where he focuses on making generative AI training and inference efficient, fast, and accessible by bridging the gap between deep learning, large-scale distributed systems, and performance computing. Bandish has over a decade of experience building systems for machine learning and enterprise applications. Prior to MosaicML, Bandish held engineering and development roles at SambaNova Systems where he helped develop and ship the first RDU systems from the ground up, and Oracle where he worked as an ASIC engineer for SPARC-based enterprise servers.

Davis Blalock

Davis Blalock is a research scientist at MosaicML. He completed his PhD at MIT, advised by Professor John Guttag. His primary work is designing high-performance machine learning algorithms. He received his M.S. from MIT and his B.S. from the University of Virginia. He is a Qualcomm Innovation Fellow, NSF Graduate Research Fellow, and Barry M. Goldwater Scholar.

// MLOps Jobs board

jobs.mlops.community

// MLOps Swag/Merch

https://mlops-community.myshopify.com/

// Related Links

AI Quality In-person Conference: AI Quality in Person Conference: https://www.aiqualityconference.com/

Website: http://databricks.com/Davis Summarizes Papers ⁠Newsletter signup linkDavis' Newsletters: Learning to recognize spoken words from five unlabeled examples in under two seconds: https://arxiv.org/abs/1609.09196

Training on data at 5GB/s in a single thread: https://arxiv.org/abs/1808.02515

Nearest-neighbor searching through billions of images per second in one thread with no indexing: https://arxiv.org/abs/1706.10283

Multiplying matrices 10-100x faster than a matrix multiply (with some approximation error): https://arxiv.org/abs/2106.10860

Hidden Technical Debt in Machine Learning Systems: https://proceedings.neurips.cc/paper_files/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf

--------------- ✌️Connect With Us ✌️ -------------

Join our Slack community: https://go.mlops.community/slack

Catch all episodes, blogs, newsletters, and more: https://mlops.community/

Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/

Connect with Davis on LinkedIn: https://www.linkedin.com/in/dblalock/

Connect with Bandish on LinkedIn: https://www.linkedin.com/in/bandish-shah/