

LLMs on CPUs, Period
Jan 4, 2024
In this episode, Nir Shavit, Professor at MIT's Computer Science and Artificial Intelligence Laboratory, discusses running LLMs on CPUs and how model sparsity can accelerate open-source LLMs. The conversation covers fine-tuning models, comparing language models with benchmarks, and how sparsity and quantization together yield smaller models and faster inference. It also explores why CPUs can offer faster and cheaper inference, the viability of AMD GPUs for inference, and how enterprises are focusing on LLMs.
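The storage argument behind that pitch is easy to see in code. Below is a minimal, hypothetical sketch in NumPy of magnitude pruning followed by int8 quantization on a toy weight matrix; it is not any production inference engine's implementation, just the arithmetic behind the size reduction the episode discusses.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(1024, 1024)).astype(np.float32)  # toy dense fp32 layer

# Sparsify: keep only the ~10% of weights with the largest magnitude.
threshold = np.quantile(np.abs(W), 0.90)
W_sparse = W * (np.abs(W) >= threshold)

# Quantize: map the surviving fp32 values to int8 with one per-tensor scale.
scale = np.abs(W_sparse).max() / 127.0
W_int8 = np.round(W_sparse / scale).astype(np.int8)

# Storage comparison: a sparse format only stores the nonzero int8 values
# (index overhead ignored here for simplicity).
dense_mb = W.nbytes / 1e6                    # 4 bytes per fp32 weight
sparse_mb = np.count_nonzero(W_int8) / 1e6   # 1 byte per surviving weight
print(f"dense fp32: {dense_mb:.1f} MB, "
      f"90%-sparse int8 values: {sparse_mb:.2f} MB")
```

Running this prints roughly 4.2 MB for the dense matrix versus about 0.1 MB of surviving values, a ~40x reduction from combining 90% sparsity (10x) with fp32-to-int8 quantization (4x); it is this shrinkage that lets model weights fit in CPU caches and memory bandwidth budgets.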
Chapters
Introduction
00:00 • 2min
Fine-tuning Models: Quantization and Sparsification
01:43 • 8min
Comparing Language Models and the Importance of Fine-Tuning
10:11 • 2min
Fine-Tuning and Sparsification of Llama 7B
11:49 • 6min
Utilizing CPU Resources for Faster and Cheaper Inference
18:09 • 12min
CPU Capability, AMD GPUs, and Multimodal Applications
30:09 • 3min