The Data Exchange with Ben Lorica

LLMs on CPUs, Period

Jan 4, 2024
In this episode, Nir Shavit, Professor at MIT's Computer Science and Artificial Intelligence Laboratory, discusses running LLMs on CPUs and how model sparsity can accelerate open-source LLMs. He and Ben Lorica explore fine-tuning, comparing language models with benchmarks, and how sparsity and quantization together yield smaller models and faster inference. They also cover the advantages of CPUs for faster and cheaper inference, the viability of AMD GPUs for inference, and how enterprises are putting LLMs to work.
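To make the size and speed tradeoff concrete, here is a minimal sketch, not from the episode, using NumPy. The matrix size, the 80% sparsity level, and the per-entry storage accounting are illustrative assumptions; it quantizes a toy weight matrix to int8 and prunes its smallest-magnitude weights, then compares the resulting storage footprints.

```python
import numpy as np

# Toy illustration (not from the episode): simulate 8-bit quantization
# and unstructured sparsity on a random weight matrix, then compare
# the storage each variant needs.

rng = np.random.default_rng(0)
weights = rng.normal(0, 0.02, size=(4096, 4096)).astype(np.float32)

# --- Quantization: map float32 weights to int8 with a per-tensor scale ---
scale = np.abs(weights).max() / 127.0
quantized = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)

# --- Sparsity: zero out the 80% of weights with smallest magnitude ---
threshold = np.quantile(np.abs(weights), 0.80)
sparse = np.where(np.abs(weights) >= threshold, weights, 0.0)
nonzero = np.count_nonzero(sparse)

fp32_mb = weights.nbytes / 2**20
int8_mb = quantized.nbytes / 2**20
# A sparse format stores only nonzero values plus their positions
# (assumed here: 4-byte value + 4-byte index per entry, a rough upper bound).
sparse_mb = nonzero * 8 / 2**20

print(f"float32 dense:  {fp32_mb:8.1f} MiB")
print(f"int8 quantized: {int8_mb:8.1f} MiB  (4x smaller)")
print(f"80% sparse:     {sparse_mb:8.1f} MiB  (values + indices)")
```

Note that unstructured sparsity alone pays an indexing overhead, which is why the techniques discussed in the episode are typically combined: pruned weights that are also quantized shrink the model further, and skipping zeroed weights is what lets CPU inference engines recover speed.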