
LLMs on CPUs, Period
The Data Exchange with Ben Lorica
Fine-Tuning and Sparsification of Llama 7B
This chapter explores how the pre-trained Llama 7B model is compressed through sparsification and quantization, and discusses the trade-offs between accuracy and speed. It highlights how combining sparsity with quantization yields both a smaller model and faster inference.
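To make the two techniques concrete, here is a minimal sketch (not Neural Magic's actual pipeline; the function names and the use of unstructured magnitude pruning with symmetric per-tensor int8 quantization are illustrative assumptions) showing how a weight matrix can be sparsified and then quantized:

```python
import numpy as np

# Illustrative sketch only: unstructured magnitude pruning followed by
# symmetric per-tensor int8 quantization of a single weight matrix.

def sparsify(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until `sparsity` fraction is zero."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def quantize_int8(weights: np.ndarray):
    """Symmetric int8 quantization: returns (int8 tensor, float scale)."""
    max_abs = float(np.abs(weights).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)

w_sparse = sparsify(w, sparsity=0.5)        # half the weights become zero
q, scale = quantize_int8(w_sparse)          # 4x smaller storage than float32
w_restored = q.astype(np.float32) * scale   # dequantize for comparison

print(f"sparsity: {np.mean(w_sparse == 0):.2f}")
print(f"max quantization error: {np.abs(w_restored - w_sparse).max():.4f}")
```

Sparsity lets an inference engine skip zero weights entirely, while int8 storage cuts memory traffic; together they address both model size and speed, at the cost of some accuracy that fine-tuning is meant to recover.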