Large models on CPUs

May 2, 2023

Mark Kurtz, Director of Machine Learning at Neural Magic, discusses model optimization and CPU inference. He explains that up to 90% of a model's parameters may be redundant, which slows inference and inflates costs. The discussion covers the case for running large models on CPUs rather than GPUs and the role of sparsity in shrinking models substantially without losing performance. Mark also touches on the future of generative AI and the promise of making advanced AI more accessible through collaborative efforts.

INSIGHT

Model Optimization Techniques

Model optimization aims to make models smaller and faster, mainly through pruning, quantization, and distillation.
Sparsity is crucial: many parameters barely influence a model's outputs, so removing them enables significant efficiency gains.
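
To make two of these techniques concrete, here is a minimal sketch in plain Python of magnitude pruning followed by symmetric int8 quantization. It is an illustration under stated assumptions, not Neural Magic's method: the function names, the flat list of weights, and the 50% sparsity target are all hypothetical, and distillation is not shown because it requires a training loop.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction zeroed.

    Assumption: unstructured, one-shot pruning of a flat list of floats.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(sparsity * len(weights))  # number of weights to zero out
    # Indices ordered from smallest to largest magnitude (ties keep index order).
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map quantized integers back to approximate float values."""
    return [v * scale for v in q]


# Hypothetical weights, chosen only for illustration.
weights = [0.8, -0.05, 0.02, -1.2, 0.01, 0.3, -0.03, 0.04, 0.6, -0.02]
pruned = magnitude_prune(weights, 0.5)   # zero the 5 smallest magnitudes
q, scale = quantize_int8(pruned)         # store 8-bit integers plus one scale
restored = dequantize(q, scale)
```

In this sketch the pruned weights stay at exactly zero after quantization, and each remaining weight is recovered to within half a quantization step (`scale / 2`). In practice, pruning and quantization are usually combined with fine-tuning to recover accuracy.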

INSIGHT

Importance of Model Optimization

Model optimization is crucial for deployment, whether in embedded systems or on servers.
It addresses both cost and performance, which matters especially because deployment costs often outweigh training costs.

ANECDOTE

CPUs Outperforming GPUs

Neural Magic's MLPerf results show CPU-based inference outperforming GPUs like T4s and A40s.
This is achieved through sparsity, dynamic CPU setup, and efficient cache hierarchy utilization.
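
As a rough illustration of why sparsity can speed up inference, the sketch below stores a pruned weight matrix in compressed sparse row (CSR) form and multiplies it by a vector while touching only the nonzero entries. This is an assumption-laden toy, not a description of Neural Magic's engine: the helper names and the example matrix are hypothetical, and a real runtime would add vectorization and cache-aware scheduling.

```python
def to_csr(dense):
    """Convert a dense matrix (list of rows) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)   # keep only nonzero weights
                col_idx.append(j)  # remember each weight's column
        row_ptr.append(len(values))  # where the next row starts in `values`
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x, doing one multiply-add per stored nonzero."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Hypothetical 3x4 weight matrix with 9 of 12 entries pruned to zero.
dense = [
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, -3.0],
]
x = [1.0, 2.0, 3.0, 4.0]

values, col_idx, row_ptr = to_csr(dense)
y = csr_matvec(values, col_idx, row_ptr, x)
```

Here the sparse product performs 3 multiply-adds where a dense product would perform 12, and the smaller stored matrix is more likely to fit in cache.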
