

Large models on CPUs
May 2, 2023
Mark Kurtz, Director of Machine Learning at Neural Magic, dives into the world of model optimization and CPU inference. He reveals that up to 90% of model parameters may be redundant, slowing inference and inflating deployment costs. The discussion covers the merits of CPUs over GPUs for serving large models and the impact of sparsity, which can dramatically shrink models without sacrificing accuracy. Mark also touches on the future of generative AI and the promise of making advanced AI more accessible through collaborative, open efforts.
Model Optimization Techniques
- Model optimization aims to make models smaller and faster, focusing on pruning, quantization, and distillation.
- Sparsity is crucial, as many parameters don't influence outputs, enabling significant efficiency gains.
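The pruning idea in the snip above can be sketched concretely. A minimal illustration (not Neural Magic's actual method, which uses more sophisticated training-aware techniques) is unstructured magnitude pruning: zero out the smallest-magnitude weights until a target sparsity is reached.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries so that roughly
    `sparsity` fraction of the weights become zero."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # Threshold at the k-th smallest absolute value.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))        # stand-in for one layer's weights
sparse_w = magnitude_prune(w, 0.9)   # the ~90% redundancy mentioned above
print(f"zero fraction: {np.mean(sparse_w == 0):.2f}")
```

In practice pruning is interleaved with fine-tuning so the remaining weights can compensate, which is what lets high sparsity levels preserve accuracy.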
Importance of Model Optimization
- Model optimization is crucial for deployment, whether in embedded systems or on servers.
- It addresses cost and performance, especially important as deployment costs often outweigh training costs.
CPUs Outperforming GPUs
- Neural Magic's MLPerf results show CPU-based inference outperforming GPUs like T4s and A40s.
- This is achieved through sparsity, dynamic CPU setup, and efficient cache hierarchy utilization.
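Why sparsity helps on CPUs can be seen with a toy sparse matrix product: compute and memory traffic scale with the number of nonzeros rather than the full matrix size, which suits CPU cache hierarchies. This sketch uses SciPy's generic CSR format, not Neural Magic's DeepSparse runtime.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
dense = rng.normal(size=(512, 512))
dense[rng.random(dense.shape) < 0.9] = 0.0  # ~90% of weights zeroed

csr = sparse.csr_matrix(dense)  # stores only the nonzero entries
x = rng.normal(size=512)

# The CSR product touches only stored entries, so the work done is
# proportional to csr.nnz (~10% of the full matrix).
y_sparse = csr @ x
y_dense = dense @ x
print(np.allclose(y_sparse, y_dense))  # same result, far fewer operations
```

A production sparse-inference engine goes much further (blocked sparsity patterns, cache-aware scheduling), but the scaling argument is the same.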