Practical AI

Large models on CPUs

May 2, 2023
Mark Kurtz, Director of Machine Learning at Neural Magic, dives into the world of model optimization and CPU inference. He notes that up to 90% of model parameters may be redundant, slowing inference and inflating costs. The discussion covers the merits of running large models on CPUs instead of GPUs and the impact of sparsity, which significantly reduces model size without sacrificing accuracy. Mark also touches on the exciting future of generative AI and the promise of making advanced AI more accessible through collaborative efforts.
INSIGHT

Model Optimization Techniques

  • Model optimization aims to make models smaller and faster, chiefly through pruning, quantization, and distillation.
  • Sparsity is central: many parameters barely influence a model's outputs, so zeroing them out yields significant efficiency gains (a sketch follows this list).
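To make the pruning idea concrete, here is a minimal sketch in plain NumPy, not any tool discussed in the episode: unstructured magnitude pruning zeroes the smallest-magnitude weights of a layer, shown here at the 90% level the episode cites. The helper name magnitude_prune is hypothetical, chosen for illustration.

```python
# Minimal sketch: unstructured magnitude pruning with NumPy.
# The 90% sparsity level echoes the episode's claim that up to 90%
# of parameters may be redundant; magnitude_prune is a hypothetical helper.
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float = 0.9) -> np.ndarray:
    """Zero out the smallest-magnitude entries until roughly `sparsity`
    fraction of the weights are zero."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # Threshold = k-th smallest absolute value across all entries.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

rng = np.random.default_rng(0)
layer = rng.normal(size=(512, 512))
sparse_layer = magnitude_prune(layer, sparsity=0.9)
print(f"zeroed weights: {np.mean(sparse_layer == 0):.2%}")  # ~90%
```

The other two techniques are complementary: quantization stores the surviving weights at lower precision, and distillation trains a smaller model to mimic a larger one.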
INSIGHT

Importance of Model Optimization

  • Model optimization is crucial for deployment, whether on embedded systems or on servers.
  • It addresses both cost and performance, which matters because deployment (inference) costs often outweigh training costs over a model's lifetime.
ANECDOTE

CPUs Outperforming GPUs

  • Neural Magic's MLPerf results show CPU-based inference outperforming GPUs such as T4s and A40s.
  • This is achieved through sparsity, a dynamic CPU execution setup, and efficient use of the cache hierarchy (a rough sketch of the sparsity arithmetic follows this list).
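A rough illustration of why sparsity helps on CPUs, using SciPy rather than Neural Magic's inference engine: storing a 90%-sparse layer in CSR format means its matrix product performs roughly a tenth of the multiply-accumulates of the dense version. Whether that becomes wall-clock speedup depends on the kernel; the episode's point is that purpose-built sparse execution and cache-aware scheduling make it pay off.

```python
# Sketch only: SciPy CSR vs. dense matmul, not Neural Magic tooling.
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
w = rng.normal(size=(2048, 2048))
w[rng.random(w.shape) < 0.9] = 0.0       # prune ~90% of the weights
w_csr = sparse.csr_matrix(w)             # store only the nonzeros

x = rng.normal(size=(2048, 32))          # activation batch
y_dense, y_sparse = w @ x, w_csr @ x
assert np.allclose(y_dense, y_sparse)    # same output, far fewer operations

dense_macs = w.size * x.shape[1]
sparse_macs = w_csr.nnz * x.shape[1]
print(f"MACs: dense={dense_macs:,} sparse={sparse_macs:,} "
      f"({sparse_macs / dense_macs:.1%} of dense)")
```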