Practical AI

Stellar inference speed via AutoNAC

Sep 7, 2021
Yonatan Geifman, CEO of Deci, dives into the cutting-edge world of AI inference optimization. He shares insights on building fast, effective deep learning models tailored for various hardware, emphasizing the importance of designing models with inference requirements in mind. Discussions include techniques like pruning and quantization, the role of AutoNAC in optimizing neural architectures, and how DeciNets elevate performance in image classification tasks. Buckle up for a high-speed journey through AI advancements!
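
Pruning and quantization come up as standard inference-optimization techniques in the episode. Below is a minimal PyTorch sketch of both; the toy model, layer sizes, and the 30% pruning ratio are illustrative assumptions, not details from the conversation:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Hypothetical toy model standing in for a real production network.
model = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Pruning: zero out the 30% of weights with the smallest L1 magnitude
# in each Linear layer (an assumed ratio, chosen for illustration).
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the pruning mask into the weights

# Dynamic quantization: convert Linear weights from fp32 to int8,
# shrinking the model and typically speeding up CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # torch.Size([1, 10])
```
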
AI Snips
INSIGHT

Inference Workload vs. Training

  • Consider inference performance early in model development, not just after training.
  • Inference workload scales with production traffic, whereas training workload scales with the number of data scientists; see the back-of-envelope sketch after this list.
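
To make that scaling argument concrete, here is a back-of-envelope comparison in Python. Every number (team size, runs per week, traffic, latency) is a hypothetical assumption for illustration, not a figure from the episode:

```python
# Training compute is bounded by the team size...
data_scientists = 10
training_runs_per_week = 5           # per data scientist (assumed)
gpu_hours_per_run = 8                # (assumed)
training_gpu_hours = data_scientists * training_runs_per_week * gpu_hours_per_run

# ...while inference compute grows linearly with production traffic.
requests_per_day = 100_000_000       # (assumed)
latency_s = 0.02                     # 20 ms per request on one accelerator (assumed)
inference_gpu_hours = requests_per_day * latency_s / 3600 * 7   # per week

print(f"training:  {training_gpu_hours:,.0f} GPU-hours/week")   # 400
print(f"inference: {inference_gpu_hours:,.0f} GPU-hours/week")  # ~3,889
```

Under these assumed numbers, weekly inference compute is roughly ten times the training compute, which is why inference cost deserves attention early in development.
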
ADVICE

Start Small, Think Big

  • Start with smaller models when considering production constraints.
  • Larger models achieve higher accuracy but are harder to deploy due to memory and latency limits; the footprint sketch below makes the trade-off concrete.
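
A small sketch of the memory side of that trade-off. The two toy classifiers and the `param_memory_mb` helper are hypothetical stand-ins, not models discussed in the episode:

```python
import torch.nn as nn

def param_memory_mb(model: nn.Module, bytes_per_param: int = 4) -> float:
    """Rough weight-memory footprint of a model (fp32 by default)."""
    n_params = sum(p.numel() for p in model.parameters())
    return n_params * bytes_per_param / 1024**2

# Hypothetical small vs. large classifiers for comparison.
small = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))
large = nn.Sequential(nn.Linear(512, 4096), nn.ReLU(), nn.Linear(4096, 10))

print(f"small: {param_memory_mb(small):.2f} MB at fp32")                    # ~0.51 MB
print(f"large: {param_memory_mb(large):.2f} MB at fp32")                    # ~8.17 MB
print(f"large: {param_memory_mb(large, bytes_per_param=1):.2f} MB at int8") # ~2.04 MB
```
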
INSIGHT

GPU vs. CPU Inference

  • GPU inference is essential for high-resolution, real-time video analytics.
  • CPU inference is common for NLP workloads, but both GPU and CPU inference become expensive at scale; see the benchmark sketch after this list.
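
A minimal PyTorch sketch for comparing CPU and GPU latency on the same model. The convolutional stand-in (a rough proxy for a video-analytics backbone), input size, and iteration counts are all assumptions for illustration:

```python
import time
import torch
import torch.nn as nn

def bench(model: nn.Module, x: torch.Tensor, n: int = 100) -> float:
    """Average forward-pass latency in milliseconds."""
    with torch.no_grad():
        for _ in range(10):           # warm-up iterations
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()  # drain queued GPU kernels before timing
        start = time.perf_counter()
        for _ in range(n):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / n * 1000

# Hypothetical stand-in for a high-resolution vision model.
model = nn.Sequential(nn.Conv2d(3, 64, 3), nn.ReLU(), nn.Conv2d(64, 64, 3))
model.eval()
x = torch.randn(1, 3, 224, 224)

print(f"CPU: {bench(model, x):.2f} ms")
if torch.cuda.is_available():
    print(f"GPU: {bench(model.cuda(), x.cuda()):.2f} ms")
```
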