
Practical AI: Stellar inference speed via AutoNAC
Sep 7, 2021
Yonatan Geifman, CEO of Deci, dives into the cutting-edge world of AI inference optimization. He shares insights on building fast, effective deep learning models tailored to various hardware, emphasizing the importance of designing models with inference requirements in mind. Discussions cover techniques such as pruning and quantization, the role of AutoNAC in optimizing neural architectures, and how DeciNets raise the bar for image classification performance. Buckle up for a high-speed journey through AI advancements!
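The episode name-checks pruning and quantization only in passing. As a hedged illustration of what those two techniques look like (stock PyTorch utilities applied to a toy model with made-up layer sizes, not Deci's own pipeline):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy two-layer classifier; the layer sizes are made up for illustration.
model = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Pruning: zero out the 50% smallest-magnitude weights in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the sparsity into the tensor

# Dynamic quantization: store Linear weights in int8 and quantize
# activations on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(quantized)
```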
AI Snips
Inference Workload vs. Training
- Consider inference performance early in model development, not just after training.
- Inference workload scales with production traffic, while training workload scales with the number of data scientists running experiments (rough cost arithmetic below).
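A back-of-the-envelope sketch of that scaling claim. Every number below is hypothetical, chosen only to show which knob each cost follows:

```python
# All numbers are hypothetical; the point is which knob each cost follows.
requests_per_day = 10_000_000      # production traffic drives inference cost
latency_s = 0.05                   # 50 ms of compute per request
inference_core_hours = requests_per_day * latency_s / 3600

data_scientists = 5                # team size drives training cost
runs_per_person_per_day = 4
hours_per_run = 2.0
training_gpu_hours = data_scientists * runs_per_person_per_day * hours_per_run

print(f"inference: ~{inference_core_hours:.0f} core-hours/day, grows with traffic")
print(f"training:  ~{training_gpu_hours:.0f} GPU-hours/day, grows with headcount")
```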
Start Small, Think Big
- Start with smaller models when considering production constraints.
- Larger models achieve higher accuracy but are harder to deploy due to memory and latency limits (see the parameter-count sketch below).
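One quick way to see that deployment gap is to compare parameter counts and raw fp32 weight memory for a small and a large model. A sketch, assuming a recent torchvision is installed:

```python
import torchvision.models as models

# Compare a small and a large classifier (assumes torchvision >= 0.13 for
# the `weights=None` argument; older versions use `pretrained=False`).
for name, ctor in [("resnet18", models.resnet18), ("resnet152", models.resnet152)]:
    m = ctor(weights=None)
    n_params = sum(p.numel() for p in m.parameters())
    mem_mb = n_params * 4 / 2**20  # fp32 weights: 4 bytes per parameter
    print(f"{name}: {n_params / 1e6:.1f}M params, ~{mem_mb:.0f} MB of weights")
```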
GPU vs. CPU Inference
- GPU inference is essential for high-resolution, real-time video analytics.
- CPU inference is common for NLP tasks, but both GPU and CPU inference get expensive at scale (a minimal timing probe follows).
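To make the latency trade-off concrete, a minimal timing probe. The results depend entirely on your hardware; the point is the measurement pattern, not the numbers:

```python
import time
import torch
import torchvision.models as models

model = models.resnet50(weights=None).eval()
frame = torch.randn(1, 3, 224, 224)  # one image-sized input

def bench(m, x, iters=20):
    """Average forward-pass latency in seconds."""
    with torch.no_grad():
        for _ in range(3):  # warm-up so one-time init doesn't skew timing
            m(x)
        if x.is_cuda:
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            m(x)
        if x.is_cuda:
            torch.cuda.synchronize()  # wait for queued GPU work to finish
    return (time.perf_counter() - start) / iters

print(f"CPU: {bench(model, frame) * 1e3:.1f} ms/frame")
if torch.cuda.is_available():
    print(f"GPU: {bench(model.cuda(), frame.cuda()) * 1e3:.1f} ms/frame")
```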

