
Practical AI: Stellar inference speed via AutoNAC
Sep 7, 2021
Yonatan Geifman, CEO of Deci, dives into the cutting-edge world of AI inference optimization. He shares insights on building fast, effective deep learning models tailored to various hardware, emphasizing the importance of designing models with inference requirements in mind. Discussions cover techniques such as pruning and quantization, the role of AutoNAC in optimizing neural architectures, and how DeciNets raise the bar for image classification performance. Buckle up for a high-speed journey through AI advancements!
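The episode name-checks pruning and quantization only in passing. As a hedged illustration of what those two techniques look like (stock PyTorch utilities applied to a toy model with made-up layer sizes, not Deci's own pipeline):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy two-layer classifier; the layer sizes are made up for illustration.
model = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Pruning: zero out the 50% smallest-magnitude weights in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the sparsity into the tensor

# Dynamic quantization: store Linear weights in int8 and quantize
# activations on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(quantized)
```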
AI Snips
Inference Workload vs. Training
- Consider inference performance early in model development, not just after training.
- Inference workload scales with production traffic, while training workload scales with the number of data scientists running experiments (rough cost arithmetic below).
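A back-of-the-envelope sketch of that scaling claim. Every number below is hypothetical, chosen only to show which knob each cost follows:

```python
# All numbers are hypothetical; the point is which knob each cost follows.
requests_per_day = 10_000_000      # production traffic drives inference cost
latency_s = 0.05                   # 50 ms of compute per request
inference_core_hours = requests_per_day * latency_s / 3600

data_scientists = 5                # team size drives training cost
runs_per_person_per_day = 4
hours_per_run = 2.0
training_gpu_hours = data_scientists * runs_per_person_per_day * hours_per_run

print(f"inference: ~{inference_core_hours:.0f} core-hours/day, grows with traffic")
print(f"training:  ~{training_gpu_hours:.0f} GPU-hours/day, grows with headcount")
```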
Start Small, Think Big
- Start with smaller models when considering production constraints.
- Larger models achieve higher accuracy but are harder to deploy due to memory and latency limits (see the parameter-count sketch below).
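One quick way to see that deployment gap is to compare parameter counts and raw fp32 weight memory for a small and a large model. A sketch, assuming a recent torchvision is installed:

```python
import torchvision.models as models

# Compare a small and a large classifier (assumes torchvision >= 0.13 for
# the `weights=None` argument; older versions use `pretrained=False`).
for name, ctor in [("resnet18", models.resnet18), ("resnet152", models.resnet152)]:
    m = ctor(weights=None)
    n_params = sum(p.numel() for p in m.parameters())
    mem_mb = n_params * 4 / 2**20  # fp32 weights: 4 bytes per parameter
    print(f"{name}: {n_params / 1e6:.1f}M params, ~{mem_mb:.0f} MB of weights")
```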
GPU vs. CPU Inference
- GPU inference is essential for high-resolution, real-time video analytics.
- CPU inference is common for NLP tasks, but both GPU and CPU inference get expensive at scale (a minimal timing probe follows).
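To make the latency trade-off concrete, a minimal timing probe. The results depend entirely on your hardware; the point is the measurement pattern, not the numbers:

```python
import time
import torch
import torchvision.models as models

model = models.resnet50(weights=None).eval()
frame = torch.randn(1, 3, 224, 224)  # one image-sized input

def bench(m, x, iters=20):
    """Average forward-pass latency in seconds."""
    with torch.no_grad():
        for _ in range(3):  # warm-up so one-time init doesn't skew timing
            m(x)
        if x.is_cuda:
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            m(x)
        if x.is_cuda:
            torch.cuda.synchronize()  # wait for queued GPU work to finish
    return (time.perf_counter() - start) / iters

print(f"CPU: {bench(model, frame) * 1e3:.1f} ms/frame")
if torch.cuda.is_available():
    print(f"GPU: {bench(model.cuda(), frame.cuda()) * 1e3:.1f} ms/frame")
```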

