Stellar inference speed via AutoNAS

Practical AI

Optimizing Machine Learning Models for Hardware Inference (chapter starts at 13:24)

This chapter explores methods for making machine learning models run more efficiently at inference time on specific hardware. It walks through the inference stack and open-source optimization techniques such as pruning and quantization, and explains why a model's structure affects GPU and CPU performance differently.
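
Below is a minimal sketch of two of the techniques mentioned in the chapter, using PyTorch's built-in pruning and dynamic-quantization utilities. The toy model, the 30% sparsity target, and the layer sizes are illustrative assumptions, not details from the episode.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A small toy model standing in for whatever network you want to optimize.
model = nn.Sequential(
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

# Pruning: zero out the 30% of weights with the smallest L1 magnitude in
# each Linear layer, then make the sparsity permanent.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")

# Quantization: store Linear weights as int8 and quantize activations
# dynamically at runtime, which typically speeds up CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    out = quantized(torch.randn(1, 256))
print(out.shape)  # torch.Size([1, 10])
```

Note that unstructured pruning alone mostly shrinks the model on disk; realizing wall-clock speedups usually requires sparse-aware kernels or structured pruning matched to the target hardware, which is part of why the chapter stresses model structure relative to GPUs and CPUs.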
