

#551: Deep Dive Into SageMaker Serverless Inference
Oct 17, 2022
Rishabh Ray Chaudhury, a Senior Product Manager at AWS working on SageMaker, takes a deep dive into SageMaker Serverless Inference. He explains how the feature removes infrastructure management overhead, letting users focus on their machine learning models. Rishabh highlights customer use cases, emphasizing cost savings and ease of deployment. The conversation also covers efficient model updates and real-world success stories, showing the benefits of a serverless architecture for machine learning applications.
Inference Costs
- Inference, the process of using trained ML models in production, is a recurring cost.
- It constitutes a significant portion, often around 80%, of total ML infrastructure costs.
Serverless Inference
- Consider SageMaker serverless inference for unpredictable ML workloads to avoid over-provisioning.
- This serverless option reduces costs by scaling resources based on traffic and eliminating idle time charges.
Deployment Process
- Deploying a model with SageMaker serverless inference involves specifying the Amazon ECR image URI for the inference code and the Amazon S3 location of the model artifacts.
- Users set the memory size and max concurrency in the endpoint configuration, which simplifies deployment (see the sketch after this list).
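
As a rough illustration of the steps described above, here is a minimal sketch using boto3's SageMaker API. The image URI, S3 path, role ARN, and all resource names are placeholders, and the memory/concurrency values are just example settings, not recommendations from the episode:

```python
import boto3

sm = boto3.client("sagemaker")

# Placeholder values -- substitute your own container image, artifacts, and role.
image_uri = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-inference-image:latest"
model_data = "s3://my-bucket/models/model.tar.gz"
role_arn = "arn:aws:iam::123456789012:role/MySageMakerExecutionRole"

# 1. Register the model: ECR image for inference code, S3 location for artifacts.
sm.create_model(
    ModelName="my-serverless-model",
    PrimaryContainer={"Image": image_uri, "ModelDataUrl": model_data},
    ExecutionRoleArn=role_arn,
)

# 2. Endpoint configuration: ServerlessConfig replaces instance type and count.
#    MemorySizeInMB and MaxConcurrency are the two knobs mentioned in the episode.
sm.create_endpoint_config(
    EndpointConfigName="my-serverless-config",
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "my-serverless-model",
            "ServerlessConfig": {
                "MemorySizeInMB": 2048,  # 1024-6144, in 1 GB increments
                "MaxConcurrency": 5,     # max concurrent invocations for the endpoint
            },
        }
    ],
)

# 3. Create the endpoint; SageMaker provisions compute only when traffic arrives.
sm.create_endpoint(
    EndpointName="my-serverless-endpoint",
    EndpointConfigName="my-serverless-config",
)
```

Once the endpoint is in service, invocation works the same as for a real-time endpoint (boto3's sagemaker-runtime invoke_endpoint), so client code does not change when moving between serverless and instance-backed configurations.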