

#551: Deep Dive Into SageMaker Serverless Inference
Oct 17, 2022
Rishabh Ray Chaudhury, a Senior Product Manager at AWS working on SageMaker, takes a deep dive into SageMaker Serverless Inference. He explains how the feature removes infrastructure management overhead, letting users focus on their machine learning models. Rishabh highlights customer use cases, emphasizing cost savings and ease of deployment. The conversation also covers efficient model updates and real-world success stories, showing the benefits of a serverless architecture for machine learning applications.
Inference Costs
- Inference, the process of using trained ML models in production, is a recurring cost.
- It constitutes a significant portion, often around 80%, of total ML infrastructure costs.
Serverless Inference
- Consider SageMaker serverless inference for unpredictable ML workloads to avoid over-provisioning.
- This serverless option reduces costs by scaling resources based on traffic and eliminating idle time charges.
Deployment Process
- Deploying a model with SageMaker serverless inference involves specifying the Amazon ECR image URI for the inference code and the Amazon S3 location of the model artifacts.
- Users set the memory size and max concurrency in the endpoint configuration, which simplifies deployment (see the sketch after this list).
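
As a rough illustration of the steps described above, here is a minimal sketch using boto3's SageMaker API. The image URI, S3 path, role ARN, and all resource names are placeholders, and the memory/concurrency values are just example settings, not recommendations from the episode:

```python
import boto3

sm = boto3.client("sagemaker")

# Placeholder values -- substitute your own container image, artifacts, and role.
image_uri = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-inference-image:latest"
model_data = "s3://my-bucket/models/model.tar.gz"
role_arn = "arn:aws:iam::123456789012:role/MySageMakerExecutionRole"

# 1. Register the model: ECR image for inference code, S3 location for artifacts.
sm.create_model(
    ModelName="my-serverless-model",
    PrimaryContainer={"Image": image_uri, "ModelDataUrl": model_data},
    ExecutionRoleArn=role_arn,
)

# 2. Endpoint configuration: ServerlessConfig replaces instance type and count.
#    MemorySizeInMB and MaxConcurrency are the two knobs mentioned in the episode.
sm.create_endpoint_config(
    EndpointConfigName="my-serverless-config",
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "my-serverless-model",
            "ServerlessConfig": {
                "MemorySizeInMB": 2048,  # 1024-6144, in 1 GB increments
                "MaxConcurrency": 5,     # max concurrent invocations for the endpoint
            },
        }
    ],
)

# 3. Create the endpoint; SageMaker provisions compute only when traffic arrives.
sm.create_endpoint(
    EndpointName="my-serverless-endpoint",
    EndpointConfigName="my-serverless-config",
)
```

Once the endpoint is in service, invocation works the same as for a real-time endpoint (boto3's sagemaker-runtime invoke_endpoint), so client code does not change when moving between serverless and instance-backed configurations.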