SageMaker Serverless Inference illustrates Amazon’s philosophy for ML workloads


Amazon just unveiled Serverless Inference, a new option for SageMaker’s fully managed machine learning (ML) service. The goal of Amazon SageMaker Serverless Inference is to serve use cases with intermittent or erratic traffic patterns, reduce total cost of ownership (TCO), and make the service easier to use.

VentureBeat reached out to Bratin Saha, AWS VP of Machine Learning, to discuss where Amazon SageMaker Serverless Inference fits into the big picture of Amazon’s machine learning offering, how it affects ease of use and TCO, and Amazon’s philosophy and process in developing its machine learning portfolio.

Amazon SageMaker is on an ever-expanding trajectory

Inference is the production phase of ML-powered applications. After a machine learning model is created and refined using historical data, it is deployed for use in production, where inference means taking new data as input and producing results based on that data. For production ML applications, Amazon notes, inference accounts for up to 90% of the total computation cost.

According to Saha, Serverless Inference is a frequently requested feature. SageMaker Serverless Inference was introduced in preview in December 2021 and is generally available today.

Serverless Inference enables SageMaker users to deploy machine learning models for inference without having to configure or manage the underlying infrastructure. The service is able to automatically provision and scale compute capacity based on the number of inference requests. During inactivity, the compute power is completely turned off so that users are not charged.
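In practice, what makes an endpoint serverless is its endpoint configuration: the production variant carries a serverless capacity spec instead of an instance type and count. A minimal sketch of the boto3-style request body follows (the model and endpoint names are hypothetical placeholders):

```python
def serverless_variant(model_name, memory_mb=2048, max_concurrency=20):
    """Build a ProductionVariants entry for a serverless endpoint.

    A ServerlessConfig replaces InstanceType/InitialInstanceCount:
    SageMaker provisions capacity per request and charges nothing
    while the endpoint sits idle.
    """
    return {
        "ModelName": model_name,
        "VariantName": "AllTraffic",
        "ServerlessConfig": {
            "MemorySizeInMB": memory_mb,        # 1024 to 6144, in 1 GB steps
            "MaxConcurrency": max_concurrency,  # up to 200 as of GA
        },
    }


# Passing the variant to SageMaker (requires AWS credentials and a
# registered model; names here are hypothetical):
# import boto3
# sm = boto3.client("sagemaker")
# sm.create_endpoint_config(
#     EndpointConfigName="demo-serverless-config",
#     ProductionVariants=[serverless_variant("demo-model")],
# )
# sm.create_endpoint(EndpointName="demo-endpoint",
#                    EndpointConfigName="demo-serverless-config")
```

The only capacity decision left to the user is the memory size and the concurrency cap; there is no instance type to pick.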

This is the latest addition to SageMaker’s options for serving inference. SageMaker Real-Time Inference is intended for workloads with low-latency requirements on the order of milliseconds. SageMaker Asynchronous Inference is intended for inferences with large payloads or long processing times. SageMaker Batch Transform makes predictions on batches of data, and SageMaker Serverless Inference is for workloads with intermittent or erratic traffic patterns.

SageMaker Serverless Inference comes on the heels of the SageMaker Inference Recommender service, introduced during a slew of AI and machine learning announcements at AWS re:Invent 2021. Inference Recommender helps users with the daunting task of choosing among the 70-plus available compute instance options and managing their configuration, in order to deploy machine learning models for optimal inference performance and cost.

Overall, as Saha said, lowering TCO is a top priority for Amazon. Amazon has even published a comprehensive analysis of SageMaker’s TCO. According to that analysis, Amazon SageMaker is the most cost-effective choice for end-to-end machine learning support and scalability, with a 54% lower TCO than other options over 3 years.

What matters here, however, is what those “other options” are. In its analysis, Amazon compares SageMaker to other self-managed cloud-based machine learning options on AWS, such as Amazon Elastic Compute Cloud (EC2) and Amazon Elastic Kubernetes Service (EKS). According to Amazon’s analysis, SageMaker results in a lower TCO when factoring in the cost of developing the equivalent of the services it offers from the ground up.

That may be the case, but arguably users find a comparison to services offered by competitors such as Azure Machine Learning and Google Vertex AI more useful. As Saha shared, Amazon’s TCO analysis reflects its philosophy of focusing on its users rather than the competition.

Another important part of Amazon’s philosophy, according to Saha, is to strive to build an end-to-end offering and prioritize user needs. Product development has a customer-centric focus: customers are consulted regularly, and their input drives the prioritization and development of new features.

SageMaker appears to be on an ever-expanding trajectory, broadening its target audience as well. With the recent introduction of SageMaker Canvas for codeless AI model development, Amazon aims to enable business users and analysts to create ML-powered applications, too.

SageMaker Serverless Inference and Amazon’s Double Bottom Line with SageMaker

But what about Amazon’s double bottom line with SageMaker – better ease of use and lower TCO?

As Tianhui Michael Li and Hugo Bowne-Anderson note in their analysis of SageMaker’s new features on VentureBeat, user-centric design will be key to winning the cloud race, and while SageMaker has made significant strides in that direction, it still has a way to go. In that light, Amazon’s strategy to move more EC2 and EKS users to SageMaker and expand its reach to include business users and analysts makes sense.

According to a 2020 Kaggle survey, SageMaker usage among data scientists is 16.5%, although overall AWS usage is 48.2% (mostly from direct access to EC2). At the moment it seems that only Google Cloud offers something similar to Serverless Inference, through Vertex Pipelines.

At first glance, SageMaker seems more versatile in terms of supported frameworks, and more modular compared to Google Vertex AI – something Saha also brought up as an area of focus. Vertex Pipelines appears to be similar to SageMaker Model Building Pipelines, but is end-to-end serverless.

As Li and Bowne-Anderson note, Google’s cloud service ranks third overall (after Microsoft Azure and AWS), but a strong second for data scientists according to the Kaggle Survey.

The introduction of Serverless Inference capitalizes on ease of use, as not having to configure instances is a big win. Saha told VentureBeat that switching between different inference options is possible, and that this is usually done through configuration.

As Saha noted, Serverless Inference can be used to deploy any machine learning model, whether trained on SageMaker or not. SageMaker’s built-in algorithms and machine learning framework serving containers can be used to deploy models to a serverless inference endpoint, but users can also choose to bring their own containers.
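Whether the model uses a SageMaker framework serving container or a custom image, the model definition has the same shape; only the container image URI and the model artifact location differ. A hedged sketch of the request body for boto3's create_model call (the image URI, S3 path, and role ARN below are placeholders, not real resources):

```python
def model_definition(name, image_uri, model_data_url, role_arn):
    """Build the request body for sagemaker create_model.

    image_uri may point at a SageMaker-provided framework serving
    container or at a bring-your-own container image in ECR; the
    rest of the definition is identical either way.
    """
    return {
        "ModelName": name,
        "PrimaryContainer": {
            "Image": image_uri,               # serving container image URI
            "ModelDataUrl": model_data_url,   # S3 location of model.tar.gz
        },
        "ExecutionRoleArn": role_arn,         # role SageMaker assumes
    }


# With AWS credentials, the dict is passed straight through:
# sm.create_model(**model_definition(
#     "demo-model",
#     "123456789012.dkr.ecr.us-east-1.amazonaws.com/demo-image:latest",
#     "s3://demo-bucket/model.tar.gz",
#     "arn:aws:iam::123456789012:role/demo-sagemaker-role",
# ))
```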

When traffic becomes predictable and stable, users can update from a serverless inference endpoint to a real-time SageMaker endpoint without making changes to their container image. By using Serverless Inference, users also benefit from SageMaker’s features, including built-in metrics in Amazon CloudWatch such as invocation count, errors, latency, and host metrics.
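Because the model (and therefore the container image) stays the same, the switch amounts to creating a new endpoint configuration whose variant specifies instances instead of a serverless spec, then pointing the endpoint at it. A sketch under that assumption (instance type and names are illustrative):

```python
def realtime_variant(model_name, instance_type="ml.m5.large",
                     instance_count=1):
    """Build a ProductionVariants entry for a real-time endpoint.

    ModelName is unchanged from the serverless variant, so the same
    container image keeps serving; only the capacity spec differs.
    """
    return {
        "ModelName": model_name,
        "VariantName": "AllTraffic",
        "InstanceType": instance_type,
        "InitialInstanceCount": instance_count,
    }


# Switching a live endpoint over (requires AWS credentials; names
# are hypothetical):
# sm.create_endpoint_config(
#     EndpointConfigName="demo-realtime-config",
#     ProductionVariants=[realtime_variant("demo-model")],
# )
# sm.update_endpoint(EndpointName="demo-endpoint",
#                    EndpointConfigName="demo-realtime-config")
```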

Since the preview launch, SageMaker Serverless Inference has added support for the SageMaker Python SDK and the model registry. The SageMaker Python SDK is an open source library for building and deploying ML models on SageMaker. The SageMaker Model Registry allows users to catalog, manage versions of, and deploy models to production.

Ease of use and TCO

Ease of use may be hard to quantify, but what about TCO? Serverless Inference should definitely lower the TCO for the use cases where it makes sense. However, Amazon has no specific stats to release at this time. What it does have though are testimonials from early users.

Jeff Boudier, product director at Hugging Face, reports that he has tested Amazon SageMaker Serverless Inference and has been able to significantly reduce costs for intermittent traffic loads while abstracting the infrastructure.

Lou Kratz, chief research engineer at Bazaarvoice, says Amazon SageMaker Serverless Inference offers the best of both worlds, as it scales quickly and seamlessly during content bursts and lowers costs for infrequently used models.

For the GA launch, SageMaker Serverless Inference has increased the maximum number of concurrent invocations per endpoint from 50 during preview to 200, allowing it to be used for high-traffic workloads. The service is now available in all AWS regions where Amazon SageMaker is available, with the exception of AWS GovCloud (US) and AWS China.

