Amazon SageMaker is a machine learning (ML) platform with a wide range of features to capture, transform, and measure data bias, and to train, deploy, and manage models with best-in-class compute and services such as Amazon SageMaker Data Wrangler, Amazon SageMaker Studio, Amazon SageMaker Canvas, Amazon SageMaker Model Registry, Amazon SageMaker Feature Store, Amazon SageMaker Pipelines, Amazon SageMaker Model Monitor, and Amazon SageMaker Clarify. Many organizations choose SageMaker as their ML platform because it provides a common set of tools for developers and data scientists. AWS Independent Software Vendor (ISV) partners have already created integrations that let customers of their Software as a Service (SaaS) platforms use SageMaker and its various features, including training, deployment, and the model registry.
In this post, we discuss the benefits for SaaS platforms of integrating with SageMaker, the range of possible integrations, and the development process for those integrations. We also take a deep dive into the most common architectures and AWS resources that facilitate these integrations. The goal is to accelerate time to market for ISV partners and other SaaS providers building similar integrations, and to encourage customers who use SaaS platforms to partner with SaaS providers on these integrations.
Benefits of integrating with SageMaker
There are a number of benefits for SaaS providers to integrate SaaS platforms with SageMaker:
- SaaS platform users can take advantage of the comprehensive ML platform in SageMaker
- Users can build ML models with data that resides inside or outside of the SaaS platform and use those ML models
- It provides users with a seamless experience between the SaaS platform and SageMaker
- Users can use the foundational models available in Amazon SageMaker JumpStart to build generative AI applications
- Organizations can standardize on SageMaker as their ML platform
- SaaS providers can focus on their core functionality and offer SageMaker for ML model development
- It empowers SaaS providers to build collaborative solutions and go to market with AWS
SageMaker overview and integration options
SageMaker has tools for every stage of the ML lifecycle. SaaS platforms can integrate with SageMaker across the ML lifecycle, from data labeling and preparation to model training, hosting, monitoring, and management, with various components, as shown in the following figure. Depending on the need, any and all parts of the ML lifecycle can be run in either a customer's AWS account or the SaaS provider's AWS account, and data and models can be shared across accounts using AWS Identity and Access Management (IAM) policies or third-party user-based access tools. This integration flexibility makes SageMaker an ideal platform for customers and SaaS providers to standardize on.
Integration process and architectures
In this section, we break down the integration process into four main steps and cover the common architectures for each. Note that there may be other integration points beyond these, but they are less common.

- Data access – How data that resides in the SaaS platform is accessed from SageMaker
- Model training – How the model is trained
- Model deployment and artifacts – Where the model is deployed and what artifacts are produced
- Model inference – How the SaaS platform calls the model for inference
The diagrams in the following sections assume that SageMaker is running in the customer's AWS account. Most of the options explained also apply if SageMaker is running in the SaaS provider's AWS account. In some cases, an ISV may deploy its software in a customer's AWS account. This is usually a dedicated customer AWS account, which means the software still needs access to the customer AWS account where SageMaker is running.
There are several ways to authenticate across AWS accounts when data in the SaaS platform is accessed from SageMaker and when the ML model is invoked from the SaaS platform. The recommended method is to use IAM roles. An alternative is to use AWS access keys, which consist of an access key ID and a secret access key.
Data access
There are many options for how data in a SaaS platform can be accessed from SageMaker. Data can be accessed from a SageMaker notebook; from SageMaker Data Wrangler, where users can prepare data for ML; or from SageMaker Canvas. The most common data access options are:
- SageMaker Data Wrangler built-in connector – The SageMaker Data Wrangler connector allows data to be imported from the SaaS platform to be prepared for ML model training. The connector is developed jointly by AWS and the SaaS provider. Current SaaS platform connectors include Databricks and Snowflake.
- Amazon Athena federated query for the SaaS platform – Federated queries allow users to query the SaaS platform from a SageMaker notebook through Amazon Athena using a custom data source connector developed by the SaaS provider.
- Amazon AppFlow – With Amazon AppFlow, you can use a custom connector to pull data into Amazon Simple Storage Service (Amazon S3), which can then be accessed from SageMaker. A SaaS platform connector can be developed by AWS or a SaaS provider. The open source Custom Connector SDK allows creating a private, shared, or public connector using Python or Java.
- SaaS platform SDK – If the SaaS platform has a software development kit (SDK), such as a Python SDK, it can be used to access data directly from a SageMaker notebook.
- Other options – There may be additional options depending on whether the SaaS provider exposes its data via APIs, files, or an agent. The agent can be installed on Amazon Elastic Compute Cloud (Amazon EC2) or AWS Lambda. Alternatively, a service such as AWS Glue or a third-party extract, transform, and load (ETL) tool can be used to transfer the data.
The following diagram illustrates the architecture of data access options.
Model training
A model can be trained in SageMaker Studio by a data scientist, with Amazon SageMaker Autopilot by a non-data scientist, or in SageMaker Canvas by a business analyst. SageMaker Autopilot does the heavy lifting of building ML models, including feature engineering, algorithm selection, and hyperparameter tuning, and it is relatively straightforward to integrate directly into a SaaS platform. SageMaker Canvas provides a no-code visual interface for training ML models.
Additionally, data scientists can use pre-trained models available in SageMaker JumpStart, including foundation models from sources such as Alexa, AI21 Labs, Hugging Face, and Stability AI, and customize them for their own generative AI use cases.
Alternatively, the model may be trained in a third-party or partner-provided tool, service, or infrastructure, including internal resources, provided the model artifacts are accessible and readable.
The following diagram illustrates these options.
Model deployment and artifacts
After the model has been prepared and tested, it can either be deployed to a SageMaker model endpoint in the customer's account, or be exported from SageMaker and imported into the SaaS platform's storage. The model can be saved and imported in standard formats supported by common ML frameworks, such as pickle, joblib, and ONNX (Open Neural Network Exchange).
If the ML model is deployed to a SageMaker model endpoint, additional model metadata can be stored in the SageMaker Model Registry, in SageMaker Model Cards, or in a file in an S3 bucket. This can include the model version, model inputs and outputs, model metrics, model creation date, inference specification, data lineage information, and more. Where a field is not available in the model package, the data can be stored as custom metadata or in an S3 file.
Creating such metadata can help SaaS providers manage the lifecycle of an ML model more effectively. This information can be synchronized to the model registry of the SaaS platform and used to track changes and updates to the ML model. Subsequently, this registry can be used to determine whether to update downstream data and applications that use the ML model on the SaaS platform.
The following diagram illustrates this architecture.
Model inference
SageMaker offers four options for ML model inference: real-time inference, serverless inference, asynchronous inference, and batch transform. For the first three, the model is deployed to a SageMaker model endpoint, and the SaaS platform invokes the model using the AWS SDKs; the Python SDK is the recommended option. The inference pattern for each is similar in that the predict() or predict_async() methods are used. Cross-account access can be achieved using IAM roles.
It is also possible to front the endpoint with Amazon API Gateway, which invokes the endpoint through a Lambda function running in a secure private network.
For batch transform, data from the SaaS platform must first be exported in batch to an S3 bucket in the customer's AWS account; inference is then performed on that data in batches. Inference is done by first creating a Transformer object and then calling its transform() method with the S3 location of the data. The results are imported back into the SaaS platform in batch as a dataset and joined with other datasets in the platform as part of a batch pipeline job.
Another inference option is to run inference directly in the compute cluster of the SaaS provider's AWS account. This applies when the model has been imported into the SaaS platform. In this case, SaaS providers can choose from a range of EC2 instances that are optimized for ML inference.
The following diagram illustrates these options.
Examples of integration
Several ISVs have built integrations between SaaS platforms and SageMaker. For more information on some integration examples, see the following:
Conclusion
In this post, we explained why and how SaaS providers should integrate SageMaker with their SaaS platforms, breaking the process down into four parts and covering the common architectures. SaaS providers looking to integrate with SageMaker can use these architectures as a reference. If you have custom requirements beyond what this post covers, including for other SageMaker components, contact your AWS account team. After an integration is built and validated, ISV partners can join the AWS Service Ready Program for SageMaker and unlock a variety of benefits.
We also encourage customers who use SaaS platforms to register their interest in integrating with Amazon SageMaker with their AWS account teams, as this can help inspire and accelerate development for SaaS providers.
About the authors
Mehmed Bakaloglu is a principal solutions architect at AWS, focusing on data analytics, AI/ML, and ISV partners.
Raj Kadiala is the chief AI/ML evangelist at AWS.