MLOps - Deploy Classifier with AWS Serverless
Continuing the Pneumonia Classifier project by moving it to AWS serverless
Going Serverless on AWS
Deploying machine learning models in production requires additional considerations to address latency, scalability, cost-efficiency, and monitoring.
A modern approach to hosting an ML application on AWS is a serverless architecture.
Users upload images via an S3-hosted static web page, which sends them to API Gateway. API Gateway receives the HTTP POST request and forwards it to a Lambda function that handles the image preprocessing, inference, and postprocessing logic. The Lambda function sends the image payload to the SageMaker endpoint hosting my trained model (calling it with the SageMaker Runtime SDK's invoke_endpoint), then retrieves the prediction. The prediction result is sent back to the frontend for display.
This approach combines S3, AWS Lambda, Amazon API Gateway, and Amazon SageMaker managed endpoints (real-time inference), which handle auto-scaling, security, and monitoring out of the box.
Advantages of AWS Serverless
- Scalability: API Gateway scales automatically to handle high concurrency. Lambda scales horizontally (serverless) and is invoked only when needed. SageMaker Endpoint supports auto-scaling to handle varying inference loads.
- Low Latency: Real-time inference is achieved with the SageMaker Endpoint.
- Cost Optimized: Lambda is a pay-per-use service, so we're not paying for idle compute resources. SageMaker Endpoint supports multi-model endpoints and elastic inference for cost savings.
- Fully Managed: Fully managed services reduce operational overhead for model hosting as well as the frontend and backend infrastructure.
# Model flow
S3 (Model Artifacts) → SageMaker Model → Real-Time Endpoint (GPU) → Auto-Scaling + Model Monitor

# Image and inference flow
S3 Static Web → Upload Image → API Gateway → Lambda (Image Preprocessing) → SageMaker Endpoint (Real-Time Inference) → API Gateway → S3 (Result)
Design of the Pneumonia Classifier Application with AWS Serverless
Here are the components of the serverless application:
- Frontend: Contains the static files for the website hosted on S3, allowing users to upload images and display predictions.
- Backend: CloudFormation YAML file that deploys the API Gateway which triggers the Lambda function.
- Lambda: Defines the function that preprocesses the image and interacts with the SageMaker endpoint.
- SageMaker: Configuration for the serverless endpoint that hosts the ML model.
# Folder Structure
root@zackz:/mnt/f/ml-local/local-cv/aws-deploy# tree
.
├── backend
│   └── api-gateway-config.yaml
├── frontend
│   ├── assets
│   │   ├── fonts
│   │   └── images
│   ├── index.html
│   ├── script.js
│   └── style.css
└── sagemaker
    └── endpoint-config.yaml
Frontend: S3 Static Website Hosting
Upload the files into the S3 bucket and enable Static Website Hosting: specify index.html as the index document, update the bucket policy to allow public read access, then verify the website URL.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::sagemaker-frontend--zz-imageclassification/*"
    }
  ]
}
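The same hosting setup can also be scripted with boto3. This is a minimal sketch, assuming the bucket name above; `public_read_policy` and `enable_static_hosting` are helper names of my own, not part of the project.

```python
import json


def public_read_policy(bucket: str) -> dict:
    """Build the public-read bucket policy shown above for a given bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


def enable_static_hosting(bucket: str) -> None:
    """Enable static website hosting and attach the public-read policy."""
    import boto3  # imported here so the policy helper stays dependency-free

    s3 = boto3.client("s3")
    # Serve index.html as the index document
    s3.put_bucket_website(
        Bucket=bucket,
        WebsiteConfiguration={"IndexDocument": {"Suffix": "index.html"}},
    )
    # Attach the public-read policy built above
    s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(public_read_policy(bucket)))
```

Calling `enable_static_hosting("sagemaker-frontend--zz-imageclassification")` with valid AWS credentials would reproduce the console steps described above (the bucket must also have Block Public Access disabled).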
Backend: API Gateway and Lambda Function
This folder contains a CloudFormation YAML file that creates the API Gateway, the Lambda function, and the Lambda execution role, and outputs the API Gateway URL.
The API Gateway resource creates a REST API (ImageClassificationAPI) with a /predict resource, defines a POST method that integrates with the Lambda function, and deploys to a prod stage; a Lambda permission allows API Gateway to invoke the function.
The Lambda resource defines the function (ImageClassificationLambda) that interacts with the SageMaker endpoint and includes the Python code for handling the image upload and invoking the endpoint.
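As a sketch of the request contract this handler expects: the API Gateway proxy event carries a JSON body with a base64-encoded image under a `file` key. The helper names below are mine, added only so the round-trip can be verified locally.

```python
import base64
import json


def build_predict_event(image_bytes: bytes) -> dict:
    """Wrap raw image bytes in the API Gateway proxy event the handler decodes."""
    return {"body": json.dumps({"file": base64.b64encode(image_bytes).decode()})}


def decode_predict_event(event: dict) -> bytes:
    """Mirror of the handler's decoding logic, for local verification."""
    body = json.loads(event["body"])
    return base64.b64decode(body["file"])
```

Feeding `build_predict_event(...)` to the handler locally (with the SageMaker client stubbed out) is a cheap way to exercise the decode path before deploying.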
# api-gateway-config.yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: CloudFormation template for API Gateway and Lambda integration

Resources:
  # API Gateway
  ImageClassificationAPI:
    Type: AWS::ApiGateway::RestApi
    Properties:
      Name: ImageClassificationAPI
      Description: API for image classification

  # API Gateway Resource
  PredictResource:
    Type: AWS::ApiGateway::Resource
    Properties:
      RestApiId: !Ref ImageClassificationAPI
      ParentId: !GetAtt ImageClassificationAPI.RootResourceId
      PathPart: predict

  # API Gateway Method (POST)
  PredictMethod:
    Type: AWS::ApiGateway::Method
    Properties:
      RestApiId: !Ref ImageClassificationAPI
      ResourceId: !Ref PredictResource
      HttpMethod: POST
      AuthorizationType: NONE
      Integration:
        Type: AWS_PROXY
        IntegrationHttpMethod: POST
        Uri: !Sub arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${ImageClassificationLambda.Arn}/invocations

  # Lambda Function
  ImageClassificationLambda:
    Type: AWS::Lambda::Function
    Properties:
      Handler: app.lambda_handler
      Runtime: python3.9
      Role: !GetAtt LambdaExecutionRole.Arn
      Code:
        ZipFile: |
          import boto3
          import json
          import base64

          sagemaker = boto3.client('sagemaker-runtime')

          def lambda_handler(event, context):
              try:
                  # Decode the image from the request
                  body = json.loads(event['body'])
                  image_bytes = base64.b64decode(body['file'])
                  # Call SageMaker endpoint
                  response = sagemaker.invoke_endpoint(
                      EndpointName='zack-aws-sagemaker-endpoint',
                      ContentType='application/x-image',
                      Body=image_bytes
                  )
                  # Parse the prediction
                  prediction = json.loads(response['Body'].read().decode())
                  return {
                      'statusCode': 200,
                      'body': json.dumps({'prediction': prediction})
                  }
              except Exception as e:
                  return {
                      'statusCode': 500,
                      'body': json.dumps({'error': str(e)})
                  }

  # Lambda Execution Role
  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: LambdaSageMakerAccess
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - sagemaker:InvokeEndpoint
                Resource: "*"
              - Effect: Allow
                Action:
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                Resource: "*"

  # API Gateway Deployment (must be created after the POST method exists)
  ApiGatewayDeployment:
    Type: AWS::ApiGateway::Deployment
    DependsOn: PredictMethod
    Properties:
      RestApiId: !Ref ImageClassificationAPI
      StageName: prod

  # API Gateway Permission to Invoke Lambda
  ApiGatewayPermission:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !GetAtt ImageClassificationLambda.Arn
      Principal: apigateway.amazonaws.com
      SourceArn: !Sub arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${ImageClassificationAPI}/*/POST/predict

Outputs:
  ApiGatewayUrl:
    Description: URL of the API Gateway
    Value: !Sub https://${ImageClassificationAPI}.execute-api.${AWS::Region}.amazonaws.com/prod
# deploy backend
aws cloudformation create-stack \
  --stack-name image-classification-api \
  --template-body file://api-gateway-config.yaml \
  --capabilities CAPABILITY_NAMED_IAM \
  --region ap-southeast-2
# test the API
curl -X POST -F "file=@data/chest_xray/val/val_normal0.jpeg" \
  https://i4s4znf7bb.execute-api.ap-southeast-2.amazonaws.com/prod/ImageClassificationAPI/predict

Output: b'[0.8592441082997322, 0.14075589179992676]'
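The Lambda handler above decodes a JSON body with a base64 `file` field, so a matching Python client would look like the sketch below (stdlib only; the URL is a placeholder for your own stack output, and the function names are mine).

```python
import base64
import json
import urllib.request


def make_request(api_url: str, image_path: str) -> urllib.request.Request:
    """Build a POST request with the base64 JSON body the Lambda expects."""
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode()
    data = json.dumps({"file": encoded}).encode()
    return urllib.request.Request(
        api_url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(api_url: str, image_path: str) -> dict:
    """Send the request and parse the JSON prediction response."""
    with urllib.request.urlopen(make_request(api_url, image_path)) as resp:
        return json.loads(resp.read())
```

For example, `predict("https://<api-id>.execute-api.ap-southeast-2.amazonaws.com/prod/predict", "data/chest_xray/val/val_normal0.jpeg")` would return the parsed prediction payload, assuming the deployed stage and resource path.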
SageMaker Endpoint
The CloudFormation template covers the SageMaker execution role, which grants the SageMaker service permission to access S3 (for model artifacts) and CloudWatch (for logging); the SageMaker model; and a serverless endpoint with a maximum concurrency of 5 and 2048 MB of memory. It outputs the SageMaker endpoint name.
AWSTemplateFormatVersion: '2010-09-09'
Description: CloudFormation template for SageMaker serverless endpoint

Resources:
  # SageMaker Execution Role
  SageMakerExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: sagemaker.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: SageMakerAccess
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - s3:GetObject
                  - s3:PutObject
                Resource: arn:aws:s3:::sagemaker-bucket-851725491342/*
              - Effect: Allow
                Action:
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                Resource: "*"

  # SageMaker Model
  ImageClassificationModel:
    Type: AWS::SageMaker::Model
    Properties:
      ModelName: zack-aws-sagemaker-endpoint
      PrimaryContainer:
        Image: algorithm_image
        ModelDataUrl: s3://sagemaker-bucket-851725491342/models/image_model/classifier-2025-01-26-02-58-03-001-a577816e/output/model.tar.gz
      ExecutionRoleArn: !GetAtt SageMakerExecutionRole.Arn

  # SageMaker Endpoint Configuration
  ImageClassificationEndpointConfig:
    Type: AWS::SageMaker::EndpointConfig
    Properties:
      ProductionVariants:
        - ModelName: !Ref ImageClassificationModel
          VariantName: AllTraffic
          ServerlessConfig:
            MaxConcurrency: 5
            MemorySizeInMB: 2048

  # SageMaker Endpoint
  ImageClassificationEndpoint:
    Type: AWS::SageMaker::Endpoint
    Properties:
      EndpointConfigName: !Ref ImageClassificationEndpointConfig
      EndpointName: ImageClassificationEndpoint

Outputs:
  SageMakerEndpointName:
    Description: Name of the SageMaker endpoint
    Value: !Ref ImageClassificationEndpoint
aws cloudformation create-stack \
  --stack-name sagemaker-endpoint \
  --template-body file://endpoint-config.yaml \
  --capabilities CAPABILITY_NAMED_IAM \
  --region ap-southeast-2
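A serverless endpoint takes a few minutes to reach InService after stack creation, so it helps to wait before sending traffic. A minimal sketch with boto3 (the endpoint name and region match the template above; `is_ready` and `wait_for_endpoint` are my own helper names):

```python
# Endpoint statuses that mean the endpoint can or cannot serve traffic
READY_STATUSES = {"InService"}
FAILED_STATUSES = {"Failed", "OutOfService"}


def is_ready(status: str) -> bool:
    """True once the endpoint can serve invocations."""
    return status in READY_STATUSES


def wait_for_endpoint(name: str = "ImageClassificationEndpoint",
                      region: str = "ap-southeast-2") -> str:
    """Block until the endpoint is InService, then return its status."""
    import boto3  # imported here so the status helpers stay dependency-free

    sm = boto3.client("sagemaker", region_name=region)
    # Built-in waiter that polls DescribeEndpoint until InService (or failure)
    sm.get_waiter("endpoint_in_service").wait(EndpointName=name)
    return sm.describe_endpoint(EndpointName=name)["EndpointStatus"]
```

Once `wait_for_endpoint()` returns, the curl or Postman tests below should hit a warm, healthy endpoint (cold starts aside, which serverless endpoints still incur per concurrent slot).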
Test and Validate
Access the S3 static website URL and choose an image to verify the model prediction. Logs are available in CloudWatch for both the SageMaker endpoint and the Lambda function.
The API and model prediction were also verified from Postman.
Key Takeaways:
- Moved the Pneumonia image classification ML application from local Docker to an AWS serverless deployment.
- Leveraged AWS API Gateway and Lambda to preprocess images and call the ML model.
- Deployed the Pneumonia Classifier model on a SageMaker serverless endpoint for real-time inference.
- Automated AWS resource provisioning with CloudFormation.
- Tested end-to-end functionality and monitored logs with CloudWatch.