You need to create your own container if you are not using SageMaker's pre-built algorithms and need additional libraries or new frameworks for your inference to work.
First, you need to have your inference code ready in a file called predictor.py. This is the file that implements the code for loading the model, making inferences, and packaging it as a Flask application with the hosting APIs described below.
# This is the file that implements a flask server to do inferences. It's the file that you will modify to
# implement the scoring for your own algorithm.
from __future__ import print_function
import os
import json
import pickle
import StringIO
import sys
import signal
import traceback
import flask
import pandas as pd
prefix = '/opt/ml/'
model_path = os.path.join(prefix, 'model')
# A singleton for holding the model. This simply loads the model and holds it.
# It has a predict function that does a prediction based on the model and the input data.
class ScoringService(object):
model = None # Where we keep the model when it's loaded
@classmethod
def get_model(cls):
"""Get the model object for this instance, loading it if it's not already loaded."""
        if cls.model is None:
            # This is an example where the model is a pickle file that takes a dataframe and returns a dataframe.
            with open(os.path.join(model_path, 'model.pkl'), 'rb') as inp:
                cls.model = pickle.load(inp)
return cls.model
@classmethod
def predict(cls, input):
"""For the input, do the predictions and return them. Implement your own prediction function here
Args:
input (a pandas dataframe): The data on which to do the predictions. There will be
one prediction per row in the dataframe"""
clf = cls.get_model()
return clf.predict(input)
# The flask app for serving predictions
app = flask.Flask(__name__)
@app.route('/ping', methods=['GET'])
def ping():
"""Determine if the container is working and healthy. In this sample container, we declare
it healthy if we can load the model successfully."""
health = ScoringService.get_model() is not None # You can insert a health check here
status = 200 if health else 404
return flask.Response(response='\n', status=status, mimetype='application/json')
@app.route('/invocations', methods=['POST'])
def transformation():
"""Do an inference on a single batch of data. In this sample server, we take data as CSV, convert
it to a pandas data frame for internal use and then convert the predictions back to CSV (which really
just means one prediction per line, since there's a single column.
"""
data = None
# Convert from CSV to pandas
if flask.request.content_type == 'text/csv':
data = flask.request.data.decode('utf-8')
s = StringIO.StringIO(data)
data = pd.read_csv(s, header=None)
else:
return flask.Response(response='This predictor only supports CSV data', status=415, mimetype='text/plain')
print('Invoked with {} records'.format(data.shape[0]))
# Do the prediction
predictions = ScoringService.predict(data)
# Convert from numpy back to CSV
out = StringIO.StringIO()
pd.DataFrame({'results':predictions}).to_csv(out, header=False, index=False)
result = out.getvalue()
return flask.Response(response=result, status=200, mimetype='text/csv')
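Once the container is running (see the local test sketch later in this section), you can exercise these two routes over HTTP. Below is a minimal sketch using the requests library; the host/port (localhost:8080) and the CSV payload are assumptions for illustration, not part of the sample.
# Minimal sketch: exercising the /ping and /invocations routes of a locally
# running container. Host/port and payload below are illustrative assumptions.
import requests

BASE = 'http://localhost:8080'

# Health check: expect HTTP 200 if the model loaded successfully.
ping = requests.get(BASE + '/ping')
print('ping status: {}'.format(ping.status_code))

# Inference: the sample server expects headerless CSV, one row per prediction.
payload = '0.5,1.2,3.4\n0.1,0.2,0.3\n'
resp = requests.post(BASE + '/invocations',
                     data=payload,
                     headers={'Content-Type': 'text/csv'})
print(resp.text)  # one prediction per line, as CSV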
Hosting involves responding to inference requests that come in via HTTP. In this example, we use our recommended Python serving stack (nginx, gunicorn, and Flask) to provide robust and scalable serving of inference requests. This stack is implemented in the sample code here, and you can mostly just leave it alone.
Amazon SageMaker uses two URLs in the container:
/ping
will receive GET requests from the infrastructure. Your program returns 200 if the container is up and accepting requests.
/invocations
is the endpoint that receives client inference POST requests. The format of the request and the response is up to the algorithm. If the client supplied ContentType and Accept headers, these will be passed in as well.
The model files will be copied into the container at the following location when the endpoint is created:
/opt/ml
`-- model
`-- <model files>
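When you create the model, ModelDataUrl points to a gzipped tar archive in S3, and SageMaker extracts it into /opt/ml/model before starting the container. Below is a minimal sketch of packaging and uploading such an artifact; the bucket and key names are illustrative assumptions.
# Minimal sketch: package a trained model as model.tar.gz and upload it to S3
# so it can be referenced by ModelDataUrl. Bucket/key names are placeholders.
import tarfile
import boto3

with tarfile.open('model.tar.gz', 'w:gz') as tar:
    tar.add('model.pkl')  # the file predictor.py expects under /opt/ml/model

s3 = boto3.client('s3')
s3.upload_file('model.tar.gz', 'my-bucket', 'models/model.tar.gz')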
Here is the Dockerfile in the container directory for creating a hosting image. The container directory holds all the components you need to package the sample algorithm for Amazon SageMaker:
.
|-- Dockerfile
|-- build_and_push.sh
`-- <inference_specific_dir>
|-- nginx.conf
|-- predictor.py
|-- serve
|-- train
`-- wsgi.py
Let's discuss each of these in turn:
Dockerfile
describes how to build your Docker container image. More details below.
# Build an image that can do training and inference in SageMaker
# This is a Python 2 image that uses the nginx, gunicorn, flask stack
# for serving inferences in a stable way.
FROM ubuntu:16.04
MAINTAINER Amazon AI <[email protected]>
RUN apt-get -y update && apt-get install -y --no-install-recommends \
wget \
python \
nginx \
ca-certificates \
&& rm -rf /var/lib/apt/lists/*
# Here we get all python packages.
# There's substantial overlap between scipy and numpy that we eliminate by
# linking them together. Likewise, pip leaves the install caches populated which uses
# a significant amount of space. These optimizations save a fair amount of space in the
# image, which reduces start up time.
RUN wget https://bootstrap.pypa.io/pip/2.7/get-pip.py && python get-pip.py && \
pip install numpy==1.16.2 scipy==1.2.1 scikit-learn==0.20.2 pandas flask gevent gunicorn && \
(cd /usr/local/lib/python2.7/dist-packages/scipy/.libs; rm *; ln ../../numpy/.libs/* .) && \
rm -rf /root/.cache
# Set some environment variables. PYTHONUNBUFFERED keeps Python from buffering our standard
# output stream, which means that logs can be delivered to the user quickly. PYTHONDONTWRITEBYTECODE
# keeps Python from writing the .pyc files which are unnecessary in this case. We also update
# PATH so that the train and serve programs are found when the container is invoked.
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"
# Set up the program in the image
# Replace <inference_specific_dir> with the name of your inference directory
COPY <inference_specific_dir> /opt/program
WORKDIR /opt/program
build_and_push.sh
is a script that uses the Dockerfile to build your container image and then pushes it to Amazon ECR. You can then reference this image when creating your model.
#!/usr/bin/env bash
# This script shows how to build the Docker image and push it to ECR to be ready for use
# by SageMaker.
# The argument to this script is the image name. This will be used as the image on the local
# machine and combined with the account and region to form the repository name for ECR.
image=$1
if [ "$image" == "" ]
then
echo "Usage: $0 <image-name>"
exit 1
fi
# Replace <inference_specific_dir> with the name of your inference directory
chmod +x <inference_specific_dir>/train
chmod +x <inference_specific_dir>/serve
# Get the account number associated with the current IAM credentials
account=$(aws sts get-caller-identity --query Account --output text)
if [ $? -ne 0 ]
then
exit 255
fi
# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-west-2}
fullname="${account}.dkr.ecr.${region}.amazonaws.com/${image}:latest"
# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${image}" > /dev/null 2>&1
if [ $? -ne 0 ]
then
aws ecr create-repository --repository-name "${image}" > /dev/null
fi
# Log in to ECR (the older `aws ecr get-login` command was removed in AWS CLI v2)
aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin ${account}.dkr.ecr.${region}.amazonaws.com
# Build the docker image locally with the image name and then push it to ECR
# with the full name.
docker build -t ${image} .
docker tag ${image} ${fullname}
docker push ${fullname}
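For example, assuming you name the image my-inference:
./build_and_push.sh my-inference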
<inference_specific_dir>
is the directory which contains the files that will be installed in the container.
local_test
is an optional directory that shows how to test your new container on any computer that can run Docker, including an Amazon SageMaker notebook instance. Using this method, you can quickly iterate using small datasets to eliminate any structural bugs before you use the container with Amazon SageMaker. You can find an example in the byoc_cc folder; a sketch of such a test follows below.
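SageMaker starts the hosting image with the argument serve, so a local test can do the same with docker run. The image name, local model directory, and port below are assumptions for illustration.
# Build the image, then serve it locally, mounting a local model/ directory
# (containing model.pkl) where SageMaker would place the model files.
docker build -t my-inference .
docker run --rm -p 8080:8080 -v "$(pwd)/model:/opt/ml/model" my-inference serve
With the container serving on port 8080, you can exercise /ping and /invocations as shown earlier.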
In this simple application, we only install five files in the container. You may only need that many or, if you have many supporting routines, you may wish to install more. These five show the standard structure of our Python containers, although you are free to choose a different toolset and therefore could have a different layout. If you're writing in a different programming language, you'll certainly have a different layout depending on the frameworks and tools you choose.
The files in the <inference_specific_dir>
that we'll put in the container are:
nginx.conf
is the configuration file for the nginx front-end. Generally, you should be able to take this file as-is.
predictor.py
is the program that actually implements the Flask web server and the inference predictions for this app. You'll want to customize the actual prediction parts to your application. You may choose to have separate files for implementing your custom logic.
serve
is the program started when the container is started for hosting. It simply launches the gunicorn server, which runs multiple instances of the Flask app defined in predictor.py. You should be able to take this file as-is.
train
is the program that is invoked when the container is run for training. You will modify this program to implement your training algorithm.
wsgi.py
is a small wrapper used to invoke the Flask app. You should be able to take this file as-is.
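For reference, wsgi.py is typically just a couple of lines. A minimal sketch, assuming predictor.py sits alongside it in /opt/program:
# Minimal sketch of wsgi.py: a thin wrapper so gunicorn can locate the Flask
# app (e.g. gunicorn ... wsgi:app).
import predictor as myapp

app = myapp.app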
In summary, the two files you will probably want to change for your application are train and predictor.py.
An endpoint can be created either through the console or by using the SageMaker APIs from the AWS SDK for Python (boto3): create a model, create an endpoint configuration, and create an endpoint, supplying the path to the model in the S3 bucket and the Docker image in ECR.
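A minimal sketch of those three calls follows; every name, ARN, and URI below is an illustrative placeholder, not a real value.
# Minimal sketch: create a model, an endpoint configuration, and an endpoint
# with boto3. All names, ARNs, and URIs are illustrative placeholders.
import boto3

sm = boto3.client('sagemaker')

sm.create_model(
    ModelName='my-model',
    PrimaryContainer={
        'Image': '<account>.dkr.ecr.<region>.amazonaws.com/my-inference:latest',
        'ModelDataUrl': 's3://my-bucket/models/model.tar.gz',
    },
    ExecutionRoleArn='arn:aws:iam::<account>:role/MySageMakerRole',
)

sm.create_endpoint_config(
    EndpointConfigName='my-endpoint-config',
    ProductionVariants=[{
        'VariantName': 'AllTraffic',
        'ModelName': 'my-model',
        'InstanceType': 'ml.m5.large',
        'InitialInstanceCount': 1,
    }],
)

sm.create_endpoint(
    EndpointName='my-endpoint',
    EndpointConfigName='my-endpoint-config',
)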