Developing a BYO container for inference
In this section, we will learn how to build a SageMaker-compatible inference container using an official TensorFlow image, prepare an inference script and model server, and deploy it for inference on SageMaker Hosting.
Problem overview
We will develop a SageMaker-compatible container for inference. We will use the latest official TensorFlow container as a base image and AWS Multi Model Server (MMS) as a model server. Please note that MMS is just one of many ML model serving options. SageMaker places few restrictions on your choice of model server, other than that it must serve models on port 8080 and respond to the /invocations and /ping endpoints.
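To illustrate the port requirement, here is a minimal sketch of an MMS config.properties file that binds the inference endpoint to port 8080 (the addresses and the management port shown are assumptions for illustration, not taken from the source):

```properties
# Hypothetical MMS config.properties sketch.
# SageMaker sends inference traffic to port 8080, so the
# inference listener must be bound there.
inference_address=http://0.0.0.0:8080
# The management API can live on another port (not used by SageMaker).
management_address=http://0.0.0.0:8081
```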
Developing the serving container
When deploying a serving container to the endpoint, SageMaker runs the following command:
docker run <YOUR BYO IMAGE> serve
To comply with this requirement, it's recommended that you use the exec form of the ENTRYPOINT instruction in your Dockerfile.
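As a minimal sketch, an exec-form ENTRYPOINT might look like the following. The base image tag, the entrypoint script name, and the pip package are assumptions for illustration, not the book's actual Dockerfile:

```dockerfile
# Hypothetical sketch of a BYO serving Dockerfile (image tag and
# file names are assumptions).
FROM tensorflow/tensorflow:latest

# Install the model server and copy a serving entrypoint script.
RUN pip install multi-model-server
COPY dockerd-entrypoint.py /usr/local/bin/dockerd-entrypoint.py

# Exec form: when SageMaker runs "docker run <image> serve",
# the "serve" argument is appended to this command list, so the
# entrypoint script can inspect it and start the model server.
ENTRYPOINT ["python", "/usr/local/bin/dockerd-entrypoint.py"]
```

The exec form matters because the shell form (`ENTRYPOINT python script.py`) ignores arguments passed at `docker run` time, so the `serve` argument would never reach your script.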
Let’s review our BYO Dockerfile...