Using PTS
PTS is a native model server for PyTorch models, developed in collaboration between Meta and AWS to provide a production-ready serving solution for the PyTorch ecosystem. It allows you to serve and manage multiple models and to handle inference requests via REST or gRPC endpoints. PTS supports serving TorchScripted models for better inference performance, and it ships with utilities for collecting logs and metrics as well as performance-tuning options. SageMaker supports PTS as part of its PyTorch inference containers (https://github.com/aws/deep-learning-containers/tree/master/pytorch/inference/docker).
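To take advantage of TorchScript serving, you first convert your model into a TorchScript artifact. The sketch below uses `torch.jit.trace` on a small illustrative model (the `TinyClassifier` class is a placeholder, not part of PTS); the saved `model.pt` file is the kind of artifact a TorchScript-enabled model server can load without the original Python class definition:

```python
import torch
import torch.nn as nn

# Illustrative model; any nn.Module can be converted the same way.
class TinyClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return torch.softmax(self.fc(x), dim=-1)

model = TinyClassifier().eval()

# Convert to TorchScript via tracing; torch.jit.script is an
# alternative for models with data-dependent control flow.
scripted = torch.jit.trace(model, torch.randn(1, 4))

# Save the TorchScript artifact that the model server will load.
scripted.save("model.pt")

# The artifact runs standalone, without the TinyClassifier class.
restored = torch.jit.load("model.pt")
print(restored(torch.randn(1, 4)).shape)  # torch.Size([1, 2])
```

Tracing records the operations executed on the example input, so it is a good fit for feed-forward models; for models with branching logic, scripting preserves the control flow.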
Integration with SageMaker
PTS is the default model server for PyTorch models on Amazon SageMaker. As with TFS, SageMaker doesn’t expose the native PTS model management and inference APIs to end users. The following diagram shows how SageMaker integrates with PTS:
Figure 9.2 – PTS architecture on SageMaker
Let’s highlight these integration details...