Using NVIDIA Triton
NVIDIA Triton is an open source model server developed by NVIDIA. It supports multiple DL frameworks (such as TensorFlow, PyTorch, ONNX, Python, and OpenVINO), as well as various hardware platforms and runtime environments (NVIDIA GPUs, x86 and ARM CPUs, and AWS Inferentia). Triton can be used for inference in cloud and data center environments as well as on edge and mobile devices. Triton is optimized for performance and scalability on various CPU and GPU platforms, and NVIDIA provides specialized utilities for performance analysis and model analysis to help you tune Triton deployments.
Integration with SageMaker
You can use the Triton model server on SageMaker via a pre-built SageMaker DL container. Note that SageMaker Triton containers are not open source. You can find the latest list of Triton containers here: https://github.com/aws/deep-learning-containers/blob/master/available_images.md#nvidia-triton-inference-containers-sm-support-only.
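To make this concrete, here is a minimal sketch of deploying a model with a Triton container behind a SageMaker real-time endpoint using the SageMaker Python SDK. The image URI, S3 path, model name, and instance type are placeholders, not values from this book; take the actual Triton container URI for your region and version from the list linked above.

```python
# A minimal sketch (placeholder values, not a definitive recipe): deploying a
# Triton-served model to a SageMaker real-time endpoint with the Python SDK.
import sagemaker
from sagemaker.model import Model

session = sagemaker.Session()
role = sagemaker.get_execution_role()  # assumes execution inside SageMaker

# Placeholder: copy the region- and version-specific Triton container URI
# from the available_images.md list linked above.
triton_image_uri = "<account>.dkr.ecr.<region>.amazonaws.com/sagemaker-tritonserver:<tag>"

# The model.tar.gz is assumed to contain a Triton model repository layout,
# e.g. <model_name>/config.pbtxt and <model_name>/1/model.onnx
model = Model(
    image_uri=triton_image_uri,
    model_data="s3://<bucket>/triton/model.tar.gz",
    role=role,
    env={"SAGEMAKER_TRITON_DEFAULT_MODEL_NAME": "<model_name>"},
    sagemaker_session=session,
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",  # GPU instance; adjust to your workload
)
```

The key difference from deploying with a framework container is that the model artifact must follow Triton's model repository layout rather than the framework's native packaging.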
SageMaker doesn’...