In Chapter 8, Considering Hardware for Inference, and Chapter 9, Implementing Model Servers, we discussed how to engineer your deep learning (DL) inference workloads on Amazon SageMaker. We also reviewed how to select appropriate hardware for inference workloads, optimize model performance, and tune model servers based on specific use case requirements. In this chapter, we will focus on how to operationalize your DL inference workloads once they have been deployed to test and production environments.
In this chapter, we will start by reviewing advanced model hosting options such as multi-model, multi-container, and Serverless Inference endpoints to optimize your resource utilization and workload costs. Then, we will cover the Application Auto Scaling service for SageMaker, which provides another mechanism to improve resource utilization. Auto Scaling allows you to dynamically match your inference traffic requirements with provisioned inference resources.
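To give a sense of what such an Auto Scaling configuration looks like in practice, the following minimal sketch uses boto3 and the Application Auto Scaling API to register a SageMaker endpoint variant as a scalable target and attach a target-tracking scaling policy to it. The endpoint name, variant name, capacity limits, and target value are illustrative assumptions rather than values from this book:

import boto3

# Hypothetical endpoint and variant names - replace with your own.
ENDPOINT_NAME = "my-dl-endpoint"
VARIANT_NAME = "AllTraffic"

autoscaling = boto3.client("application-autoscaling")

# Resource ID format expected by Application Auto Scaling
# for SageMaker endpoint production variants.
resource_id = f"endpoint/{ENDPOINT_NAME}/variant/{VARIANT_NAME}"

# Register the endpoint variant as a scalable target,
# allowing it to scale between 1 and 4 instances.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

# Attach a target-tracking policy that keeps the average number of
# invocations per instance close to the configured target value.
autoscaling.put_scaling_policy(
    PolicyName="InvocationsTargetTracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 100.0,  # assumed target: invocations per instance per minute
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,  # seconds to wait before scaling in
        "ScaleOutCooldown": 60,  # seconds to wait before scaling out
    },
)

Here, SageMakerVariantInvocationsPerInstance is the predefined metric that Application Auto Scaling tracks per endpoint variant; suitable target values and cooldown periods depend on your model's latency profile and traffic patterns, which we will discuss later in this chapter.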