In the previous chapter, you learned how to automate training and deployment workflows.
In this final chapter, we'll focus on optimizing cost and performance for prediction infrastructure, which typically accounts for up to 90% of AWS customers' machine learning spend. This number may come as a surprise until you realize that the model built by a single training job may end up on multiple endpoints, running 24/7 at large scale.
Hence, great care must be taken to optimize your prediction infrastructure, to ensure that you get the most bang for your buck!
This chapter covers the following topics:
- Autoscaling an endpoint
- Deploying a multi-model endpoint
- Deploying a model with Amazon Elastic Inference
- Compiling models with Amazon SageMaker Neo
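To give a flavor of the first topic, here is a minimal sketch of the request payloads used to autoscale a SageMaker endpoint with the Application Auto Scaling API. The endpoint name, variant name, capacity bounds, and invocation target below are hypothetical placeholders, not values from this book:

```python
# Hypothetical endpoint and variant names for illustration only.
endpoint_name = "my-endpoint"
variant_name = "AllTraffic"

# Application Auto Scaling identifies the endpoint variant by this resource ID.
resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"

# Arguments for register_scalable_target: let the variant's instance
# count scale between 1 and 4 instances (example bounds).
scalable_target = {
    "ServiceNamespace": "sagemaker",
    "ResourceId": resource_id,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "MinCapacity": 1,
    "MaxCapacity": 4,
}

# Arguments for put_scaling_policy: a target-tracking policy that keeps
# invocations per instance near 1000 per minute (example target).
scaling_policy = {
    "PolicyName": "invocations-target-tracking",
    "ServiceNamespace": "sagemaker",
    "ResourceId": resource_id,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 1000.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
    },
}

# With a boto3 client for the "application-autoscaling" service, these
# payloads would be passed as:
#   client.register_scalable_target(**scalable_target)
#   client.put_scaling_policy(**scaling_policy)
print(resource_id)
```

We'll walk through this workflow, and the other techniques listed above, in detail in the sections that follow.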