Choosing the instance type, load testing, and performance tuning for models
Traditionally, based on the model type (classical ML or deep learning) and the model size, you make a heuristic guess and load test the model on a few candidate instances. This approach is fast but rarely yields the optimal choice. Alternatively, you can use the Inference Recommender feature of Amazon SageMaker (https://docs.aws.amazon.com/sagemaker/latest/dg/inference-recommender.html), which automates load testing and model tuning across the SageMaker ML instances. It helps you deploy ML models on hardware matched to your performance requirements, at the lowest possible cost.
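As a rough sketch of what driving Inference Recommender looks like in code, the snippet below assembles the request for a default recommendation job using the boto3 SageMaker client's `create_inference_recommendations_job` API. The job name, IAM role ARN, and model package ARN are placeholder values for illustration, not real resources.

```python
# Sketch: building the request for a SageMaker Inference Recommender job.
# The ARNs and job name below are hypothetical placeholders.

def build_recommendation_job_request(job_name, role_arn, model_package_arn):
    """Assemble kwargs for sagemaker_client.create_inference_recommendations_job()."""
    return {
        "JobName": job_name,
        # "Default" runs a standard set of load tests across candidate
        # instance types; "Advanced" lets you define custom traffic patterns.
        "JobType": "Default",
        "RoleArn": role_arn,
        "InputConfig": {
            # A versioned model package registered in the SageMaker Model Registry
            "ModelPackageVersionArn": model_package_arn,
        },
    }

request = build_recommendation_job_request(
    "image-classifier-recommender",                   # hypothetical job name
    "arn:aws:iam::111122223333:role/SageMakerRole",   # hypothetical IAM role
    "arn:aws:sagemaker:us-east-1:111122223333:model-package/img-cls/1",
)

# Launching the job requires AWS credentials and a registered model package:
# import boto3
# sm = boto3.client("sagemaker")
# sm.create_inference_recommendations_job(**request)
# Poll sm.describe_inference_recommendations_job(JobName=...) to retrieve
# the per-instance cost and latency recommendations once the job completes.
```

The recommendations returned by the job include, for each tested instance type, metrics such as cost per hour and latency, which you can compare against your own requirements before choosing a deployment target.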
Let’s use a pretrained image classification model as an example to understand how Inference Recommender works. The following steps outline the process of using Inference Recommender: