Selecting your workload configuration
In the previous three chapters, we reviewed the capabilities Amazon SageMaker provides for engineering and operating inference workloads: from selecting optimal compute instances and runtime environments to configuring model servers and managing and monitoring deployed models.
In this section, we will summarize the criteria you can use to choose an inference workload configuration. Then, we will suggest a simple algorithm to guide your decision-making when configuring inference.
When engineering your inference workload, consider the following selection criteria: