Understanding the types of ML inference in production
In the previous section, we saw the priorities of ML in research and production. Using a trained ML model to make a prediction or a decision is called ML model inference. To serve business needs in production, ML models are deployed to various deployment targets for inference, depending on the requirements. Let's explore the ways of deploying ML models on different deployment targets to facilitate ML inference as per the business needs.
Deployment targets
In this section, we will look at different types of deployment targets, and at why and how we serve ML models for inference on each of them. Let's start by looking at virtual machines and on-premises servers.
Virtual machines
Virtual machines can be in the cloud or on-premises, depending on the IT setup of a business or an organization. Serving ML models on virtual machines is quite common: the models are exposed in the form of web services. The web service...
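As a minimal sketch of the idea, the following shows one common way to expose an ML model as a web service on a virtual machine. The use of Flask, scikit-learn, the `/predict` endpoint name, and the toy iris model are all illustrative assumptions, not details from the original text; in practice the model would be loaded from a serialized artifact rather than trained in the same process.

```python
# Illustrative sketch: serving an ML model as a web service on a VM.
# Assumes Flask and scikit-learn are installed; endpoint and payload
# shapes are hypothetical choices for the example.
from flask import Flask, jsonify, request
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# In production the model would typically be loaded from disk
# (e.g. a pickle or joblib file); here we train a toy classifier
# so the example is self-contained.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"features": [[5.1, 3.5, 1.4, 0.2]]}
    features = request.get_json()["features"]
    predictions = model.predict(features).tolist()
    return jsonify({"predictions": predictions})

if __name__ == "__main__":
    # On a VM this would usually run behind a production WSGI
    # server such as gunicorn, rather than Flask's dev server.
    app.run(host="0.0.0.0", port=5000)
```

A client then sends feature vectors over HTTP and receives predictions back as JSON, which is what makes this pattern easy to integrate with other business applications.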