Azure Machine Learning Engineering

By: Sina Fakhraee, Balamurugan Balakreshnan, Megan Masanz

Overview of this book

Data scientists working on productionizing machine learning (ML) workloads face a breadth of challenges at every step, owing to the many factors involved in getting ML models deployed and running. This book offers solutions to common issues, detailed explanations of essential concepts, and step-by-step instructions to productionize ML workloads using the Azure Machine Learning service. You'll see how data scientists and ML engineers working with Microsoft Azure can train and deploy ML models at scale by putting their knowledge to work with this practical guide. Throughout the book, you'll learn how to train, register, and productionize ML models by making use of the power of the Azure Machine Learning service. You'll get to grips with scoring models in real time and in batches, explaining models to earn business trust, mitigating model bias, and developing solutions using an MLOps framework. By the end of this Azure Machine Learning book, you'll be ready to build and deploy end-to-end ML solutions into a production system using the Azure Machine Learning service for real-time scenarios.
Table of Contents (17 chapters)
Part 1: Training and Tuning Models with the Azure Machine Learning Service
Part 2: Deploying and Explaining Models in AMLS
Part 3: Productionizing Your Workload with MLOps

Creating a data asset using the Python SDK

In this section, we will show you how to create a data asset using the Python SDK. As mentioned in the previous section, you can create data assets from datastores, local files, and public URLs. The Python script that creates a data asset from a local file (in this example, titanic.csv) is shown in Figure 2.19.

Note that in the following code snippet, setting type=AssetTypes.MLTABLE abstracts the schema definition for the tabular data, which makes the dataset easier to share:

Figure 2.19 – Creating a data asset via the Python SDK

Inside the my_data folder, there are two files:

  • The actual data file, which in this case is titanic.csv
  • The mltable file, a YAML file that specifies the data's schema so that the mltable engine can use it to materialize the data into an in-memory object such as a pandas or Dask DataFrame
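
As a sketch of how such a folder could be prepared, the following standard-library Python script writes an mltable file next to titanic.csv. The YAML keys it emits (paths, transformations, and read_delimited with delimiter, encoding, and header) follow the publicly documented MLTable format, but the exact contents are an assumption and are not a reproduction of Figure 2.20; the helper names mltable_yaml and write_mltable are hypothetical.

```python
from pathlib import Path


def mltable_yaml(data_file: str, delimiter: str = ",") -> str:
    """Return MLTable YAML that reads one delimited file with a header row (assumed layout)."""
    return (
        "paths:\n"
        f"  - file: ./{data_file}\n"
        "transformations:\n"
        "  - read_delimited:\n"
        f"      delimiter: '{delimiter}'\n"
        "      encoding: 'utf8'\n"
        "      header: all_files_same_headers\n"
    )


def write_mltable(folder: Path, data_file: str = "titanic.csv") -> Path:
    """Create `folder` if needed and write an MLTable file describing `data_file`."""
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / "MLTable"  # Azure ML looks for a file named exactly "MLTable"
    target.write_text(mltable_yaml(data_file), encoding="utf-8")
    return target
```

For example, write_mltable(Path("my_data")) creates my_data/MLTable; copying titanic.csv into the same folder then gives the two-file layout described above.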

Figure 2.20 shows the mltable YAML file for this example:

Figure 2.20 – The mltable YAML file for creating an mltable data asset...