Azure Data Scientist Associate Certification Guide

By: Andreas Botsikas, Michael Hlobil

Overview of this book

The Azure Data Scientist Associate Certification Guide helps you acquire practical knowledge for machine learning experimentation on Azure. It covers everything you need to pass the DP-100 exam and become a certified Azure Data Scientist Associate. Starting with an introduction to data science, you'll learn the terminology that will be used throughout the book and then move on to the Azure Machine Learning (Azure ML) workspace. You'll discover the studio interface and manage various components, such as data stores and compute clusters. Next, the book focuses on no-code and low-code experimentation, and shows you how to use the Automated ML wizard to locate and deploy optimal models for your dataset. You'll also learn how to run end-to-end data science experiments using the designer provided in Azure ML Studio. You'll then explore the Azure ML Software Development Kit (SDK) for Python and advance to creating experiments and publishing models using code. The book also guides you in optimizing your model's hyperparameters using HyperDrive before demonstrating how to use responsible AI tools to interpret and debug your models. Once you have a trained model, you'll learn to operationalize it for batch or real-time inference and monitor it in production. By the end of this Azure certification study guide, you'll have gained the knowledge and the practical skills required to pass the DP-100 exam.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Share Your Thoughts

Section 1: Starting your cloud-based data science journey

Chapter 1: An Overview of Modern Data Science

The evolution of data science

Working on a data science project

Using Spark in data science

Adopting the DevOps mindset

Further reading

Chapter 2: Deploying Azure Machine Learning Workspace Resources

Technical requirements

Deploying Azure ML through the portal

Deploying Azure ML via the CLI

Alternative ways to deploy an Azure ML workspace

Exploring the deployed Azure resources

Further reading

Chapter 3: Azure Machine Learning Studio Components

Technical requirements

Interacting with the Azure ML resource

Exploring the Azure ML Studio experience

Authoring experiments within Azure ML Studio

Tracking data science assets in Azure ML Studio

Managing infrastructure resources in Azure ML Studio

Chapter 4: Configuring the Workspace

Technical requirements

Provisioning compute resources

Connecting to datastores

Working with datasets

Further reading

Section 2: No code data science experimentation

Chapter 5: Letting the Machines Do the Model Training

Technical requirements

Configuring an AutoML experiment

Monitoring the execution of the experiment

Deploying the best model as a web service

Further reading

Chapter 6: Visual Model Training and Publishing

Technical requirements

Overview of the designer

Building the pipeline with the designer

Creating a batch and real-time inference pipeline

Deploying a real-time inference pipeline

Further reading

Section 3: Advanced data science tooling and capabilities

Chapter 7: The AzureML Python SDK

Technical requirements

Overview of the Python SDK

Working in AzureML notebooks

Basic coding with the AzureML SDK

Working with the AzureML CLI extension

Further reading

Chapter 8: Experimenting with Python Code

Technical requirements

Training a simple sklearn model within notebooks

Tracking metrics in Experiments

Scaling the training process with compute clusters

Further reading

Chapter 9: Optimizing the ML Model

Technical requirements

Hyperparameter tuning using HyperDrive

Running AutoML experiments with code

Further reading

Chapter 10: Understanding Model Results

Technical requirements

Creating responsible machine learning models

Interpreting the predictions of the model

Analyzing model errors

Detecting potential model fairness issues

Further reading

Chapter 11: Working with Pipelines

Technical requirements

Understanding AzureML pipelines

Authoring a pipeline

Publishing a pipeline to expose it as an endpoint

Scheduling a recurring pipeline

Further reading

Chapter 12: Operationalizing Models with Code

Technical requirements

Understanding the various deployment options

Registering models in the workspace

Deploying real-time endpoints

Creating a batch inference pipeline

Further reading

Other Books You May Enjoy

Packt is searching for authors like you

Share Your Thoughts

Publishing a pipeline to expose it as an endpoint

So far, you have defined a pipeline using the AzureML SDK. If you had to restart the kernel of your Jupyter notebook, you would lose the reference to the pipeline you defined, and you would have to rerun all the cells to recreate the pipeline object. The AzureML SDK allows you to publish a pipeline, which registers it as a versioned object within the workspace. Once a pipeline is published, it can be submitted without the Python code that constructed it.

In a new cell in your notebook, add the following code:

# Register the pipeline in the workspace as a versioned, reusable object
published_pipeline = pipeline.publish(
    "Loans training pipeline",
    description="A pipeline to train a LightGBM model")

This code publishes the pipeline and returns a PublishedPipeline object, the versioned object registered within the workspace. The most interesting attribute of that object is the endpoint, which returns the REST endpoint URL...
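
To make the idea of submitting a published pipeline without its construction code more concrete, here is a minimal sketch of how a REST request against that endpoint URL could be assembled. The helper name build_submit_request, its arguments, and the experiment name are hypothetical and used only for illustration; the body keys "ExperimentName" and "ParameterAssignments" are the ones the AzureML v1 pipeline REST endpoint expects. The commented usage assumes that azureml-core and requests are installed and that interactive login is acceptable in your environment.

```python
import json


def build_submit_request(endpoint_url, auth_header, experiment_name, parameters=None):
    """Assemble the URL, headers, and JSON body for triggering a published pipeline.

    The helper name and its arguments are hypothetical; only the body keys
    "ExperimentName" and "ParameterAssignments" belong to the REST contract.
    """
    body = {"ExperimentName": experiment_name}
    if parameters:
        # Optional values for any PipelineParameter objects the pipeline defines
        body["ParameterAssignments"] = dict(parameters)
    headers = {"Content-Type": "application/json"}
    headers.update(auth_header)  # e.g. {"Authorization": "Bearer <token>"}
    return {"url": endpoint_url, "headers": headers, "data": json.dumps(body)}


# Illustrative usage inside the notebook (the experiment name is a placeholder):
#
#   import requests
#   from azureml.core.authentication import InteractiveLoginAuthentication
#
#   auth_header = InteractiveLoginAuthentication().get_authentication_header()
#   request = build_submit_request(
#       published_pipeline.endpoint, auth_header, "loans-pipeline-rest")
#   response = requests.post(
#       request["url"], headers=request["headers"], data=request["data"])
#   run_id = response.json().get("Id")
```

Keeping the request assembly separate from the HTTP call makes it easy to inspect the payload before anything is sent, and lets a different authentication class supply the header when interactive login is not an option.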