Book Image

Azure Data Scientist Associate Certification Guide

By : Andreas Botsikas, Michael Hlobil
Book Image

Azure Data Scientist Associate Certification Guide

By: Andreas Botsikas, Michael Hlobil

Overview of this book

The Azure Data Scientist Associate Certification Guide helps you acquire practical knowledge for machine learning experimentation on Azure. It covers everything you need to pass the DP-100 exam and become a certified Azure Data Scientist Associate. Starting with an introduction to data science, you'll learn the terminology that will be used throughout the book and then move on to the Azure Machine Learning (Azure ML) workspace. You'll discover the studio interface and manage various components, such as data stores and compute clusters. Next, the book focuses on no-code and low-code experimentation, and shows you how to use the Automated ML wizard to locate and deploy optimal models for your dataset. You'll also learn how to run end-to-end data science experiments using the designer provided in Azure ML Studio. You'll then explore the Azure ML Software Development Kit (SDK) for Python and advance to creating experiments and publishing models using code. The book also guides you in optimizing your model's hyperparameters using Hyperdrive before demonstrating how to use responsible AI tools to interpret and debug your models. Once you have a trained model, you'll learn to operationalize it for batch or real-time inferences and monitor it in production. By the end of this Azure certification study guide, you'll have gained the knowledge and the practical skills required to pass the DP-100 exam.
Table of Contents (17 chapters)
1
Section 1: Starting your cloud-based data science journey
6
Section 2: No code data science experimentation
9
Section 3: Advanced data science tooling and capabilities

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "You can also change the autogenerated name of the pipeline you are designing. Rename the current pipeline to test-pipeline."

A block of code is set as follows:

from azureml.train.hyperdrive import GridParameterSampling
from azureml.train.hyperdrive import choice
param_sampling = GridParameterSampling( {
        "a": choice(0.01, 0.5),
        "b": choice(10, 100)
    }
)

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

from azureml.core import Workspace
ws = Workspace.from_config()
loans_ds = ws.datasets['loans']
compute_target = ws.compute_targets['cpu-sm-cluster']

Any command-line input or output is written as follows:

az group create --name my-name-rg --location westeurope

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: "Navigate to the Author | Notebooks section of your AzureML Studio web interface."

Tips or important notes

Run numbers may be different in your executions. Every time you execute the cells, a new run number is created, continuing from the previous number. So, if you execute code that performs one hyperdrive run with 20 child runs, the last child run will be run 21. The next time you execute the same code, the hyperdrive run will start from run 22, and the last child will be run 42. The run numbers referred to in this section are the ones shown in the various figures, and it is normal to observe differences, especially if you had to rerun a couple of cells.