Azure Data Scientist Associate Certification Guide

By: Andreas Botsikas, Michael Hlobil

Overview of this book

The Azure Data Scientist Associate Certification Guide helps you acquire practical knowledge for machine learning experimentation on Azure. It covers everything you need to pass the DP-100 exam and become a certified Azure Data Scientist Associate. Starting with an introduction to data science, you'll learn the terminology that will be used throughout the book and then move on to the Azure Machine Learning (Azure ML) workspace. You'll discover the studio interface and manage various components, such as data stores and compute clusters. Next, the book focuses on no-code and low-code experimentation, and shows you how to use the Automated ML wizard to locate and deploy optimal models for your dataset. You'll also learn how to run end-to-end data science experiments using the designer provided in Azure ML Studio. You'll then explore the Azure ML Software Development Kit (SDK) for Python and advance to creating experiments and publishing models using code. The book also guides you in optimizing your model's hyperparameters using Hyperdrive before demonstrating how to use responsible AI tools to interpret and debug your models. Once you have a trained model, you'll learn to operationalize it for batch or real-time inferences and monitor it in production. By the end of this Azure certification study guide, you'll have gained the knowledge and the practical skills required to pass the DP-100 exam.
Table of Contents (17 chapters)

Section 1: Starting your cloud-based data science journey
Section 2: No code data science experimentation
Section 3: Advanced data science tooling and capabilities

What this book covers

Chapter 1, An Overview of Modern Data Science, provides you with the terminology used throughout the book.

Chapter 2, Deploying Azure Machine Learning Workspace Resources, helps you understand the deployment options for an Azure Machine Learning (AzureML) workspace.

Chapter 3, Azure Machine Learning Studio Components, provides an overview of the studio web interface you will be using to conduct your data science experiments.

Chapter 4, Configuring the Workspace, helps you understand how to provision computational resources and connect to data sources that host your datasets.

Chapter 5, Letting the Machines Do the Model Training, guides you through running your first Automated Machine Learning (AutoML) experiment and deploying the best trained model as a web endpoint through the studio’s wizards.

Chapter 6, Visual Model Training and Publishing, helps you author a training pipeline through the studio’s designer experience. You will learn how to operationalize the trained model through a batch or a real-time pipeline by promoting the trained pipeline within the designer.

Chapter 7, The AzureML Python SDK, gets you started with code-first data science experimentation. You will understand how the AzureML Python SDK is structured, and you will learn how to manage AzureML resources, such as compute clusters, with code.
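As a taste of the SDK-based resource management covered in this chapter, the following minimal sketch provisions a small compute cluster with the AzureML Python SDK (v1). The local config.json file, the cluster name cpu-cluster, and the VM size are illustrative assumptions, not values prescribed by the book.

    from azureml.core import Workspace
    from azureml.core.compute import AmlCompute, ComputeTarget

    # Connect to an existing workspace using a local config.json (assumed to be present)
    ws = Workspace.from_config()

    # Describe a small, auto-scaling CPU cluster; the name and VM size are placeholders
    cluster_config = AmlCompute.provisioning_configuration(
        vm_size="STANDARD_DS3_V2",
        min_nodes=0,
        max_nodes=2,
    )

    # Create the cluster and wait until provisioning completes
    cluster = ComputeTarget.create(ws, "cpu-cluster", cluster_config)
    cluster.wait_for_completion(show_output=True)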

Chapter 8, Experimenting with Python Code, helps you train your first machine learning model with code. It guides you on how to track model metrics and scale out your training efforts to bigger compute clusters.
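For a flavor of the metric tracking discussed in this chapter, here is a minimal sketch of logging a metric from inside a training script submitted to AzureML; the accuracy value is a placeholder for illustration, not a real result.

    from azureml.core import Run

    # Get the context of the current experiment run
    run = Run.get_context()

    # ... train a model and evaluate it ...
    accuracy = 0.87  # placeholder value for illustration only

    # Log the metric so it appears in the run history in the studio
    run.log("accuracy", accuracy)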

Chapter 9, Optimizing the ML Model, shows you how to optimize your machine learning model with hyperparameter tuning and helps you discover the best model for your dataset by kicking off an AutoML experiment with code.
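The sketch below hints at the Hyperdrive workflow covered in this chapter. It assumes a train.py script that logs an accuracy metric and a compute target named cpu-cluster; the hyperparameter names and ranges are also illustrative assumptions.

    from azureml.core import Workspace, Experiment, ScriptRunConfig
    from azureml.train.hyperdrive import (
        HyperDriveConfig, RandomParameterSampling, PrimaryMetricGoal, choice, uniform
    )

    ws = Workspace.from_config()

    # The training script and compute target are assumptions for this sketch
    script_config = ScriptRunConfig(source_directory=".", script="train.py",
                                    compute_target="cpu-cluster")

    # Randomly sample hyperparameter combinations passed to the script as arguments
    sampling = RandomParameterSampling({
        "--learning-rate": uniform(0.001, 0.1),
        "--n-estimators": choice(50, 100, 200),
    })

    hd_config = HyperDriveConfig(
        run_config=script_config,
        hyperparameter_sampling=sampling,
        primary_metric_name="accuracy",  # must match a metric logged by train.py
        primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
        max_total_runs=20,
    )

    # Submit the hyperparameter sweep as an experiment run
    hd_run = Experiment(ws, "hyperdrive-demo").submit(hd_config)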

Chapter 10, Understanding Model Results, introduces you to the concept of responsible AI and takes a deep dive into the tools that allow you to interpret your models’ predictions, analyze the errors that your models are prone to, and detect potential fairness issues.

Chapter 11, Working with Pipelines, guides you through authoring repeatable processes by defining multi-step pipelines using the AzureML Python SDK.
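As a preview of the pipeline authoring covered in this chapter, this minimal sketch chains two script steps; the script names, step names, and compute target are placeholders assumed for the example.

    from azureml.core import Workspace, Experiment
    from azureml.pipeline.core import Pipeline
    from azureml.pipeline.steps import PythonScriptStep

    ws = Workspace.from_config()

    # Each step wraps a script; prep.py and train.py are assumed to exist locally
    prep_step = PythonScriptStep(name="prepare-data", script_name="prep.py",
                                 compute_target="cpu-cluster", source_directory=".")
    train_step = PythonScriptStep(name="train-model", script_name="train.py",
                                  compute_target="cpu-cluster", source_directory=".")

    # Make the training step wait for data preparation to finish
    train_step.run_after(prep_step)

    # Assemble the pipeline and submit it as an experiment run
    pipeline = Pipeline(workspace=ws, steps=[prep_step, train_step])
    run = Experiment(ws, "pipeline-demo").submit(pipeline)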

Chapter 12, Operationalizing Models with Code, helps you register your trained models and operationalize them through real-time web endpoints or batch parallel processing pipelines.
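To give a sense of the operationalization steps in this chapter, the sketch below registers a model and deploys it as a real-time endpoint on Azure Container Instances; the model path, scoring script, conda file, and endpoint name are all assumptions made for this example.

    from azureml.core import Workspace, Environment
    from azureml.core.model import Model, InferenceConfig
    from azureml.core.webservice import AciWebservice

    ws = Workspace.from_config()

    # Register a serialized model produced by a training run; the path is a placeholder
    model = Model.register(ws, model_path="outputs/model.pkl", model_name="my-model")

    # score.py (assumed to exist) must define the init() and run() functions AzureML expects
    env = Environment.from_conda_specification("inference-env", "conda.yml")
    inference_config = InferenceConfig(entry_script="score.py", environment=env)

    # Deploy to Azure Container Instances as a low-cost real-time endpoint
    deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
    service = Model.deploy(ws, "my-endpoint", [model], inference_config, deployment_config)
    service.wait_for_deployment(show_output=True)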