Mastering Azure Machine Learning

By: Christoph Körner, Kaijisse Waaijer

Table of Contents (20 chapters)

  • Section 1: Azure Machine Learning
  • Section 2: Experimentation and Data Preparation
  • Section 3: Training Machine Learning Models
  • Section 4: Optimization and Deployment of Machine Learning Models
  • Index

About Mastering Azure Machine Learning

Today's growing data volumes require distributed systems, powerful algorithms, and scalable cloud infrastructure to compute insights and to train and deploy machine learning (ML) models. This book will help you deepen your knowledge of building ML models and end-to-end ML pipelines in the cloud using Azure.

The book starts with an overview of an end-to-end ML project and a guide on how to choose the right Azure service for different ML tasks. It then focuses on Azure Machine Learning and takes you through the process of data experimentation, data preparation, and feature engineering using Azure Machine Learning and Python. You'll learn advanced feature extraction techniques using natural language processing (NLP), classical ML techniques, and the secrets of both a great recommendation engine and a performant computer vision model using deep learning methods. You'll also explore how to train, optimize, and tune models using Azure Automated Machine Learning and HyperDrive, and how to perform distributed training on Azure. Then, you'll learn different deployment and monitoring techniques using Azure Kubernetes Service with Azure Machine Learning, along with the basics of MLOps (DevOps for ML) to automate your ML process as a CI/CD pipeline.

By the end of this book, you'll have mastered Azure Machine Learning and be able to confidently design, build, and operate scalable ML pipelines in Azure.

About the authors

Christoph Körner recently worked as a cloud solution architect for Microsoft, specializing in Azure-based big data and machine learning solutions, where he was responsible for designing end-to-end machine learning and data science platforms. For the last few months, he has been working as a senior software engineer at HubSpot, building a large-scale analytics platform. Before Microsoft, Christoph was the technical lead for big data at T-Mobile, where his team designed, implemented, and operated large-scale data analytics and prediction pipelines on Hadoop. He has also authored three books: Deep Learning in the Browser (for Bleeding Edge Press), Learning Responsive Data Visualization, and Data Visualization with D3 and AngularJS (both for Packt).

Kaijisse Waaijer is an experienced technologist specializing in data platforms, machine learning, and the Internet of Things. Kaijisse currently works for Microsoft EMEA as a data platform consultant specializing in data science, machine learning, and big data. She works constantly with customers across multiple industries as their trusted tech advisor, helping them optimize their organizational data to create better outcomes and business insights that drive value using Microsoft technologies. Her true passion lies in trading systems automation and in applying deep learning and neural networks to achieve advanced levels of prediction and automation.

About the reviewers

Alexey Bokov is an experienced Azure architect and has been a Microsoft technical evangelist since 2011. He works closely with Microsoft's top-tier customers all around the world to develop applications based on the Azure cloud platform. Building cloud-based applications for challenging scenarios is his passion, along with helping the development community to upskill and learn new things through hands-on exercises and hacking. He's a long-time contributor to, and coauthor and reviewer of, many Azure books, and he occasionally speaks at Kubernetes events.

Marek Chmel is a Sr. Cloud Solutions Architect at Microsoft for Data & Artificial Intelligence, a speaker, and a trainer with more than 15 years' experience. He's a frequent conference speaker, focusing on SQL Server, Azure, and security topics. He has been a Data Platform MVP for 8 years, since 2012. He has earned numerous certifications, including MCSE: Data Management and Analytics, Azure Architect, Data Engineer and Data Scientist Associate, EC-Council Certified Ethical Hacker, and several eLearnSecurity certifications.

Marek earned his MSc degree in business and informatics from Nottingham Trent University. He started his career as a trainer for Microsoft Server courses and later worked as a Principal SharePoint Administrator and Principal Database Administrator.

Learning objectives

By the end of this book, you will be able to:

  • Set up your Azure Machine Learning workspace for data experimentation and visualization
  • Perform ETL, data preparation, and feature extraction using Azure best practices
  • Implement advanced feature extraction using NLP and word embeddings
  • Train gradient-boosted tree ensembles, recommendation engines, and deep neural networks on Azure Machine Learning
  • Use hyperparameter tuning and Azure Automated Machine Learning to optimize your ML models
  • Employ distributed ML on GPU clusters using Horovod in Azure Machine Learning
  • Deploy, operate, and manage your ML models at scale
  • Automate your end-to-end ML process as CI/CD pipelines for MLOps

Audience

This machine learning book is for data professionals, data analysts, data engineers, data scientists, or machine learning developers who want to master scalable cloud-based machine learning architectures in Azure. This book will help you use advanced Azure services to build intelligent machine learning applications. A basic understanding of Python and working knowledge of machine learning are mandatory.

Approach

This book covers all the steps required to build and operate a large-scale machine learning pipeline on Azure, in the same order as they occur in an actual machine learning project.

To get the most out of this book

Most code examples in this book require an Azure subscription to run. You can create a free Azure account at https://azure.microsoft.com/free and receive USD 200 of credits to use within 30 days.

The easiest way to get started is to create an Azure Machine Learning Workspace (Basic or Enterprise) and then create a Compute Instance of VM type STANDARD_D3_V2 in that workspace. The Compute Instance gives you access to a JupyterLab or Jupyter Notebook environment with all essential libraries pre-installed and is well suited for authoring and running experiments.
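If you prefer a repeatable setup over clicking through the portal, the same two steps can be scripted. The following is a minimal sketch, assuming the Azure Machine Learning SDK v1 (`azureml-core`) and an authenticated Azure session; the function name and every `<...>` default are hypothetical placeholders to replace with your own values.

```python
# Minimal sketch using the Azure Machine Learning SDK v1 (azureml-core).
# The function name and all <...> defaults are hypothetical placeholders.


def create_workspace_and_instance(
    workspace_name="<workspace-name>",
    subscription_id="<subscription-id>",
    resource_group="<resource-group>",
    location="<region>",
    instance_name="<instance-name>",
    vm_size="STANDARD_D3_V2",
):
    """Create (or reuse) a workspace, then provision a Compute Instance in it."""
    # Imported lazily so this file can be loaded even without the SDK installed.
    from azureml.core import Workspace
    from azureml.core.compute import ComputeInstance, ComputeTarget

    # exist_ok=True returns the existing workspace instead of raising an error.
    ws = Workspace.create(
        name=workspace_name,
        subscription_id=subscription_id,
        resource_group=resource_group,
        location=location,
        create_resource_group=True,
        exist_ok=True,
    )
    # Save config.json locally so later sessions can call Workspace.from_config().
    ws.write_config()

    # Provision the Compute Instance with the VM size recommended above.
    config = ComputeInstance.provisioning_configuration(vm_size=vm_size)
    instance = ComputeTarget.create(ws, instance_name, config)
    instance.wait_for_completion(show_output=True)
    return ws, instance
```

Creating the workspace and instance through the Azure portal or Azure Machine Learning studio works just as well; scripting them is mainly useful when you want to recreate the environment later.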

Rather than running all experiments on Azure, you can also run some of the code examples, especially the authoring code, on your local machine. To do so, you need a Python runtime (preferably an interactive runtime such as JupyterLab or Jupyter Notebook) with the Azure Machine Learning SDK installed. We recommend Python 3.6.1 or later.

Note

You can find more information about installing the SDK at https://docs.microsoft.com/python/api/overview/azure/ml/install?view=azure-ml-py
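To confirm that a local runtime meets these requirements, a quick check like the following may help. It is a minimal sketch that uses only the standard library; the helper names are hypothetical, and it assumes the SDK was installed with `pip install azureml-sdk`, which provides the `azureml.core` package.

```python
import importlib.util
import sys

# Minimum Python version recommended in the text.
MIN_PYTHON = (3, 6, 1)


def module_available(name):
    """Return True if the (possibly dotted) module can be found, without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # For dotted names, find_spec imports the parent package first
        # (e.g. 'azureml'); a missing parent raises instead of returning None.
        return False


def check_local_environment():
    """Hypothetical helper: report the Python version and Azure ML SDK availability."""
    return {
        "python_version": ".".join(map(str, sys.version_info[:3])),
        "python_ok": sys.version_info[:3] >= MIN_PYTHON,
        "azureml_sdk_installed": module_available("azureml.core"),
    }


print(check_local_environment())
```

Using `find_spec` instead of a plain `import` keeps the check cheap and avoids side effects from importing the full SDK.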

Unless stated otherwise, we use the following library versions throughout the book. You can also find a detailed description of all the libraries used in each chapter in the GitHub repository for this book (link available in the Download resources section).

[Table: Library versions used throughout the book]
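Because behavior can differ between library versions, it can be useful to compare your environment against the versions listed for the book. The following is a minimal sketch, assuming Python 3.8 or later for `importlib.metadata` (on 3.6 and 3.7 the `importlib_metadata` backport offers the same functions); the helper name is hypothetical, and the package list is illustrative rather than a copy of the book's table.

```python
from importlib import metadata  # Python 3.8+


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Illustrative package list (an assumption); use the libraries from the table above.
for pkg, ver in installed_versions(["azureml-sdk", "pandas", "scikit-learn", "tensorflow"]).items():
    print(f"{pkg}: {ver or 'not installed'}")
```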

If you are using the digital version of this book, we advise you to type the code yourself or access the code via the GitHub repository (link available in the Download resources section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

To get the most out of this book, you should have experience programming in Python and a basic understanding of popular ML and data manipulation libraries such as TensorFlow, Keras, scikit-learn, and pandas.

Conventions

Code words in the text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows:

"The substring(start,length) expression can be used to extract a prefix from a column into a new column."

Here's a sample block of code:

for url in product_image_urls:
    # Request an image description and keep the first generated caption
    res = cs_vision_analyze(url, key, features=['Description'])
    caption = res['description']['captions'][0]['text']
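The sample above calls a helper, `cs_vision_analyze`, whose definition is not shown in this section. Purely as an illustration, here is a hypothetical sketch of such a helper, assuming the Azure Computer Vision REST API v3.0 `analyze` operation and the standard library's `urllib`; the endpoint constant and the helpers `build_analyze_request` and `first_caption` are assumptions, not the book's code.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: replace <your-resource> with your Computer Vision resource name.
CS_VISION_ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"


def build_analyze_request(img_url, key, features, endpoint=CS_VISION_ENDPOINT):
    """Build a POST request for the Computer Vision v3.0 'analyze' operation."""
    query = urllib.parse.urlencode({"visualFeatures": ",".join(features)})
    return urllib.request.Request(
        f"{endpoint}/vision/v3.0/analyze?{query}",
        data=json.dumps({"url": img_url}).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def cs_vision_analyze(img_url, key, features=("Description",), endpoint=CS_VISION_ENDPOINT):
    """Hypothetical helper: call the API and return the JSON response as a dict."""
    req = build_analyze_request(img_url, key, features, endpoint)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def first_caption(res):
    """Return the first caption text, or None if the service returned no captions."""
    captions = res.get("description", {}).get("captions", [])
    return captions[0]["text"] if captions else None
```

Separating request construction from sending makes the request easy to inspect without network access, and `first_caption` avoids an `IndexError` when an image yields no captions.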

In many places, we use angle brackets, <>, to mark placeholders. Replace these with the actual parameter values, and do not include the brackets in the commands.

Download resources

The code bundle for this book is also hosted on GitHub at https://github.com/PacktPublishing/Mastering-Azure-Machine-Learning. There you can also find the YAML and other files referred to throughout the book.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing. Check them out!