Hands-On Machine Learning with Azure

By: Thomas K Abraham, Parashar Shah, Jen Stirrup, Lauri Lehman, Anindita Basak
Overview of this book

Implementing machine learning (ML) and artificial intelligence (AI) in the cloud was once impractical due to the lack of processing power and storage. Azure, however, now offers ML and AI services that are easy to implement in the cloud. Hands-On Machine Learning with Azure teaches you how to perform advanced ML projects in the cloud in a cost-effective way. The book begins by covering the benefits of ML and AI in the cloud. You will then explore Microsoft's Team Data Science Process to establish a repeatable process for successful AI development and implementation. You will also gain an understanding of the AI technologies available in Azure and the Cognitive Services APIs, and how to integrate them into bot applications. This book lets you explore prebuilt templates with Azure Machine Learning Studio and build a model using canned algorithms that can be deployed as web services. The book then takes you through a series of preconfigured virtual machines in Azure targeted at AI development scenarios. You will get to grips with ML Server and its capabilities in SQL and HDInsight. In the concluding chapters, you'll integrate these patterns with other non-AI services in Azure. By the end of this book, you will be fully equipped to implement smart cognitive actions in your models.

The Microsoft cloud – Azure

Microsoft's mission is to empower every person and organization on Earth to achieve more. Microsoft Azure is a cloud platform designed to help customers achieve the intelligent cloud and the intelligent edge. The vision is to help customers infuse AI into every application, both in the cloud and on compute devices of all form factors. With this in mind, Microsoft has developed a wide set of tools that can help its customers build AI into their applications with ease.

The following table shows the different tools that can be used to develop end-to-end AI solutions with Azure. The Azure Service column indicates the services that are owned and managed by Microsoft (first-party services). The Azure Marketplace column indicates third-party services or implementations of Microsoft products on Azure virtual machines, that is, Infrastructure as a Service (IaaS):

Azure services that assist in AI solution building

Due to the pace of innovation on Azure, it is not easy to keep up with all of the services in the preceding table and their updates.

One of the challenges that architects, developers, and data scientists face is picking the right Azure components for their solution.

Picking the right components for a full, end-to-end solution is outside the scope of this book. Instead, we will focus on just the AI-specific tools that a developer, data engineer, or data scientist will need to use for their solution.

Choosing AI tools on Azure

In this book, we will assume that you have knowledge and experience of AI in general. The goal here is not to touch on the basics of the various kinds of AI or on choosing the correct algorithm; we assume you have a good understanding of which algorithms to choose in order to solve a given business need.

The following diagram shows a decision tree that can help you choose the right Azure AI tools. It is not meant to be comprehensive, just a guide toward the correct technology choices. Many options overlap, which is difficult to depict in a single diagram. Also keep in mind that an efficient AI solution will often leverage multiple tools in combination:

Decision tree guide to choosing AI tools on Azure

The preceding diagram shows a decision tree that helps users of Microsoft's AI platform. Starting from the top, the first question is whether you would like to build your own models or consume pre-trained models. Building your own models involves data scientists, data engineers, and developers at various stages of the process; in some use cases, developers prefer to simply consume pre-trained models.

Cognitive Services/bots

Developers who would like to consume pre-trained AI models typically use one of Microsoft's Cognitive Services. For those building conversational applications, a combination of the Bot Framework and Cognitive Services is the recommended path. We will go into the details of Cognitive Services in Chapter 3, Cognitive Services, and Chapter 4, Bot Framework, but it is important to understand when to choose Cognitive Services.

Cognitive Services were built with the goal of giving developers the tools to rapidly build and deploy AI applications. Cognitive Services are pre-trained, customizable AI models that are exposed via APIs with accompanying SDKs and web services. They perform specific tasks and are designed to scale based on the load against them. In addition, they are designed to be compliant with security standards and other data isolation requirements. At the time of writing, there are broadly five types of Cognitive Services offered by Azure:

  • Knowledge
  • Language
  • Search
  • Speech
  • Vision

Knowledge services are focused on building data-based intelligence into your application. QnA Maker is one such service; it helps drive a question-and-answer experience over all kinds of structured and semi-structured content. Underneath, the service leverages multiple Azure services, but it abstracts all of that complexity from the user, making a knowledge base easy to create and manage.

Language services are focused on building text-based intelligence into your application. The Language Understanding Intelligent Service (LUIS) is one such service; it allows users to build applications that can understand natural conversation and pass the intent and context of the conversation on to the requesting application, a capability known as natural language processing (NLP).
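
As an illustration, the following is a minimal sketch of querying a published LUIS app over REST from Python; the region, app ID, and subscription key are placeholders you would obtain from the LUIS portal, and the URL reflects the v2.0 API available at the time of writing:

    # Query a published LUIS app and read back the top-scoring intent.
    # Region, app ID, and key are placeholders for your own LUIS app.
    import requests

    region = "westus"                      # assumed publish region
    app_id = "<your-luis-app-id>"          # placeholder
    key = "<your-subscription-key>"        # placeholder

    url = f"https://{region}.api.cognitive.microsoft.com/luis/v2.0/apps/{app_id}"
    response = requests.get(
        url,
        headers={"Ocp-Apim-Subscription-Key": key},
        params={"q": "Book me a flight to Seattle tomorrow"},
    )
    result = response.json()
    print(result["topScoringIntent"]["intent"])   # for example, BookFlight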

Search services are focused on providing services that integrate very specialized search tools for your application. These services are based on Microsoft's Bing search engine, but can be customized in multiple ways to integrate search into enterprise applications. The Bing Entity Search service is one such API that returns information about entities that Bing determines are relevant to a user's query.

Speech services are focused on providing services that allow developers to integrate powerful speech-enabled features into their applications, such as dictation, transcription, and voice command control. The Custom Speech Service enables developers to build customized language models and acoustic models tailored to specific types of applications and user profiles.

Vision services provide a variety of vision-based intelligent APIs that work on images or videos. The Custom Vision Service, for example, can recognize specific classes of images once it has been trained on labeled examples of each class the application needs to detect.

Each of these Cognitive Services has limitations in terms of its applicability to different situations. They also have limits on scalability, although they are well designed to handle most enterprise-wide AI solutions. Covering the limits and applicability of each service is outside the scope of this book; they are well documented.

Since updates occur on a monthly basis, it is best to refer to the Azure documentation to find the limits of these services.

As a developer, once you hit the limitations of Cognitive Services, knowingly or unknowingly, the best option is to build your own models to meet your business requirements. Building your own AI models involves ingesting data, transforming it, performing feature engineering on it, training a model, and, eventually, deploying the model. This can end up being an elaborate and time-consuming process, depending on the maturity of the organization's capabilities for the different tasks. Picking the right set of tools involves assessing that maturity at each step of the process and using a service that fits the organization's capabilities. Referring to the preceding diagram, the second question for organizations that want to build their own AI models concerns the kind of development experience they would like.

Azure Machine Learning Studio

Azure Machine Learning (Azure ML) Studio is the primary tool, a purely web-based GUI, for building machine learning (ML) models. Azure ML Studio is an almost code-free environment that allows the user to build end-to-end ML solutions. It has Microsoft Research's proprietary algorithms built in, which can handle most ML tasks with great simplicity. It can also embed Python or R code to enhance its functionality. One of the greatest features of Azure ML Studio is the ability to create a web service in a single click. The web service is exposed in the form of a REST endpoint that applications can send data to. In addition to the web service, an Excel spreadsheet is also created, which accesses the same web service and can be used to test the model's functionality and share it easily with end users.
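
To illustrate, here is a hedged sketch of calling such an endpoint from Python; the URL, API key, and column names are placeholders, though the Inputs/GlobalParameters request shape follows the pattern ML Studio generates for its web services:

    # Score a row against an Azure ML Studio web service endpoint.
    # URL, key, and columns are placeholders; copy the real values
    # from the web service dashboard in ML Studio.
    import requests

    url = ("https://<region>.services.azureml.net/workspaces/<ws-id>"
           "/services/<service-id>/execute?api-version=2.0")
    api_key = "<your-api-key>"

    payload = {
        "Inputs": {
            "input1": {
                "ColumnNames": ["age", "income"],   # hypothetical features
                "Values": [["34", "52000"]],
            }
        },
        "GlobalParameters": {},
    }
    response = requests.post(url, json=payload,
                             headers={"Authorization": f"Bearer {api_key}"})
    print(response.json())   # scored labels come back as JSON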

At the time of writing, the primary limitation of Azure ML Studio is the 10 GB limit on an experiment container. This limit will be explained in detail in Chapter 6, Scalable Computing for Data Science, but for now, it is sufficient to understand that Azure ML Studio is well suited to training datasets in the 2 GB to 5 GB range. In addition, there are limits on the amount of R and Python code that you can include in ML Studio, and on its performance, which will be discussed in detail later.

ML Server

For a code-first experience, there are multiple tools available in the Microsoft portfolio. If an organization is looking to deploy on-premises (in addition to the cloud), the only option available is Machine Learning Server (ML Server). ML Server is an enterprise platform that supports both R and Python applications, and it supports all the activities involved in the ML process end to end. ML Server was previously known as R Server and came about through Microsoft's acquisition of Revolution Analytics. Python support was added later to accommodate the variety of user preferences.

In ML Server, users can use any of the open source libraries as part of their solution. The challenge with a lot of open source tooling is that it takes considerable additional effort to make it scale. Here, ML Server's RevoScaleR and revoscalepy libraries provide that scalability for large datasets by efficiently managing data on disk and in memory. In terms of scalability, ML Server can either scale itself or hand work off to a compute engine. It is important to note that ML Server itself only scales up; in other words, you create a single server with more/faster CPU, memory, and storage. It does not scale out by creating additional ML Server nodes. To achieve scale-out, ML Server instead leverages the compute of the data engines with which it interacts, by shifting the compute context to a distributed engine such as Spark or Hadoop. The compute context can also be shifted to SQL Server, with both R and Python, so that the algorithms run natively on SQL Server without having to move the data to the compute platform.
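
For instance, the following is a minimal sketch, assuming a hypothetical SQL Server database, of how revoscalepy shifts the compute context so that training runs next to the data:

    # Shift the compute context to SQL Server with revoscalepy; the
    # connection string, table, and columns are hypothetical.
    from revoscalepy import (RxInSqlServer, RxSqlServerData,
                             rx_set_compute_context, rx_lin_mod)

    conn_str = ("Driver=SQL Server;Server=myserver;"
                "Database=sales;Trusted_Connection=True;")

    data = RxSqlServerData(table="dbo.Transactions",
                           connection_string=conn_str)
    cc = RxInSqlServer(connection_string=conn_str)

    # Subsequent rx_* calls now execute inside SQL Server, not locally
    rx_set_compute_context(cc)
    model = rx_lin_mod("amount ~ age + region", data=data)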

The challenges with ML Server are mostly associated with the limitations surrounding R itself, since Python functionality is relatively new. ML Server needs to be fully managed by the user, so it adds an additional layer of management. The lack of scale-out features also poses a challenge in some situations.

Azure ML Services

Azure ML Services is a relatively new service on Azure that enhances productivity in the process of building AI solutions. Azure ML Services has several components. On the user's end, Azure ML Workbench is a tool that allows users to pull in data, transform it, build models, and run them against various kinds of compute. Workbench runs on the user's local machine and connects to Azure ML Services. Azure ML Services itself runs on Azure and consists of experimentation and model management services for ML. The experimentation service keeps track of model testing, performance, and any other metrics you would like to track while building a model. The model management service helps with deploying models and manages the overall life cycle of multiple models built by individual users or large teams.
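
As a sketch of what experiment tracking looks like in code, assuming the azureml.logging module that shipped with the Workbench-era SDK, a training script might log metrics like this:

    # Log metrics from a training script to the experimentation service.
    # Assumes the Workbench-era azureml.logging module; the metric
    # values here are hypothetical.
    from azureml.logging import get_azureml_logger

    logger = get_azureml_logger()

    accuracy = 0.91                     # computed by your training code
    logger.log("Accuracy", accuracy)    # tracked and compared per run
    logger.log("Learning Rate", 0.01)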

When leveraging Azure ML Services, there are multiple endpoints that can act as engines for the services. At the time of writing, only Python-based endpoints are supported. SQL Server, with the introduction of built-in Python services, can act as an endpoint. This is beneficial, especially if the user has most of the data in SQL tables and wants to minimize data movement.

If you have leveraged Spark libraries for ML at scale on ML Services, then you can deploy to Spark-based solutions on Azure. Currently, these can be either Spark on HDInsight, or any other native implementation of Apache Spark (Cloudera, Hortonworks, and so on).

If the user has leveraged other Hadoop-based libraries to build their ML solution, then it can be deployed to HDInsight or any of the Apache Hadoop implementations available on Azure.

Azure Batch is a service that provides large-scale, on-demand compute for applications that require such resources on an ad hoc or scheduled basis. The typical workflow for this use case involves the creation of a VM cluster, followed by the submission of jobs to the cluster. After the job is completed, the cluster is destroyed, and users do not pay for any compute afterward.
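
To make this concrete, here is a hedged sketch of that workflow with the azure-batch Python SDK; the account details, VM image, and training script are placeholders, and parameter names can vary across SDK versions:

    # Create a pool, submit a job/task, then tear the pool down.
    # Account, image, and script details below are placeholders.
    from azure.batch import BatchServiceClient, batch_auth
    import azure.batch.models as batchmodels

    creds = batch_auth.SharedKeyCredentials("<account-name>", "<account-key>")
    client = BatchServiceClient(
        creds, batch_url="https://<account>.<region>.batch.azure.com")

    # 1. Create a pool of VMs
    client.pool.add(batchmodels.PoolAddParameter(
        id="train-pool",
        vm_size="STANDARD_D2_V2",
        virtual_machine_configuration=batchmodels.VirtualMachineConfiguration(
            image_reference=batchmodels.ImageReference(
                publisher="Canonical", offer="UbuntuServer",
                sku="18.04-LTS", version="latest"),
            node_agent_sku_id="batch.node.ubuntu 18.04"),
        target_dedicated_nodes=2))

    # 2. Submit a job and a task that runs the training script
    client.job.add(batchmodels.JobAddParameter(
        id="train-job",
        pool_info=batchmodels.PoolInformation(pool_id="train-pool")))
    client.task.add("train-job", batchmodels.TaskAddParameter(
        id="task-1", command_line="python train.py"))  # hypothetical script

    # 3. Delete the pool once the job completes, so billing stops
    client.pool.delete("train-pool")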

The Data Science Virtual Machine (DSVM) is a highly customized VM template built on either Linux or Windows. It comes pre-installed with a huge variety of curated data science tools and libraries. All the tools and libraries are configured to work straight out of the box with minimal effort. The DSVM has multiple applications, which we will cover in Chapter 7, Machine Learning Server, including utilization as a base image VM for Azure Batch.

One of the most highly scalable targets for running models built by Azure ML Services is containers, through Docker, with orchestration via Kubernetes. This is made easier by leveraging Azure Kubernetes Service (AKS). Azure ML Services creates a Docker image that helps operationalize an ML model. The model itself is deployed as a containerized Docker-based web service, while leveraging frameworks such as TensorFlow and Spark. Applications can access this web service as a REST API. The web service can be scaled up and down by leveraging the scaling features of Kubernetes. More details on this topic will be covered in Chapter 10, Building Deep Learning Solutions.

The challenge with Azure ML Services is that it currently only supports Python. The platform itself has gone through some changes, and its heavy reliance on the command-line interface makes it less user-friendly than some other tools.

Azure Databricks

Azure Databricks is one of the newest additions to the tools that can be used to build custom AI solutions on Azure. It is based on Apache Spark, but is optimized for use on the Azure platform. The Spark engine can be accessed via APIs based on Scala, Python, R, SQL, or Java. To benefit from Spark's scalability, users need to use Spark libraries when dealing with data objects and their transformations. Azure Databricks runs these scalable libraries on top of highly elastic and scalable Spark clusters that are managed by the runtime. Databricks comes with enterprise-grade security, compliance, and collaboration features that distinguish it from Apache Spark. The ability to schedule and orchestrate jobs is also a great feature to have, especially when automating and streamlining AI workflows. Spark is also a great, unified platform for performing different kinds of analytics: interactive querying, ML, stream processing, and graph computation.
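
As a brief illustration of what Spark-native code looks like, here is a minimal PySpark sketch; the file path and column names are hypothetical:

    # Use Spark DataFrames and Spark ML so the work distributes across
    # the cluster; the path and columns are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import LinearRegression

    spark = SparkSession.builder.getOrCreate()  # pre-created in Databricks notebooks

    df = spark.read.csv("/mnt/data/housing.csv", header=True, inferSchema=True)
    assembler = VectorAssembler(inputCols=["sqft", "rooms"], outputCol="features")
    train = assembler.transform(df)

    model = LinearRegression(featuresCol="features", labelCol="price").fit(train)
    print(model.coefficients)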

The challenge with Azure Databricks is that it is relatively new on Azure and does not yet integrate directly with some services. Another challenge is that users who are new to Spark have to refactor their code to incorporate Spark libraries, without which they cannot reap the benefits of the highly distributed environment on offer.