Distributed Data Systems with Azure Databricks

By: Alan Bernardo Palacio

Overview of this book

Microsoft Azure Databricks helps you to harness the power of distributed computing and apply it to create robust data pipelines, along with training and deploying machine learning and deep learning models. Databricks' advanced features enable developers to process, transform, and explore data. Distributed Data Systems with Azure Databricks will help you to put your knowledge of Databricks to work to create big data pipelines. The book provides a hands-on approach to implementing Azure Databricks and its associated methodologies that will make you productive in no time. Complete with detailed explanations of essential concepts, practical examples, and self-assessment questions, you’ll begin with a quick introduction to Databricks' core functionalities, before performing distributed model training and inference using TensorFlow and Spark MLlib. As you advance, you’ll explore MLflow Model Serving on Azure Databricks and implement distributed training pipelines using HorovodRunner in Databricks. Finally, you’ll discover how to transform, use, and obtain insights from massive amounts of data to train predictive models and build complete, working data pipelines. By the end of this MS Azure book, you’ll have gained a solid understanding of how to work with Databricks to create and manage an entire big data pipeline.
Table of Contents (17 chapters)
Section 1: Introducing Databricks
Section 2: Data Pipelines with Databricks
Section 3: Machine and Deep Learning with Databricks

Serving models with MLflow

One benefit of using MLflow in Azure Databricks as the repository for our machine learning models is that we can serve predictions directly from the Model Registry as REST API endpoints. These endpoints are updated automatically whenever a new model version becomes available in each stage, so model serving complements the lifecycle tracking that the MLflow Model Registry already provides.

A model can be enabled for serving as a REST API endpoint from the Model Registry UI in the Azure Databricks workspace. To do so, go to the model page in the Model Registry UI, open the Serving tab, and click on the Enable Serving button.

Once you have clicked on the button, which is shown in the following screenshot, you should see the status as Pending. After a couple of minutes, the status will change to Ready:

Figure 11.9 – Enabling the serving of a model
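
Once the status is Ready, the endpoint can be called like any other REST API. The following is a minimal sketch rather than a definitive client. It assumes the URL pattern <workspace-url>/model/<model-name>/<version-or-stage>/invocations and authentication with a personal access token sent as a Bearer header, neither of which is stated in this section, so check both against your workspace. The helper names (build_invocation_url, build_scoring_request, send), the workspace URL, the model name, and the input columns are all hypothetical, and the expected JSON body depends on your MLflow version and the model's input schema.

```python
import json
import urllib.request


def build_invocation_url(workspace_url: str, model_name: str, version_or_stage: str) -> str:
    """Build the scoring URL for a served model.

    Assumption: the endpoint follows the pattern
    <workspace-url>/model/<model-name>/<version-or-stage>/invocations.
    """
    return f"{workspace_url.rstrip('/')}/model/{model_name}/{version_or_stage}/invocations"


def build_scoring_request(url: str, token: str, payload) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying the input as JSON.

    Assumption: the endpoint accepts a JSON body and a Bearer token.
    The exact body layout depends on the MLflow version in use.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request):
    """Send the prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage (placeholders, not real values):
# url = build_invocation_url("https://<workspace-url>", "my-model", "Production")
# request = build_scoring_request(url, "<personal-access-token>", [{"feature_a": 1.0}])
# predictions = send(request)
```

Addressing the endpoint by stage (for example, Production) rather than by version number means the caller does not need to change when a newer version is promoted to that stage, which matches the automatic updating described earlier.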

If you want to disable...