Machine Learning Engineering with MLflow

By: Natu Lauchande

Overview of this book

MLflow is a platform for the machine learning life cycle that enables structured development and iteration of machine learning models and a seamless transition into scalable production environments. This book will take you through the different features of MLflow and how you can implement them in your ML project. You will begin by framing an ML problem and then transform your solution with MLflow, adding a workbench environment, training infrastructure, data management, model management, experimentation, and state-of-the-art ML deployment techniques in the cloud and on premises. The book also explores techniques to scale up your workflow as well as performance monitoring techniques. As you progress, you’ll discover how to create an operational dashboard to manage machine learning systems. Later, you will learn how you can use MLflow in AutoML, anomaly detection, and deep learning contexts with the help of use cases. In addition to this, you will understand how to use machine learning platforms for local development as well as for cloud and managed environments. This book will also show you how to use MLflow in non-Python-based languages such as R and Java, along with covering approaches to extend MLflow with plugins. By the end of this machine learning book, you will be able to produce and deploy reliable machine learning algorithms using MLflow in multiple environments.

Table of Contents (18 chapters)

Section 1: Problem Framing and Introductions
Section 2: Model Development and Experimentation
Section 3: Machine Learning in Production
Section 4: Advanced Topics

Implementing the training job

We will use the training data produced in the previous chapter. The assumption here is that an independent job populates a specific folder with the output of the data pipeline. The data is available in the book's GitHub repository at https://github.com/PacktPublishing/Machine-Learning-Engineering-with-MLflow/blob/master/Chapter08/psystock-training/data/training/data.csv.

We will now create a train_model.py file that loads the training data, fits a model, and produces test predictions. The test predictions are persisted in the environment so that other steps of the workflow can use them to evaluate the model.
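
Before walking through the actual file, it may help to see the overall shape of such a job. The following is a minimal sketch, not the book's train_model.py: it uses only the Python standard library, and the names load_rows, split, MajorityClassModel, and run_training_job are hypothetical. It assumes a header-less CSV whose last column is the label, and it substitutes a trivial majority-class placeholder for the real model so that the load, split, fit, and persist steps can be run anywhere.

```python
import csv
import random
from collections import Counter
from pathlib import Path


def load_rows(path):
    """Read a header-less CSV; ASSUMPTION: the last column is the label."""
    with open(path, newline="") as handle:
        rows = [row for row in csv.reader(handle) if row]
    features = [[float(value) for value in row[:-1]] for row in rows]
    labels = [int(float(row[-1])) for row in rows]
    return features, labels


def split(features, labels, test_fraction=0.33, seed=42):
    """Shuffle row indices reproducibly and hold out a test fraction."""
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx):
        return [features[i] for i in idx], [labels[i] for i in idx]

    return pick(train_idx), pick(test_idx)


class MajorityClassModel:
    """Placeholder model: always predicts the most common training label."""

    def fit(self, features, labels):
        self.label_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, features):
        return [self.label_ for _ in features]


def run_training_job(data_path, predictions_path, model=None):
    """Load data, fit a model, and persist test predictions as CSV."""
    features, labels = load_rows(data_path)
    (x_train, y_train), (x_test, y_test) = split(features, labels)

    # Any object exposing fit/predict can be passed in; the default is
    # only a stand-in so the sketch runs without third-party packages.
    if model is None:
        model = MajorityClassModel()
    model.fit(x_train, y_train)
    predictions = model.predict(x_test)

    # Persist true labels next to predictions so a later evaluation
    # step can read them back without retraining.
    output = Path(predictions_path)
    output.parent.mkdir(parents=True, exist_ok=True)
    with open(output, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["y_true", "y_pred"])
        writer.writerows(zip(y_test, predictions))
    return model, len(y_test)
```

Writing the true labels alongside the predictions is one possible design choice: it lets a separate evaluation step compute metrics from the persisted file alone. A real job would swap the placeholder for an actual classifier behind the same fit/predict interface.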

The file produced in this section is available at the following link:

https://github.com/PacktPublishing/Machine-Learning-Engineering-with-MLflow/blob/master/Chapter08/psystock-training/train_model.py

  1. We will start by importing the relevant packages. In this case, we will need pandas to handle the data, xgboost...