Engineering MLOps

By: Emmanuel Raj

Overview of this book

Engineering MLOps presents comprehensive insights into MLOps coupled with real-world examples in Azure to help you write programs, train robust and scalable ML models, and build ML pipelines to train and deploy models securely in production. The book begins by familiarizing you with the MLOps workflow so you can start writing programs to train ML models. You’ll then move on to explore options for serializing and packaging ML models post-training so that they can be deployed to facilitate machine learning inference, model interoperability, and end-to-end model traceability. You’ll learn how to build ML pipelines, continuous integration and continuous delivery (CI/CD) pipelines, and monitoring pipelines to systematically build, deploy, monitor, and govern ML solutions for businesses and industries. Finally, you’ll apply the knowledge you’ve gained to build real-world projects. By the end of this ML book, you’ll have a 360-degree view of MLOps and be ready to implement MLOps in your organization.
Table of Contents (18 chapters)
Section 1: Framework for Building Machine Learning Models
Section 2: Deploying Machine Learning Models at Scale
Section 3: Monitoring Machine Learning Models in Production

Data preprocessing

Raw data cannot be passed directly to an ML model for training; it must first be refined, or preprocessed. To analyze the imported data further, we perform a series of steps that bring it into a shape suitable for ML training. We start by assessing the quality of the data, checking its accuracy, completeness, reliability, relevance, and timeliness. After this, we calibrate the required data and encode text into numerical data, which is ideal for ML training. Lastly, we analyze the correlations and time series, and filter out data that is irrelevant for training ML models.
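
To make the encoding and filtering steps more concrete, here is a minimal sketch in pandas. It is not the book's own code: the sample DataFrame, the column names (Temperature_C, Humidity, Station_id, Weather_condition), the use of category codes for encoding, and the 0.5 correlation threshold are all assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "Temperature_C": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Humidity": [0.9, 0.8, 0.7, 0.6, 0.5],
    "Station_id": [7, 7, 7, 7, 7],  # constant column, carries no signal
    "Weather_condition": ["rain", "rain", "clear", "clear", "clear"],
})

# Encode the text column as integer category codes.
# Categories are sorted alphabetically, so "clear" -> 0 and "rain" -> 1.
df["Weather_code"] = df["Weather_condition"].astype("category").cat.codes

# Pearson correlation of every numeric column with the encoded column.
corr = df.select_dtypes(include="number").corr()["Weather_code"]

# Keep only features whose absolute correlation reaches the
# (assumed) threshold. A constant column yields NaN and is dropped,
# because NaN >= 0.5 evaluates to False.
THRESHOLD = 0.5
strength = corr.drop("Weather_code").abs()
keep = strength[strength >= THRESHOLD].index.tolist()

print(corr)
print("Features kept:", keep)
```

Category codes are only one possible encoding; they impose an arbitrary order on the labels, so one-hot encoding may be a better fit when the model would otherwise read meaning into that order.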

Data quality assessment

To assess the quality of the data, we look at its accuracy, completeness, reliability, relevance, and timeliness. First, let's check whether the data is complete and reliable by assessing the formats, summary statistics, and anomalies such as missing data. We use pandas functions as follows:

df.describe(...
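
As a rough illustration of this kind of check, the sketch below reports the format (dtype), the number of missing values, and the share of missing values for each column, and then prints the descriptive statistics. The sample DataFrame, its column names, and the quality_report helper are hypothetical and are not taken from the book.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "Timestamp": ["2020-01-01 00:00", "2020-01-01 01:00",
                  "2020-01-01 02:00", "2020-01-01 03:00"],
    "Temperature_C": [1.5, np.nan, 2.1, 2.4],
    "Weather_condition": ["rain", "clear", None, "rain"],
})


def quality_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Return per-column dtype, missing-value count, and missing share."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing": frame.isnull().sum(),
        "missing_share": frame.isnull().mean(),
    })


report = quality_report(df)
print(report)

# Descriptive statistics for numeric and non-numeric columns alike.
print(df.describe(include="all"))
```

Columns with a high missing share or an unexpected dtype (for example, timestamps stored as plain text) are the ones to look at first.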