Reproducible Data Science with Pachyderm
This section defines Machine Learning Operations (MLOps) and describes why it is crucial to establish a reliable MLOps process within your data science department.
In many organizations, data science departments were created only in the last few years, and the profession of data scientist is itself fairly new. As a result, many of these departments still have to find a way to integrate into existing corporate processes and to ensure the reliability and scalability of their data science deliverables.
In many cases, the burden of building a suitable infrastructure falls on the shoulders of the data scientists themselves, who are often less familiar with the latest infrastructure trends. Another problem is how to make it all work across different languages, platforms, and environments. In the end, data scientists spend more time building the infrastructure than working on the model itself. This is where a new discipline, MLOps, has emerged to help bridge the gap between data science and infrastructure.
MLOps is a set of practices that defines the stages of the machine learning development lifecycle and helps ensure the reliability of the data science process. Although the term was coined fairly recently, most data scientists agree that a successful MLOps process should adhere to the following principles:
Before we dive into the MLOps process stages, let's take a look at more established software development practices. DevOps is a software development practice that is used in many enterprise-level software projects. A typical DevOps lifecycle includes the following stages that continuously repeat, ensuring product improvement:
The following diagram illustrates the DevOps lifecycle:
Figure 1.5 – DevOps Lifecycle
All these phases are continuously repeated, enabling communication between departments and a customer feedback loop. This practice has brought enterprises such benefits as a faster development cycle, better products, and continuous innovation. Better teamwork enabled by the close relationships between departments is one of the key factors that make this process efficient.
Data scientists deserve a process that brings the same level of reliability. One of the biggest problems of enterprise data science is that very few machine learning models make it to production. Many companies are just starting to adopt data science, and the new departments face unprecedented challenges. Often, the teams lack an understanding of the workflows that need to be implemented in order to make enterprise-level data science work.
Another important challenge is that, unlike traditional software developers, data scientists work not only with code but also with data and parameters. Data comes from the real world, whereas code is written carefully in a controlled environment. The two meet only when they are combined in a model.
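To make the point about code, data, and parameters concrete, here is a minimal sketch of how a single training run could be identified by all three inputs at once. The function name `fingerprint_run` and its arguments are hypothetical and chosen only for illustration; this is not how Pachyderm or any particular MLOps tool does it.

```python
import hashlib
import json


def fingerprint_run(code_version: str, data: bytes, params: dict) -> str:
    """Return a SHA-256 hex digest that identifies one training run.

    Hypothetical helper: the digest changes whenever the code version,
    the raw data, or any parameter changes.
    """
    h = hashlib.sha256()
    # Code: for example, a Git commit hash supplied by the caller.
    h.update(code_version.encode("utf-8"))
    h.update(b"\0")  # separator so adjacent fields cannot run together
    # Data: hash the raw bytes of the dataset.
    h.update(hashlib.sha256(data).digest())
    h.update(b"\0")
    # Parameters: serialize with sorted keys so key order does not matter.
    h.update(json.dumps(params, sort_keys=True).encode("utf-8"))
    return h.hexdigest()


if __name__ == "__main__":
    run_a = fingerprint_run("abc123", b"x,y\n1,2\n", {"lr": 0.01, "epochs": 5})
    run_b = fingerprint_run("abc123", b"x,y\n1,3\n", {"lr": 0.01, "epochs": 5})
    # Same code and parameters, different data: the fingerprints differ.
    print(run_a == run_b)
```

Two runs with the same fingerprint used the same code, data, and parameters, so a change in model behavior can be traced to a change in at least one of the three.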
The challenges that all data science departments face include the following:
In many enterprises, data science departments are still small and struggle to create a reliable workflow. Building such a process requires an understanding of traditional software practices, such as DevOps, combined with an understanding of the challenges specific to data science. That is where MLOps started to emerge, combining data science with the best practices of software development.
If we try to apply similar DevOps practices to data science, here is what we might see:
Similar to DevOps, the stages of MLOps are constantly repeated. The following diagram shows the stages of MLOps:
Figure 1.6 – MLOps Lifecycle
As you can see, the two practices are very similar, and the latter borrows the main concepts from the former. Using MLOps in practice has brought the following advantages to enterprise-level data science:
In this section, we've learned about the important stages of the MLOps process. In the next section, we will learn more about the types of data science platforms that can help you implement MLOps in your organization.