Agile Machine Learning with DataRobot

By : Bipin Chadha, Sylvester Juwe

Overview of this book

DataRobot enables data science teams to become more efficient and productive. This book helps you to address machine learning (ML) challenges with DataRobot's enterprise platform, enabling you to extract business value from data and rapidly create commercial impact for your organization. You'll begin by learning how to use DataRobot's features to perform data prep and cleansing tasks automatically. The book then covers best practices for building and deploying ML models, along with the challenges faced while scaling them to handle complex business problems. Moving on, you'll perform exploratory data analysis (EDA) tasks to prepare your data for building ML models and learn ways to interpret the results. You'll also discover how to analyze the model's predictions and turn them into actionable insights for business users. Next, you'll create model documentation for internal as well as compliance purposes and learn how the model gets deployed as an API. In addition, you'll find out how to operationalize the model and monitor its performance. Finally, you'll work with examples on time series forecasting, NLP, image processing, MLOps, and more using advanced DataRobot capabilities. By the end of this book, you'll have learned to use DataRobot's AutoML and MLOps features to scale ML model building by avoiding repetitive tasks and common errors.

Data science processes for generating business value

Data science is an emerging practice that has seen a lot of hype. Much of what it means is still under debate, and the practice is evolving rapidly. Regardless of these debates, there is no doubt that data science methods can provide business benefits if used properly. While following a process is no guarantee of success, it can certainly improve the odds and leave room for improvement along the way. Data science processes are inherently iterative, and it is important not to get stuck in a specific step for too long. People looking for predictable and predetermined timelines and results are bound to be disappointed. By all means, create a plan, but be ready to be nimble and agile as you proceed. A data science project is also a discovery project: you are never sure of what you will find. Your expectations or hypotheses might turn out to be false, and you might uncover interesting insights from unexpected sources.

There are many known applications of data science and new ones are being discovered every day. Some example applications are listed here:

Predicting which customer is most likely to buy a product
Predicting which customer will come back
Predicting what a customer will want next
Predicting which customer might default on a loan
Predicting which customer is likely to have an accident
Predicting which component of a machine might fail
Forecasting how many items will be sold in a store
Forecasting how many calls the call center will receive tomorrow
Forecasting how much energy will be consumed next month

Figure 1.1 shows a high-level process that describes how a data science project might go from concept to value generation:

Figure 1.1 – Typical process steps with details about what happens during each step

Following these steps is critical for a successful machine learning project. Sometimes these steps get skipped due to deadlines or issues that inevitably surface during development and debugging. We will show how using DataRobot helps you avoid some of the problems and ensure that your teams are following best practices. These steps will be covered in great detail, with examples, in other chapters of this book, but let's get familiar with them at a high level.

Problem understanding

This is perhaps the most important step and also the one that is given the least attention. Most data science projects fail because this step is rushed. It is also the task for which the data science disciplines offer the fewest methods and tools. This step involves the following:

Understanding the business problem from a systemic perspective
Understanding what it is that the end users or consumers of the model's results expect
Understanding what the stakeholders will do with the results
Understanding what the potential sources of data are and how the data is captured and modified before it reaches you
Assessing whether there are any legal concerns regarding the use of data and data sources
Developing a detailed understanding of what various features of the datasets mean

Data preparation

This step is well known in the data science community, as data science teams typically spend most of their time on it. This is a task where DataRobot's capabilities start coming into play, but not completely. There is still a lot of work that the data science or data engineering teams have to do using SQL, Python, or R. Many tasks in this step also require a data scientist's skill and experience (for example, feature engineering), even though DataRobot is beginning to provide capabilities in this area. For example, DataRobot provides a lot of useful data visualizations and notifications about data quality, but it is up to the analyst to make sense of them and take appropriate action.

This step also involves defining the expected result (such as predicting how many items will be sold next week or determining the probability of default on a loan) of the model and how the quality of results will be measured during model development, validation, and testing stages.
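To make the idea of measuring result quality at each stage concrete, here is a minimal sketch in plain Python. It is not DataRobot code: the function names, the 60/20/20 partition sizes, the toy sales data, and the choice of RMSE as the metric are all assumptions made for illustration.

```python
import math
import random


def split_rows(rows, valid_frac=0.2, holdout_frac=0.2, seed=0):
    """Shuffle rows and split them into training, validation, and holdout partitions."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_frac)
    n_valid = int(len(shuffled) * valid_frac)
    holdout = shuffled[:n_holdout]
    valid = shuffled[n_holdout:n_holdout + n_valid]
    train = shuffled[n_holdout + n_valid:]
    return train, valid, holdout


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Toy data (hypothetical): (week number, items sold that week).
rows = [(week, 100 + 3 * week) for week in range(50)]
train, valid, holdout = split_rows(rows)

# A naive baseline that always predicts the mean of the training target.
baseline = sum(y for _, y in train) / len(train)

for name, part in [("validation", valid), ("holdout", holdout)]:
    score = rmse([y for _, y in part], [baseline] * len(part))
    print(f"{name} RMSE of the mean baseline: {score:.2f}")
```

Agreeing on the metric and the partitions before any modeling starts gives every later model a fixed yardstick, and the baseline score tells you how much a real model has to beat.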

Model development

This step involves the development of several models using different algorithms and optimizing or tuning hyperparameters of the algorithms. Results produced by the models are then evaluated to narrow down the model list, potentially drop some of the features, and fine-tune the hyperparameters.

It is also common to look at feature effects, feature importance, and partial dependence plots to engineer additional features. Once you are satisfied with the results, you start thinking about how to turn the predictions and explanations into usable and actionable information.
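The loop of building several candidates, scoring them, and narrowing down the list can be sketched as follows. This is a hand-rolled illustration rather than anything DataRobot does internally: the one-feature ridge model, the penalty values, and the toy data are assumptions chosen to keep the example short.

```python
import math


def fit_ridge_1d(xs, ys, penalty):
    """Closed-form slope for y ~ w * x with an L2 penalty on w (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Toy data (hypothetical): roughly y = 2x with small perturbations.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.1, 3.9, 6.2, 7.8]
valid_x = [5.0, 6.0]
valid_y = [10.1, 11.9]

# Fit one candidate per hyperparameter value and score it on validation data.
leaderboard = []
for penalty in [0.0, 1.0, 10.0, 100.0]:
    w = fit_ridge_1d(train_x, train_y, penalty)
    score = rmse(valid_y, [w * x for x in valid_x])
    leaderboard.append((score, penalty, w))

# Rank candidates from best (lowest error) to worst.
leaderboard.sort()
for score, penalty, w in leaderboard:
    print(f"penalty={penalty:>6}  slope={w:.3f}  validation RMSE={score:.3f}")

best_score, best_penalty, best_w = leaderboard[0]
```

Ranking on validation data rather than training data is the important design choice here: it keeps the comparison honest about how each candidate handles rows it has not seen.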

Model deployment

Upon completion of model development, the model results are reviewed with users and stakeholders. This is the point at which you should carefully assess how the results will be turned into actions. What will the consequences of those actions be, and are there any unintended consequences that could emerge? This is also the time to assess any fairness or bias issues resulting from the models. Make sure to discuss any concerns with the users and business leaders.

DataRobot provides several mechanisms to rapidly deploy the models as REST APIs or executable Java objects that can be deployed anywhere in the organization's infrastructure or in the cloud. Once the model is operational as an API, the hard part of change management starts. Here you have to make sure that the organization is ready for the change associated with the new way of doing business. This is typically hard on people who are used to doing things a certain way. Communicating why this is necessary, why it is better, and how to perform new functions are important aspects that frequently get missed.
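Once a model sits behind a REST API, a client application scores new records by posting them to the prediction endpoint. The sketch below only builds such a request with Python's standard library and does not send it. The URL, the token, the bearer-style header, and the payload fields are hypothetical placeholders, not DataRobot's actual API; consult your deployment's integration details for the real values.

```python
import json
import urllib.request

# Hypothetical values: replace with the endpoint and credentials of your deployment.
PREDICTION_URL = "https://example.com/deployments/my-model/predictions"
API_TOKEN = "YOUR_API_TOKEN"


def build_scoring_request(records, url=PREDICTION_URL, token=API_TOKEN):
    """Package a list of feature dictionaries as a JSON POST request."""
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# One loan application to score (field names are made up for illustration).
request = build_scoring_request([{"loan_amount": 12000, "term_months": 36}])
print(request.get_method(), request.full_url)

# Against a live endpoint, the request would be sent with:
#     with urllib.request.urlopen(request) as response:
#         predictions = json.load(response)
```

Because the model is reached over plain HTTP, any system in the organization that can issue a POST request can consume its predictions, which is a large part of what makes API deployment attractive.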

Model maintenance

Once the model is successfully deployed and operating, the focus shifts to managing model operations and maintenance. This includes identifying data gaps and other opportunities to improve the model over time, as well as refining and retraining the model as needed. Monitoring involves evaluating incoming data to see whether it has drifted and whether the drift requires action, monitoring the health of the prediction services, and monitoring the results and accuracy of the model outputs. It is also important to meet with users periodically to understand what the model does well and where it can be improved. Teams also commonly employ champion and challenger models to see whether a different model performs better in the production setting.
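One way to check whether incoming data has drifted is to compare the distribution of a feature in recent scoring data against its distribution in the training data. The sketch below uses the Population Stability Index (PSI) for that comparison. The choice of PSI, the equal-width binning, the toy data, and the alert threshold of 0.25 (a commonly quoted rule of thumb) are assumptions for illustration, not a description of how DataRobot computes drift.

```python
import math


def psi(expected, actual, bins=5, eps=1e-6):
    """Population Stability Index between a training sample and a recent sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all training values are equal

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the training range fall into the first or last bin.
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each proportion at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Toy data (hypothetical): a feature at training time and two recent samples.
training = [float(v) for v in range(100)]
recent_stable = list(training)
recent_shifted = [v + 30.0 for v in training]

DRIFT_THRESHOLD = 0.25  # assumed rule-of-thumb cutoff; tune for your own use case

for name, sample in [("stable", recent_stable), ("shifted", recent_shifted)]:
    value = psi(training, sample)
    status = "investigate" if value > DRIFT_THRESHOLD else "ok"
    print(f"{name}: PSI={value:.3f} -> {status}")
```

A high PSI does not by itself mean the model is wrong; it is a prompt to look at whether accuracy has changed and whether retraining is warranted.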

As we outlined before, although these steps are presented in a linear fashion, in practice these steps do not occur in this exact sequence and there is typically plenty of iteration before you get to the final result. ML model development is a challenging process, and we will now discuss what some of the challenges are and how to address them.
