Agile Machine Learning with DataRobot

By: Bipin Chadha, Sylvester Juwe

Overview of this book

DataRobot enables data science teams to become more efficient and productive. This book helps you address machine learning (ML) challenges with DataRobot's enterprise platform, enabling you to extract business value from data and rapidly create commercial impact for your organization. You'll begin by learning how to use DataRobot's features to perform data prep and cleansing tasks automatically. The book then covers best practices for building and deploying ML models, along with the challenges faced when scaling them to handle complex business problems. Moving on, you'll perform exploratory data analysis (EDA) tasks to prepare your data for building ML models and learn ways to interpret the results. You'll also discover how to analyze a model's predictions and turn them into actionable insights for business users. Next, you'll create model documentation for internal as well as compliance purposes and learn how a model is deployed as an API. In addition, you'll find out how to operationalize a model and monitor its performance. Finally, you'll work through examples of time series forecasting, NLP, image processing, MLOps, and more using advanced DataRobot capabilities. By the end of this book, you'll have learned to use DataRobot's AutoML and MLOps features to scale ML model building while avoiding repetitive tasks and common errors.
Table of Contents
Section 1: Foundations
Section 2: Full ML Life Cycle with DataRobot: Concept to Value
Section 3: Advanced Topics

What this book covers

Chapter 1, What Is DataRobot and Why You Need It, describes current practices for building and deploying ML models and some of the challenges in scaling that approach. The chapter then describes what DataRobot is and how it addresses many of these challenges, allowing analysts and data scientists to add value to their organization quickly. It also helps executives understand how they can use DataRobot to scale their data science practice efficiently without needing to hire a large staff with hard-to-find skills. The chapter also describes the various components of DataRobot, how it is architected, how it integrates with other tools, and the different options for setting it up on-premises or in the cloud. Finally, it describes, at a high level, the various user interface components and what they signify.

Chapter 2, Machine Learning Basics, covers some basic ML concepts that are used and referenced throughout this book. This is the bare minimum you need to know to use DataRobot effectively. The chapter is not intended to give you a comprehensive understanding of ML; it is only a refresher on some key ideas.

Chapter 3, Understanding and Defining Business Problems, shows you examples of how to get to the root of a problem and then set it up as an ML project. A business problem needs to be carefully defined and turned into an ML problem before it can be solved with DataRobot. This critical step is often overlooked, which results in problems and failures downstream. Review this chapter carefully to avoid wasting a lot of hard work. This chapter is agnostic to both tools and ML methods.

Chapter 4, Preparing Data for DataRobot, covers, at a high level, how to stitch together data from multiple disparate sources. Depending on the data, DataRobot might perform data prep and cleansing tasks automatically, or you might have to do some of them yourself. This chapter presents concepts and examples that show how to cleanse and prepare your data, along with the features DataRobot provides to help with these tasks.

Chapter 5, Exploratory Data Analysis with DataRobot, will show you how to use DataRobot to perform various data analyses and get data ready to start building models. We provide detailed examples of the kinds of analysis that should be done and what to be aware of to prevent issues downstream. Done right, this analysis can help catch data problems and also generate useful business insights.

Chapter 6, Model Building with DataRobot, shows step-by-step examples of building different types of models with DataRobot. We cover details such as which settings to use under different circumstances, how to select specific model types, how to set up cross-validation, how to build ensemble models, and how to track the top-performing models on the leaderboard.
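Cross-validation is one of the settings mentioned for Chapter 6. The sketch below is a generic illustration of the k-fold idea only. It does not show how DataRobot partitions data; the platform configures that through its own settings. The function name `k_fold_indices` is hypothetical. The code shuffles row indices once, deals them into k folds, and pairs each fold (used as validation data) with the remaining rows (used as training data).

```python
import random
from typing import List, Tuple


def k_fold_indices(n_rows: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical helper: split row indices 0..n_rows-1 into k folds.

    Returns one (train_indices, validation_indices) pair per fold.
    Every row lands in exactly one validation fold.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    indices = list(range(n_rows))
    # Shuffle once with a fixed seed so the folds are random but reproducible
    random.Random(seed).shuffle(indices)
    # Deal rows round-robin so fold sizes differ by at most one
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, validation in enumerate(folds):
        # Training rows are all rows from the other k-1 folds
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((sorted(train), sorted(validation)))
    return splits


if __name__ == "__main__":
    # With 10 rows and k=5, each fold has 8 training rows and 2 validation rows
    for fold_no, (train, valid) in enumerate(k_fold_indices(10, k=5)):
        print(fold_no, len(train), len(valid))
```

A model would be trained once per fold and scored on that fold's validation rows. The k scores are then averaged, which gives a more stable estimate than a single train/validation split.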

Chapter 7, Model Understanding and Explainability, shows you examples of the various functions and outputs that DataRobot provides to help you understand models and select the one that best solves the business problem. Through these examples, the chapter covers what to watch out for and the trade-offs you have to make when selecting a model.

Chapter 8, Model Scoring and Deployment, covers how to use models to score input datasets, create predictions to be used in the intended applications, deploy models in production, and monitor models.

Chapter 9, Forecasting and Time Series Modeling, describes how to build time series models, which are typically used for forecasting applications. The chapter shows examples of how different time series problems are handled with DataRobot, covering single-series as well as multi-series problems.

Chapter 10, Recommender Systems, shows how to build recommender systems with DataRobot. These types of models are typically used to recommend products or services to users. The chapter covers strategies for handling a recommendation problem with DataRobot, the functionality differences between them, and the trade-offs associated with building different recommender models.

Chapter 11, Working with Geospatial Data, NLP, and Image Processing, covers various DataRobot functions relating to visualization and analysis of geospatial, text, and image features, as well as building ML models that incorporate such features. This chapter describes DataRobot capabilities to automatically incorporate text and image data into ML models, thereby improving the performance of these models.

Chapter 12, DataRobot Python API, describes when and how to use the DataRobot Python API. While DataRobot automates many aspects of model building, there are many scenarios where you need a programming language such as Python to perform ML tasks efficiently and at scale. DataRobot provides a convenient API that allows experienced data scientists to execute DataRobot functions programmatically.

Chapter 13, Model Governance and MLOps, covers topics that have recently begun to receive a lot of attention. Once a model has been developed and deployed, it needs to be governed and maintained over time. While this is similar in many ways to maintaining an IT system, there are some critical differences that need to be understood and operationalized. This chapter covers several features and functions that DataRobot provides to help govern and maintain ML models.

Chapter 14, Conclusion, covers where to go for additional information and other topics that might be outside the scope of this book. We also describe where we see automated ML and DataRobot heading in the future.