
Agile Machine Learning with DataRobot

By: Bipin Chadha, Sylvester Juwe

Overview of this book

DataRobot enables data science teams to become more efficient and productive. This book helps you address machine learning (ML) challenges with DataRobot's enterprise platform, enabling you to extract business value from data and rapidly create commercial impact for your organization. You'll begin by learning how to use DataRobot's features to perform data preparation and cleansing tasks automatically. The book then covers best practices for building and deploying ML models, along with the challenges of scaling them to handle complex business problems. Moving on, you'll perform exploratory data analysis (EDA) to prepare your data for building ML models, and learn ways to interpret the results. You'll also discover how to analyze a model's predictions and turn them into actionable insights for business users. Next, you'll create model documentation for internal as well as compliance purposes and learn how a model is deployed as an API. In addition, you'll find out how to operationalize a model and monitor its performance. Finally, you'll work through examples of time series forecasting, NLP, image processing, MLOps, and more using advanced DataRobot capabilities. By the end of this book, you'll have learned to use DataRobot's AutoML and MLOps features to scale ML model building by avoiding repetitive tasks and common errors.
Table of Contents (19 chapters)
Section 1: Foundations
Section 2: Full ML Life Cycle with DataRobot: Concept to Value
Section 3: Advanced Topics

Connecting to data sources

By this point, you should have a list of data sources and an idea of what data each one stores. Depending on your use case, some of these could be real-time streaming sources that you need to tap into. Here are some typical sources of data:

  • Filesystems
  • Excel files
  • SQL databases
  • Amazon S3 buckets
  • Hadoop Distributed File System (HDFS)
  • NoSQL databases
  • Data warehouses
  • Data lakes
  • Graph databases
  • Data streams
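Each of the sources listed above needs its own access mechanism. As a rough illustration, here is a minimal sketch, using only the Python standard library, that reads a CSV export from a filesystem and a table from a SQL database into the same list-of-dicts shape. The function names `load_csv` and `load_sql_table` are hypothetical, and an in-memory SQLite database stands in for a real SQL database; sources such as S3 buckets, HDFS, NoSQL stores, or data streams would each need their own client library.

```python
import csv
import io
import sqlite3
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def load_csv(lines: Iterable[str]) -> List[Row]:
    """Hypothetical helper: read delimited text (e.g. a CSV export) into dicts.

    Every value comes back as a string; the csv module does no type inference.
    """
    return list(csv.DictReader(lines))


def load_sql_table(conn: sqlite3.Connection, table: str) -> List[Row]:
    """Hypothetical helper: read all rows of one table into dicts.

    Uses the DB-API 2.0 cursor interface; SQLite stands in for any SQL database.
    The table name is interpolated into the query, so it must come from
    trusted configuration, never from user input.
    """
    cursor = conn.cursor()
    cursor.execute(f'SELECT * FROM "{table}"')
    columns = [desc[0] for desc in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


if __name__ == "__main__":
    # Sample text standing in for a file on disk; with a real file you would use
    # open(path, newline="") and pass the file object to load_csv.
    file_rows = load_csv(io.StringIO("customer_id,churned\n1,yes\n2,no\n"))

    # In-memory SQLite database standing in for a production SQL database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id INTEGER, churned TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "yes"), (2, "no")])
    db_rows = load_sql_table(conn, "customers")

    print(file_rows)  # customer_id values are strings: '1', '2'
    print(db_rows)    # customer_id values keep their SQL type: 1, 2
```

Even in this tiny example the two sources disagree on types (strings from the CSV, integers from SQL), a small instance of the inconsistencies that often have to be resolved before the data is brought into DataRobot.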

Depending on the type of data source, you will use different mechanisms to access its data, and the source itself may be on-premises or in the cloud. Depending on the condition of the data, you can bring it directly into DataRobot, or you might have to prepare it before bringing it into DataRobot. DataRobot has recently added data preparation capabilities in the form of Paxata to help with this process, but you might not have access to that add-on. Most of this processing work is done with SQL, Python, pandas, and Excel. For the...
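To make the preparation step described above more concrete, here is a minimal sketch assuming a SQL source and a CSV file as the hand-off format. A SQL query does the joining, aggregation, and whitespace trimming; a small amount of Python drops rows that have no target value; and the result is written as CSV, which you could then upload to DataRobot. The `customers` and `orders` tables, their columns, and the `prepare_training_data` helper are all hypothetical, and SQLite again stands in for your actual database. In practice, pandas would often replace the pure-Python step; the standard library is used here only so the sketch runs without extra packages.

```python
import csv
import io
import sqlite3
from typing import TextIO

# Hypothetical query: SQL joins the tables, aggregates orders per customer,
# and trims stray whitespace. Every column is aliased so the CSV header is explicit.
TRAINING_QUERY = """
SELECT c.customer_id              AS customer_id,
       TRIM(c.region)             AS region,
       COUNT(o.order_id)          AS order_count,
       COALESCE(SUM(o.amount), 0) AS total_spend,
       c.churned                  AS churned
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.region, c.churned
ORDER BY c.customer_id
"""


def prepare_training_data(conn: sqlite3.Connection, out: TextIO) -> int:
    """Hypothetical helper: run the query, drop unlabeled rows, write CSV to `out`.

    Returns the number of data rows written (excluding the header).
    """
    cursor = conn.cursor()
    cursor.execute(TRAINING_QUERY)
    header = [desc[0] for desc in cursor.description]
    target_idx = header.index("churned")
    writer = csv.writer(out)
    writer.writerow(header)
    written = 0
    for row in cursor:
        # Rows without a target value cannot be used for supervised training.
        # (This filter could equally be a WHERE clause in the SQL query.)
        if row[target_idx] is None:
            continue
        writer.writerow(row)
        written += 1
    return written


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER, region TEXT, churned TEXT);
        CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, ' east ', 'yes'), (2, 'west', 'no'),
                                     (3, 'north', NULL);
        INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 5.5), (12, 2, 12.0);
    """)
    buffer = io.StringIO()
    count = prepare_training_data(conn, buffer)
    print(count)  # customer 3 has no churn label, so 2 rows are written
    print(buffer.getvalue())
```

Splitting the work this way keeps set-based operations (joins, aggregates) in the database, where they are usually cheapest, and leaves row-level rules that are easier to express in code to Python.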