Automated Machine Learning

By: Adnan Masood

Overview of this book

Every machine learning engineer deals with systems that have hyperparameters, and the most basic task in automated machine learning (AutoML) is to automatically set these hyperparameters to optimize performance. The latest deep neural networks have a wide range of hyperparameters for their architecture, regularization, and optimization, which can be customized effectively to save time and effort. This book reviews the underlying techniques of automated feature engineering, model and hyperparameter tuning, gradient-based approaches, and much more. You'll discover different ways of implementing these techniques in open source tools and then learn to use enterprise tools for implementing AutoML in three major cloud service providers: Microsoft Azure, Amazon Web Services (AWS), and Google Cloud Platform. As you progress, you’ll explore the features of cloud AutoML platforms by building machine learning models using AutoML. The book will also show you how to develop accurate models by automating time-consuming and repetitive tasks in the machine learning development lifecycle. By the end of this machine learning book, you’ll be able to build and deploy AutoML models that are not only accurate, but also increase productivity, allow interoperability, and minimize feature engineering tasks.
Table of Contents (15 chapters)

Section 1: Introduction to Automated Machine Learning
Section 2: AutoML with Cloud Platforms
Section 3: Applied Automated Machine Learning

Introducing TPOT

The Tree-based Pipeline Optimization Tool, or TPOT for short, was developed at the University of Pennsylvania's Computational Genetics Lab. TPOT is an automated ML tool written in Python that builds and optimizes ML pipelines using genetic programming. Built on top of scikit-learn, TPOT helps automate feature selection, feature preprocessing, feature construction, model selection, and parameter optimization by "exploring thousands of possible pipelines to find the best one". Because it follows scikit-learn's conventions, it has a short learning curve for anyone already familiar with that library.
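To make the idea of evolving pipelines more concrete, here is a toy sketch of an evolutionary search written in plain Python. It is not TPOT's actual algorithm: TPOT evolves tree-shaped pipelines and scores them with cross-validation, whereas this sketch uses fixed-length configurations and made-up scores. The names SEARCH_SPACE, TOY_SCORES, and evolve are hypothetical and exist only for illustration.

```python
import random

# Hypothetical search space: one choice per pipeline stage.
SEARCH_SPACE = {
    "scaler": ["none", "standard", "minmax"],
    "selector": ["none", "variance", "kbest"],
    "model": ["tree", "forest", "logreg"],
}

# Made-up scores standing in for cross-validated accuracy.
TOY_SCORES = {
    "none": 0.0, "standard": 0.03, "minmax": 0.02,
    "variance": 0.01, "kbest": 0.04,
    "tree": 0.80, "forest": 0.90, "logreg": 0.85,
}


def random_pipeline(rng):
    """Pick one option at random for every stage."""
    return {stage: rng.choice(options) for stage, options in SEARCH_SPACE.items()}


def fitness(pipeline):
    """Toy fitness: the sum of the made-up scores of the chosen options."""
    return sum(TOY_SCORES[choice] for choice in pipeline.values())


def mutate(pipeline, rng):
    """Re-draw the option of one randomly chosen stage."""
    child = dict(pipeline)
    stage = rng.choice(list(SEARCH_SPACE))
    child[stage] = rng.choice(SEARCH_SPACE[stage])
    return child


def crossover(a, b, rng):
    """Build a child by taking each stage from one of the two parents."""
    return {stage: rng.choice([a[stage], b[stage]]) for stage in SEARCH_SPACE}


def evolve(generations=10, population_size=8, seed=0):
    """Keep the fitter half each generation and refill with mutated offspring."""
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=fitness)


best = evolve()
print(best, round(fitness(best), 2))
```

Because the fitter half of each generation survives unchanged, the best score in the population can never get worse from one generation to the next. TPOT exposes similar knobs (number of generations and population size) to control how much of the pipeline space is explored.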

The toolkit is available on GitHub at github.com/EpistasisLab/tpot.

To explain the framework, let's start with a minimal working example. For this example, we will be using the MNIST database of handwritten digits:

  1. Create a new Colab notebook and run pip install tpot (in a notebook cell, prefix the command with !). TPOT can be used directly from the command line or from Python code:

    Figure 3.3 – Installing TPOT on a Colab notebook...
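    Once the installation finishes, a run from Python code might look like the following minimal sketch. It rests on several assumptions that are not taken from the book: it targets the classic TPOT API (TPOTClassifier with the generations, population_size, verbosity, and random_state parameters, plus the export method), which may differ in newer TPOT releases; it uses scikit-learn's small bundled digits dataset (load_digits) as a stand-in for MNIST; and the search budget, the helper names tpot_settings and run_tpot_on_digits, and the export filename are all illustrative.

```python
def tpot_settings():
    """Small, illustrative search budget (assumed values, not from the book)."""
    return {
        "generations": 5,       # number of evolutionary iterations
        "population_size": 20,  # pipelines kept in each generation
        "verbosity": 2,         # print progress while searching
        "random_state": 42,     # make the search repeatable
    }


def run_tpot_on_digits(export_path="tpot_digits_pipeline.py"):
    """Search for a pipeline on the digits data and return test accuracy."""
    # Imports are local so this file can be loaded where TPOT is not installed.
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from tpot import TPOTClassifier  # assumes the classic TPOT API

    # load_digits is scikit-learn's 8x8 digits set, used here in place of MNIST.
    digits = load_digits()
    X_train, X_test, y_train, y_test = train_test_split(
        digits.data, digits.target, train_size=0.75, test_size=0.25, random_state=42
    )

    tpot = TPOTClassifier(**tpot_settings())
    tpot.fit(X_train, y_train)            # runs the evolutionary search
    accuracy = tpot.score(X_test, y_test)
    tpot.export(export_path)              # writes the best pipeline as Python code
    return accuracy
```

    Calling run_tpot_on_digits() in a notebook cell starts the search. Even with this small budget it can take several minutes, and larger values for generations and population_size trade more runtime for a wider search.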