Automated Machine Learning

By: Adnan Masood

Overview of this book

Every machine learning engineer deals with systems that have hyperparameters, and the most basic task in automated machine learning (AutoML) is to automatically set these hyperparameters to optimize performance. The latest deep neural networks have a wide range of hyperparameters for their architecture, regularization, and optimization, which can be customized effectively to save time and effort. This book reviews the underlying techniques of automated feature engineering, model and hyperparameter tuning, gradient-based approaches, and much more. You'll discover different ways of implementing these techniques in open source tools and then learn to use enterprise tools for implementing AutoML in three major cloud service providers: Microsoft Azure, Amazon Web Services (AWS), and Google Cloud Platform. As you progress, you’ll explore the features of cloud AutoML platforms by building machine learning models using AutoML. The book will also show you how to develop accurate models by automating time-consuming and repetitive tasks in the machine learning development lifecycle. By the end of this machine learning book, you’ll be able to build and deploy AutoML models that are not only accurate, but also increase productivity, allow interoperability, and minimize feature engineering tasks.

Table of Contents (15 chapters)
Section 1: Introduction to Automated Machine Learning
Section 2: AutoML with Cloud Platforms
Section 3: Applied Automated Machine Learning

Open source platforms and tools

In this section, we will briefly review some of the open source automated ML platforms and tools that are available. We will take a deeper look at some of these platforms in Chapter 3, Automated Machine Learning with Open Source Tools and Libraries.

Microsoft NNI

Microsoft Neural Network Intelligence (NNI) is an open source platform that addresses three key areas of the automated ML life cycle: automated feature engineering, architecture search (also referred to as neural architecture search, or NAS), and hyperparameter tuning (HPT). The toolkit also offers model compression features and operationalization via Kubeflow, Azure ML, DL Workspace (DLTS), and Kubernetes over AWS.

The toolkit is available on GitHub at https://github.com/microsoft/nni.

auto-sklearn

Scikit-learn (also known as sklearn) is a popular ML library for Python development. auto-sklearn builds on this ecosystem and on the paper Efficient and Robust Automated Machine Learning by Feurer et al. It is an automated ML toolkit that performs algorithm selection and hyperparameter tuning using Bayesian optimization, meta-learning, and ensemble construction.

The toolkit is available on GitHub at github.com/automl/auto-sklearn.
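
To illustrate what ensemble construction can look like, here is a minimal sketch of greedy forward ensemble selection in plain Python. It is an illustration of the general idea only, not auto-sklearn's code or API: the model names, labels, and predicted probabilities are hypothetical, and a real toolkit would use held-out predictions from trained models.

```python
def accuracy(probabilities, labels):
    """Fraction of examples where the thresholded probability matches the label."""
    predictions = [1 if p >= 0.5 else 0 for p in probabilities]
    return sum(pred == y for pred, y in zip(predictions, labels)) / len(labels)


def average(probability_lists):
    """Average several models' probabilities example by example."""
    return [sum(column) / len(column) for column in zip(*probability_lists)]


def greedy_ensemble(model_probs, labels, rounds):
    """Repeatedly add (with replacement) the model that most helps the ensemble."""
    chosen = []
    for _ in range(rounds):
        def score_with(name):
            members = [model_probs[m] for m in chosen + [name]]
            return accuracy(average(members), labels)

        chosen.append(max(model_probs, key=score_with))
    return chosen


# Hypothetical validation labels and per-model predicted probabilities.
labels = [1, 0, 1, 1, 0, 0]
model_probs = {
    "model_a": [0.9, 0.2, 0.4, 0.8, 0.3, 0.6],
    "model_b": [0.6, 0.4, 0.7, 0.3, 0.6, 0.3],
    "model_c": [0.3, 0.6, 0.8, 0.7, 0.2, 0.1],
}

ensemble = greedy_ensemble(model_probs, labels, rounds=2)
ensemble_accuracy = accuracy(average([model_probs[m] for m in ensemble]), labels)
print(ensemble, ensemble_accuracy)
```

In this toy data, each model alone classifies four of the six examples correctly, while the averaged two-model ensemble classifies all six correctly, which is the effect ensemble construction is meant to exploit.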

Auto-WEKA

Weka, short for Waikato Environment for Knowledge Analysis, is an open source ML library that provides a collection of visualization tools and algorithms for data analysis and predictive modeling. Auto-WEKA is similar to auto-sklearn but is built on top of Weka and implements the approaches described in the Auto-WEKA paper for model selection, hyperparameter optimization, and more.

The developers describe Auto-WEKA as going beyond selecting a learning algorithm and setting its hyperparameters in isolation. Instead, it implements a fully automated approach that treats both choices together. The authors' intent is for Auto-WEKA "to help non-expert users to more effectively identify ML algorithms" (that is, democratization for subject matter experts, or SMEs) via "hyperparameter settings appropriate to their applications".

The toolkit is available on GitHub at github.com/automl/autoweka.

Auto-Keras

Keras is one of the most widely used deep learning frameworks and is an integral part of the TensorFlow 2.0 ecosystem. Auto-Keras is based on the paper by Jin et al., which proposes "a novel method for efficient neural architecture search with network morphism, enabling Bayesian optimization". The method supports neural architecture search "by designing a neural network kernel and algorithm for optimizing acquisition functions in a tree-structured space". Auto-Keras is the implementation of this deep learning architecture search via Bayesian optimization.

The toolkit is available on GitHub at github.com/jhfjhfj1/autokeras.

TPOT

The Tree-based Pipeline Optimization Tool, or TPOT for short (nice acronym, eh!), is a product of the Computational Genetics Lab at the University of Pennsylvania. TPOT is an automated ML tool written in Python that builds and optimizes ML pipelines using genetic programming. Built on top of scikit-learn, TPOT helps automate feature selection, feature preprocessing, feature construction, model selection, and parameter optimization by "exploring thousands of possible pipelines to find the best one". It is one of the many toolkits with a gentle learning curve.

The toolkit is available on GitHub at github.com/EpistasisLab/tpot.
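
To make the idea of evolving pipelines concrete, here is a minimal, library-free sketch of an evolutionary search in plain Python. It is not TPOT's API or its algorithm: this toy evolves flat three-stage configurations rather than real scikit-learn pipelines, and the stage names, options, and score table are hypothetical stand-ins for cross-validated accuracy.

```python
import random

# Hypothetical search space: one option per pipeline stage.
SEARCH_SPACE = {
    "scaler": ["none", "standard", "minmax"],
    "selector": ["none", "variance", "k_best"],
    "model": ["tree", "forest", "logistic"],
}

# Hypothetical per-option scores standing in for cross-validated accuracy.
TOY_SCORES = {
    "none": 0.0, "standard": 0.2, "minmax": 0.1,
    "variance": 0.1, "k_best": 0.3,
    "tree": 0.2, "forest": 0.4, "logistic": 0.3,
}


def fitness(pipeline):
    """Score a pipeline; a real tool would train and cross-validate it."""
    return sum(TOY_SCORES[choice] for choice in pipeline.values())


def random_pipeline(rng):
    return {stage: rng.choice(options) for stage, options in SEARCH_SPACE.items()}


def mutate(pipeline, rng):
    """Re-draw the option of one randomly chosen stage."""
    child = dict(pipeline)
    stage = rng.choice(list(SEARCH_SPACE))
    child[stage] = rng.choice(SEARCH_SPACE[stage])
    return child


def crossover(a, b, rng):
    """Take each stage from one of the two parents at random."""
    return {stage: rng.choice([a[stage], b[stage]]) for stage in SEARCH_SPACE}


def evolve(generations=20, population_size=10, seed=0):
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(population_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: population_size // 2]  # keep the fitter half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness), history


best_pipeline, best_per_generation = evolve()
print(best_pipeline, fitness(best_pipeline))
```

Because the fitter half of each generation survives unchanged, the best score never decreases from one generation to the next; the mutation and crossover steps are what let the search reach combinations that were not in the initial population.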

Ludwig – a code-free AutoML toolbox

Uber's automated ML tool, Ludwig, is an open source deep learning toolbox used for experimentation, testing, and training ML models. Built on top of TensorFlow, Ludwig enables users to create model baselines and perform automated ML-style experiments with different network architectures and models. In its latest release at the time of writing, Ludwig integrates with Comet.ml and supports BERT text encoders.

The toolkit is available on GitHub at https://github.com/uber/ludwig.

AutoGluon – an AutoML toolkit for deep learning

From AWS Labs, with the goal of democratization of ML in mind, AutoGluon has been developed to enable "easy-to-use and easy-to-extend AutoML with a focus on deep learning and real-world applications spanning image, text, or tabular data". AutoGluon, an integral part of AWS's automated ML strategy, enables both junior and seasoned data scientists to build deep learning models and end-to-end solutions with ease. Like other automated ML toolkits, AutoGluon offers network architecture search, model selection, and custom model improvements.

The toolkit is available on GitHub at https://github.com/awslabs/autogluon.

Featuretools

Featuretools is a Python framework that automates feature engineering using deep feature synthesis. Feature engineering is a tough problem due to its very nuanced nature. However, this open source toolkit, with its strong timestamp handling and reusable feature primitives, gives you a solid framework for building and extracting combinations of features and examining the impact they have.

The toolkit is available on GitHub at https://github.com/FeatureLabs/featuretools/.
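
The notion of reusable feature primitives can be sketched in a few lines of plain Python. The following is a toy, single-level illustration of applying aggregation primitives across a parent-child relationship; it is not the Featuretools API, and the tables, column names, and the synthesize_features helper are hypothetical. Deep feature synthesis goes further by stacking such primitives across several relationships.

```python
from statistics import mean

# Hypothetical child table: one row per transaction.
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 5.0},
]

# Aggregation primitives: reusable functions applied to a list of child values.
AGG_PRIMITIVES = {"SUM": sum, "MEAN": mean, "MAX": max, "COUNT": len}


def synthesize_features(child_rows, parent_key, value_column, child_name):
    """Build one feature per primitive for each parent entity."""
    # Group the child values by the parent they belong to.
    grouped = {}
    for row in child_rows:
        grouped.setdefault(row[parent_key], []).append(row[value_column])

    # Apply every primitive to every group and name the resulting feature.
    features = {}
    for parent_id, values in grouped.items():
        features[parent_id] = {
            f"{name}({child_name}.{value_column})": func(values)
            for name, func in AGG_PRIMITIVES.items()
        }
    return features


customer_features = synthesize_features(
    transactions, "customer_id", "amount", "transactions"
)
for customer_id, feats in sorted(customer_features.items()):
    print(customer_id, feats)
```

Adding a new primitive to AGG_PRIMITIVES produces a new feature for every customer without touching the rest of the code, which is the appeal of expressing feature engineering as a library of primitives.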

H2O AutoML

H2O's AutoML is the open source counterpart to H2O's commercial product, with APIs in R, Python, and Scala. It is an open source, distributed (multi-core and multi-node) implementation of automated ML algorithms that supports basic data preparation and tunes models via a mix of grid and random search.

The toolkit is available on GitHub at github.com/h2oai/h2o-3.
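
The difference between grid search and random search can be shown with a small plain-Python sketch. This is an illustration of the two strategies only, not H2O's API: the validation_score function is a hypothetical stand-in for training and validating a model, and the hyperparameter names and ranges are assumptions chosen for the example.

```python
import itertools
import random


def validation_score(learning_rate, max_depth):
    """Hypothetical stand-in for a model's validation score (higher is better)."""
    return -((learning_rate - 0.1) ** 2) - 0.01 * (max_depth - 6) ** 2


def grid_search(grid):
    """Evaluate every combination of the listed values."""
    names = list(grid)
    candidates = (
        dict(zip(names, combo)) for combo in itertools.product(*grid.values())
    )
    return max(candidates, key=lambda params: validation_score(**params))


def random_search(ranges, n_trials, seed=0):
    """Evaluate n_trials configurations sampled uniformly from the ranges."""
    rng = random.Random(seed)
    trials = [
        {
            "learning_rate": rng.uniform(*ranges["learning_rate"]),
            "max_depth": rng.randint(*ranges["max_depth"]),  # inclusive bounds
        }
        for _ in range(n_trials)
    ]
    return max(trials, key=lambda params: validation_score(**params))


best_grid = grid_search(
    {"learning_rate": [0.01, 0.1, 0.5], "max_depth": [3, 6, 9]}
)
best_random = random_search(
    {"learning_rate": (0.01, 0.5), "max_depth": (3, 9)}, n_trials=20
)
print("grid:", best_grid)
print("random:", best_random)
```

Grid search costs one evaluation per combination, so its budget grows multiplicatively with each added hyperparameter, whereas random search lets you fix the number of trials up front and still sample values that a coarse grid would never contain.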