Automated Machine Learning with AutoKeras

By: Luis Sobrecueva

Overview of this book

AutoKeras is an AutoML open-source software library that provides easy access to deep learning models. If you are looking to build deep learning model architectures and perform parameter tuning automatically using AutoKeras, then this book is for you. This book teaches you how to develop and use state-of-the-art AI algorithms in your projects. It begins with a high-level introduction to automated machine learning, explaining all the concepts required to get started with this machine learning approach. You will then learn how to use AutoKeras for image and text classification and regression. As you make progress, you'll discover how to use AutoKeras to perform sentiment analysis on documents. This book will also show you how to implement a custom model for topic classification with AutoKeras. Toward the end, you will explore advanced concepts of AutoKeras such as working with multi-modal data and multi-task, customizing the model with AutoModel, and visualizing experiment results using AutoKeras Extensions. By the end of this machine learning book, you will be able to confidently use AutoKeras to design your own custom machine learning models in your company.
Table of Contents (15 chapters)
Section 1: AutoML Fundamentals
Section 2: AutoKeras in Practice
Section 3: Advanced AutoKeras

Types of AutoML

This chapter will explore the frameworks available today for each of the previously listed AutoML types, giving you an idea of what is possible now in terms of AutoML. But first, let's briefly discuss the end-to-end ML pipeline and see where each process occurs in that pipeline.

As we saw in the previous workflow diagram, the ML pipeline involves more than the modeling steps; it also includes data steps and deployment steps. In this book, we will focus on automating the modeling phase because it is one of the most time-consuming. As we will see later, AutoKeras, the AutoML framework we will work with, uses neural architecture search and hyperparameter optimization methods, both of which are applied in the modeling phase.

AutoML tries to automate each of the steps in the pipeline, but the most time-consuming steps to automate are usually the following:

  • Automated feature engineering
  • Automated model selection and hyperparameter tuning
  • Automated neural network architecture selection

Automated feature engineering

The features used by the model have a direct impact on the performance of an ML algorithm. Feature engineering requires a large investment of time and human resources (data scientists) and involves a lot of trial and error, as well as deep domain knowledge.

Automated feature engineering is based on creating new sets of features iteratively until the ML model achieves good prediction performance.

In a standard feature engineering process, a dataset is collected, for example, a dataset from a job search website that collects data on the behavior of candidates. Usually, a data scientist will create new features if they are not already in the data, such as the following:

  • Search keywords
  • Titles of the job offers read by the candidates
  • Candidate application frequency
  • Time since the last application
  • Type of job offers to which the candidate applies

Feature engineering automation tries to create an algorithm that automatically generates or obtains these types of features from the data.
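As a minimal sketch of this idea, the following Python code applies a small set of generic aggregation functions to every type of event found in a raw event log, generating features such as application frequency and time since the last application without writing each one by hand. The event log, the tuple layout, and the names PRIMITIVES and generate_features are assumptions made for illustration; they do not come from any particular library.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw event log from a job search website:
# (candidate_id, event_type, value, timestamp)
events = [
    ("c1", "search", "data scientist", datetime(2021, 3, 1)),
    ("c1", "apply", "data scientist", datetime(2021, 3, 2)),
    ("c1", "apply", "ML engineer", datetime(2021, 3, 5)),
    ("c1", "apply", "data scientist", datetime(2021, 3, 9)),
    ("c2", "search", "backend developer", datetime(2021, 3, 5)),
    ("c2", "apply", "backend developer", datetime(2021, 3, 6)),
]

# Generic aggregation functions that can be applied to any group of events.
PRIMITIVES = {
    "count": lambda rows, now: len(rows),
    "days_since_last": lambda rows, now: (now - max(r[3] for r in rows)).days,
    "most_common": lambda rows, now: Counter(r[2] for r in rows).most_common(1)[0][0],
}

def generate_features(events, candidate_id, now):
    """Apply every primitive to every event type seen for one candidate."""
    own = [e for e in events if e[0] == candidate_id]
    features = {}
    for event_type in sorted({e[1] for e in own}):
        rows = [e for e in own if e[1] == event_type]
        for name, primitive in PRIMITIVES.items():
            features[f"{event_type}_{name}"] = primitive(rows, now)
    return features

features = generate_features(events, "c1", now=datetime(2021, 3, 10))
```

Adding a new primitive or a new event type automatically produces new candidate features, which a later step could keep or discard depending on how much they improve the model.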

There is also a specialized form of ML called deep learning, in which features are extracted from images, text, and videos automatically using matrix transformations on the model layers.

Automated model selection and hyperparameter optimization

After the data preprocessing phase, we have to find an ML algorithm to train with these features so that it can make predictions from new observations. In contrast to the previous step, model selection offers a wide range of options to choose from: classification and regression models, neural network-based models, clustering models, and many more.

Each algorithm is suitable for a certain class of problems. With automated model selection, we can find the best-performing model by running all the models appropriate for a particular task and selecting the one that is most accurate. No ML algorithm works well with all datasets, and some algorithms require more hyperparameter tuning than others. In fact, during model selection, we tend to experiment with different hyperparameters.
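The selection loop described above can be sketched in a few lines of plain Python. The toy dataset and the three candidate models (a mean baseline, a least-squares line, and a nearest-neighbor predictor) are assumptions chosen only to keep the example self-contained; a real AutoML tool would search over far richer model families.

```python
from statistics import mean

# Toy regression data (made up for illustration): y is roughly 2*x + 1.
train = [(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8), (4, 9.1)]
valid = [(5, 11.0), (6, 13.1)]

def fit_mean(data):
    """Baseline: always predict the average target seen in training."""
    avg = mean(y for _, y in data)
    return lambda x: avg

def fit_line(data):
    """Ordinary least-squares fit of a straight line."""
    xs, ys = zip(*data)
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in data)
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept

def fit_nearest_neighbor(data):
    """Predict the target of the closest training point."""
    return lambda x: min(data, key=lambda point: abs(point[0] - x))[1]

def mse(model, data):
    """Mean squared error of a fitted model on a dataset."""
    return mean((model(x) - y) ** 2 for x, y in data)

candidates = {
    "mean": fit_mean,
    "line": fit_line,
    "nearest_neighbor": fit_nearest_neighbor,
}

# Train every candidate and keep the one with the lowest validation error.
scores = {name: mse(fit(train), valid) for name, fit in candidates.items()}
best_name = min(scores, key=scores.get)
```

Scoring on held-out validation data rather than on the training data matters here: the nearest-neighbor model fits the training points perfectly but generalizes poorly to the new observations.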

What are hyperparameters?

In the training phase of the model, there are many variables to be set. Basically, we can group them into two types: parameters and hyperparameters. Parameters are those that are learned during the model training process, such as the weights and biases in a neural network, while hyperparameters are those that are set before the training process starts, such as the learning rate, the dropout factor, and so on.
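The distinction can be illustrated with a tiny gradient descent loop, shown here as a sketch on made-up data. The learning rate and the number of epochs are fixed before training and never change (hyperparameters), while the weight and bias are updated at every step (parameters).

```python
# Hyperparameters: chosen before training starts and not updated by it.
learning_rate = 0.05
epochs = 500

# Parameters: initialized here, then learned from the data.
weight, bias = 0.0, 0.0

# Made-up training data that follows y = 2x + 1 exactly.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

for _ in range(epochs):
    # Gradients of the mean squared error with respect to each parameter.
    grad_w = sum(2 * (weight * x + bias - y) * x for x, y in data) / len(data)
    grad_b = sum(2 * (weight * x + bias - y) for x, y in data) / len(data)
    # The learning rate controls how far each update moves the parameters.
    weight -= learning_rate * grad_w
    bias -= learning_rate * grad_b
```

After training, weight and bias end up close to 2 and 1. A different learning rate would change how quickly, or whether, they get there, which is why hyperparameters are worth tuning.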

Types of search methods

There are many algorithms to find the optimal hyperparameters of a model. The following figure highlights the best-known ones that are also used by AutoKeras:

Figure 1.5 – Hyperparameter search method paths

Let's try to understand these methods in more detail:

  • Grid search: Given a set of variables (hyperparameters) and a set of values for each variable, grid search performs an exhaustive search, testing all possible combinations of these values in the variables to find the best possible model based on a defined evaluation metric, such as precision. In the case of a neural network with learning rate and dropout as hyperparameters to tune, we can define a learning rate set of values as [0.1, 0.01] and a dropout set of values as [0.2, 0.5], so grid search will train the model with these combinations:

    (a) learning_rate=0.1, dropout=0.2 => Model version 1

    (b) learning_rate=0.01, dropout=0.2 => Model version 2

    (c) learning_rate=0.1, dropout=0.5 => Model version 3

    (d) learning_rate=0.01, dropout=0.5 => Model version 4

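  • Random search: This is similar to grid search, but instead of testing every combination, it trains the model on combinations sampled at random from the search space, usually for a fixed number of trials. Because it evaluates only a sample of the combinations, random search is usually cheaper than grid search.
  • Bayesian search: This method is based on Bayes' theorem. It uses the results of previous trials to build a probabilistic model of the evaluation metric and explores only the combinations that this model predicts are most likely to improve it.
  • Hyperband: This is a variation of random search that tries to resolve the exploration/exploitation dilemma using a bandit-based approach to hyperparameter optimization: it gives many configurations a small training budget, stops the poorly performing ones early, and reallocates the budget to the most promising ones.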
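The first two methods can be sketched as follows, using the learning rate and dropout values from the grid search example. The toy_score function is a stand-in assumed for illustration; in practice, each configuration would be scored by training a model and measuring a validation metric.

```python
import itertools
import random

search_space = {"learning_rate": [0.1, 0.01], "dropout": [0.2, 0.5]}

def grid_search(space):
    """Enumerate every combination of hyperparameter values."""
    names = list(space)
    return [dict(zip(names, values))
            for values in itertools.product(*space.values())]

def random_search(space, n_trials, seed=0):
    """Sample a fixed number of combinations at random."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()}
            for _ in range(n_trials)]

def toy_score(config):
    """Placeholder metric (higher is better); not a real training run."""
    return -abs(config["learning_rate"] - 0.01) - abs(config["dropout"] - 0.2)

# Grid search evaluates all four combinations.
grid_trials = grid_search(search_space)
best_grid = max(grid_trials, key=toy_score)

# Random search evaluates only the requested number of samples.
random_trials = random_search(search_space, n_trials=2)
best_random = max(random_trials, key=toy_score)
```

With two values per hyperparameter the grid has only four points, but the grid grows multiplicatively with each added hyperparameter, whereas the cost of random search is fixed by n_trials.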

Automated neural network architecture selection

The design of neural network architectures is one of the most complex and tedious tasks in the world of ML. Typically, in traditional ML, data scientists spend a lot of time iterating through different neural network architectures with different hyperparameters to optimize a model objective function. This is time-consuming, requires deep knowledge, and is prone to errors at times.

In the mid-2010s, the idea of using evolutionary algorithms and reinforcement learning to design and find an optimal neural network architecture was introduced. It was called Neural Architecture Search (NAS). Basically, it trains a model to create layers, stacking them to create a deep neural network architecture.

A NAS system involves these three main components:

  • Search space: Consists of a set of blocks of operations (fully connected, convolution, and so on) and how these operations are connected to each other to form valid network architectures. Traditionally, the design of the search space is done by a data scientist.
  • Search algorithm: A NAS search algorithm tests a number of candidate network architecture models. From the metrics obtained, it selects the candidates with the highest performance.
  • Evaluation strategy: Because a large number of candidate models must be tested to obtain good results, the process is computationally very expensive, so new methods regularly appear to reduce the time or computing resources needed to estimate each candidate's performance.
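To make the three components concrete, here is a deliberately simplified sketch. The operation names, the depth limits, and the scoring rule are assumptions for illustration only: a real evaluation strategy would train each candidate network and measure its validation performance, and the search algorithm shown is plain random search rather than reinforcement learning or evolution.

```python
import random

# Search space: which operations exist and how many layers may be stacked.
SEARCH_SPACE = {
    "operations": ["dense", "conv3x3", "conv5x5", "max_pool"],
    "min_layers": 2,
    "max_layers": 5,
}

def sample_architecture(rng):
    """Draw one valid architecture (a list of layer operations)."""
    depth = rng.randint(SEARCH_SPACE["min_layers"], SEARCH_SPACE["max_layers"])
    return [rng.choice(SEARCH_SPACE["operations"]) for _ in range(depth)]

def evaluate(architecture):
    """Evaluation strategy: a cheap placeholder score, not real training."""
    conv_layers = sum(op.startswith("conv") for op in architecture)
    return conv_layers - 0.1 * len(architecture)

def search(n_candidates, seed=0):
    """Search algorithm: test random candidates and keep the best one."""
    rng = random.Random(seed)
    candidates = [sample_architecture(rng) for _ in range(n_candidates)]
    return max(candidates, key=evaluate)

best_architecture = search(n_candidates=20)
```

Each component can be replaced independently: a richer search space, a smarter search algorithm, or a cheaper evaluation strategy can be swapped in without touching the other two.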

In the next figure, you can see the relationships between the three described components:

Figure 1.6 – NAS component relationships

Currently, NAS is a new area of research that is attracting a lot of attention and several research papers have been published: http://www.ml4aad.org/automl/literature-on-neural-architecture-search/. Some of the most cited papers are as follows:

  • NASNet (https://arxiv.org/abs/1707.07012) – Learning Transferable Architectures for Scalable Image Recognition: High-accuracy models for image classification are based on very complex neural networks with lots of layers. NASNet is a method of learning model architectures directly from the dataset of interest. Due to the high cost of doing so when the dataset is very large, it first looks for an architectural building block in a small dataset, and then transfers the block to a larger dataset. This approach is a successful example of what you can achieve with AutoML, because NASNet-generated models often outperform state-of-the-art, human-designed models. In the following figure, we can see how NASNet works:
Figure 1.7 – Overview of NAS

  • AmoebaNet – Regularized Evolution for Image Classifier Architecture Search: This approach uses an evolutionary algorithm to efficiently discover high-quality architectures. Until this work, architectures found by evolutionary algorithms for image classification had not outperformed those designed by humans; AmoebaNet-A surpasses them for the first time. The key was to modify the selection algorithm by introducing an age property that favors the youngest genotypes. AmoebaNet-A achieves accuracy similar to that of the latest generation of ImageNet models discovered with more complex architecture search methods, showing that evolution can obtain results faster with the same hardware, especially in the early search stages, which is especially important when few computational resources are available. The following figure shows the correlation between accuracy and model size for some representative state-of-the-art image classification models. The dotted circle shows 84.3% accuracy for an AmoebaNet model:
Figure 1.8 – Correlation between the top-1 accuracy and model size for state-of-the-art image classification models using the ImageNet dataset

  • Efficient Neural Architecture Search (ENAS): This variant of NAS improves search efficiency by allowing all child models to share their weights, so it is not necessary to train each child model from scratch. This optimization significantly reduces the computational cost of the search.
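The age-based selection described for AmoebaNet can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: architectures are plain lists of hypothetical operation names, and fitness is a placeholder score instead of the validation accuracy of a trained network. What the sketch does preserve is the core loop, in which the best of a random sample becomes the parent, a mutated child joins the population, and the oldest member (not the worst) is removed.

```python
import collections
import random

OPERATIONS = ["dense", "conv3x3", "conv5x5", "max_pool"]

def random_architecture(rng, depth=4):
    """Create a random architecture with a fixed number of layers."""
    return [rng.choice(OPERATIONS) for _ in range(depth)]

def mutate(architecture, rng):
    """Copy the parent and replace one randomly chosen layer."""
    child = list(architecture)
    child[rng.randrange(len(child))] = rng.choice(OPERATIONS)
    return child

def fitness(architecture):
    """Placeholder score; a real system would train and validate the model."""
    return sum(op.startswith("conv") for op in architecture)

def aging_evolution(population_size=10, sample_size=3, cycles=50, seed=0):
    rng = random.Random(seed)
    # The deque keeps members in order of age: oldest on the left.
    population = collections.deque(
        random_architecture(rng) for _ in range(population_size)
    )
    history = list(population)
    for _ in range(cycles):
        # Tournament selection: the best of a random sample is the parent.
        sample = rng.sample(list(population), sample_size)
        parent = max(sample, key=fitness)
        child = mutate(parent, rng)
        population.append(child)
        population.popleft()  # discard the oldest member, whatever its score
        history.append(child)
    # Return the best architecture seen at any point during the search.
    return max(history, key=fitness)

best = aging_evolution()
```

Removing by age means that even a high-scoring architecture eventually leaves the population, so it survives only if its descendants keep scoring well.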

There are many AutoML tools available, all of them with a similar goal: to automate the different steps of the ML pipeline. The following are some of the most used tools:

  • AutoKeras: An AutoML system based on the deep learning framework Keras and using hyperparameter searching and NAS.
  • auto-sklearn: An AutoML toolkit that allows you to use a special type of scikit-learn estimator, which automates algorithm selection and hyperparameter tuning, using Bayesian optimization, meta-learning, and model ensembling.
  • DataRobot: An AI platform that automates the end-to-end process for building, deploying, and maintaining AI at scale.
  • Darwin: An AI tool that automates the slowest steps in the model life cycle, ensuring long-term quality and the scalability of models.
  • H2O-DriverlessAI: An AI platform for AutoML.
  • Google's AutoML: A suite of ML products that enable developers with no ML experience to train and use high-performance models in their projects. To do this, this tool uses Google's powerful next-generation transfer learning and neural architecture search technology.
  • Microsoft Azure AutoML: This cloud service creates many pipelines in parallel that try different algorithms and parameters for you.
  • Tree-based Pipeline Optimization Tool (TPOT): A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

An exhaustive comparison of the main AutoML tools that currently exist can be found in the paper Evaluation and Comparison of AutoML Approaches and Tools. From it, we can conclude that the main commercial solutions, such as H2O-DriverlessAI, DataRobot, and Darwin, allow us to detect the data schema, execute the feature engineering, and analyze detailed results for interpretation purposes. Open source tools, by contrast, are more focused on automating the modeling tasks, training, and model evaluation, leaving the data-oriented tasks to the data scientists.

The study also concludes that, in the various evaluations and benchmarks tested, AutoKeras is the most stable and efficient tool, which is very important in a production environment where both performance and stability are key factors. These qualities, together with the fact that it is a widely used tool, are the main reasons why AutoKeras was the AutoML framework chosen when writing this book.