Book Image

Hands-On Automated Machine Learning

By : Sibanjan Das, Umit Mert Cakmak
Book Image

Hands-On Automated Machine Learning

By: Sibanjan Das, Umit Mert Cakmak

Overview of this book

AutoML is designed to automate parts of Machine Learning. Readily available AutoML tools are making data science practitioners’ work easy and are received well in the advanced analytics community. Automated Machine Learning covers the necessary foundation needed to create automated machine learning modules and helps you get up to speed with them in the most practical way possible. In this book, you’ll learn how to automate different tasks in the machine learning pipeline such as data preprocessing, feature selection, model training, model optimization, and much more. In addition to this, it demonstrates how you can use the available automation libraries, such as auto-sklearn and MLBox, and create and extend your own custom AutoML components for Machine Learning. By the end of this book, you will have a clearer understanding of the different aspects of automated Machine Learning, and you’ll be able to incorporate automation tasks using practical datasets. You can leverage your learning from this book to implement Machine Learning in your projects and get a step closer to winning various machine learning competitions.
Table of Contents (10 chapters)

Why use AutoML and how does it help?

There are a lot of ML tutorials on the internet, and usually the sample datasets are clean, formatted, and ready to be used with algorithms because the aim of many tutorials is to show the capability of certain tools, libraries, or Software as a Service (SaaS) offerings.

In reality, datasets come in different types and sizes. A recent industry survey done by Kaggle in 2017, titled The State of Data Science and Machine Learning, with over 16,000 responses, shows that the top-three commonly-used datatypes are relational data, text data, and image data.

Moreover, messy data is at the top of the list of problems that people have to deal with, again based on the Kaggle survey. When a dataset is messy and needs a lot of special treatment to be used by ML models, you spend a considerable amount of time on data cleaning, manipulation, and hand-crafting features to get it in the right shape. This is the most time-consuming part of any data science project.

What about selection of performant ML models, hyperparameter optimization of models in training, validation, and testing phases? These are also crucially-important steps that can be executed in many ways.

When the combination of right components come together to process and model your dataset, that combination represents a ML pipeline, and the number of pipelines to be experimented could grow very quickly.

For building high-performing ML pipelines successfully, you should systematically go through all the available options for every step by considering your limits in terms of time and hardware/software resources.

AutoML systems help you to define robust approaches to automatically constructing ML pipelines for a given problem and effectively execute them in order to find performant models.