Machine Learning Automation with TPOT

By: Dario Radečić

Overview of this book

The automation of machine learning tasks allows developers more time to focus on the usability and reactivity of the software powered by machine learning models. TPOT is a Python automated machine learning tool used for optimizing machine learning pipelines using genetic programming. Automating machine learning with TPOT enables individuals and companies to develop production-ready machine learning models cheaper and faster than with traditional methods. With this practical guide to AutoML, developers working with Python on machine learning tasks will be able to put their knowledge to work and become productive quickly. You'll adopt a hands-on approach to learning the implementation of AutoML and associated methodologies. Complete with step-by-step explanations of essential concepts, practical examples, and self-assessment questions, this book will show you how to build automated classification and regression models and compare their performance to custom-built models. As you advance, you'll also develop state-of-the-art models using only a couple of lines of code and see how those models outperform all of your previous models on the same datasets. By the end of this book, you'll have gained the confidence to implement AutoML techniques in your organization on a production level.
Table of Contents (14 chapters)
Section 1: Introducing Machine Learning and the Idea of Automation
Section 2: TPOT – Practical Classification and Regression
Section 3: Advanced Examples and Neural Networks in TPOT

Applying automated regression modeling to the fish market dataset

This section demonstrates how to apply machine learning automation with TPOT to a regression dataset. It uses the fish market dataset (https://www.kaggle.com/aungpyaeap/fish-market) for exploration and regression modeling. The goal is to predict the weight of a fish. You will learn how to load the dataset, visualize it, prepare it adequately, and find the best machine learning pipeline with TPOT:

  1. First, import the required libraries and load the dataset. You'll need numpy, pandas, matplotlib, and seaborn. Additionally, rcParams is imported from matplotlib to tweak the plot styling a bit. You can find the code for this step in the following block:
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    from matplotlib import rcParams
    rcParams['axes.spines.top'] = False
    rcParams...
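
The overall workflow described in this section (prepare the data, then let TPOT search for a pipeline) could be sketched roughly as follows. This is a minimal illustration rather than the book's own code: the helper names `prepare_features` and `fit_tpot_regressor` are hypothetical, the target column name `Weight` is an assumption about the dataset, and the `TPOTRegressor` arguments follow the classic TPOT API with small, arbitrarily chosen values.

```python
import pandas as pd


def prepare_features(df: pd.DataFrame, target: str = "Weight"):
    """Split a data frame into a numeric feature matrix and a target series.

    The default target name "Weight" is an assumption about the dataset.
    """
    # One-hot encode text columns so every feature is numeric;
    # drop_first=True removes one redundant dummy column per category.
    X = pd.get_dummies(df.drop(columns=[target]), drop_first=True, dtype=float)
    y = df[target]
    return X, y


def fit_tpot_regressor(X, y, generations=5, population_size=20, random_state=42):
    """Run a TPOT search on a train split and score it on a held-out split.

    Hypothetical helper; the parameter values are illustrative only.
    """
    # Imported lazily so prepare_features works without TPOT installed.
    from sklearn.model_selection import train_test_split
    from tpot import TPOTRegressor

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state
    )
    tpot = TPOTRegressor(
        generations=generations,
        population_size=population_size,
        random_state=random_state,
    )
    tpot.fit(X_train, y_train)
    # score() uses the optimizer's configured scoring function.
    return tpot, tpot.score(X_test, y_test)
```

Keeping the data preparation separate from the TPOT search makes it possible to inspect the encoded features before committing to an optimization run, which can take a long time.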