
scikit-learn Cookbook - Second Edition

By: Trent Hauck

Overview of this book

Python is quickly becoming the go-to language for analysts and data scientists due to its simplicity and flexibility, and within the Python data space, scikit-learn is the unequivocal choice for machine learning. This book includes walkthroughs of and solutions to both common and not-so-common problems in machine learning, and shows how scikit-learn can be leveraged to perform various machine learning tasks effectively. The second edition begins by taking you through recipes for evaluating the statistical properties of data and generating synthetic data for machine learning modeling. As you progress through the chapters, you will come across recipes that teach you to implement techniques such as data preprocessing, linear regression, logistic regression, k-NN, Naïve Bayes, classification, decision trees, ensembles, and much more. Furthermore, you'll learn to optimize your models with multi-class classification, cross-validation, and model evaluation, and dive deeper into implementing deep learning with scikit-learn. Along with covering the enhanced features for model selection, the API, and new features such as classifiers, regressors, and estimators, the book also contains recipes on evaluating and fine-tuning the performance of your model. By the end of this book, you will have explored a plethora of features offered by scikit-learn for Python to solve any machine learning problem you come across.
Table of Contents (13 chapters)

Stacking with a neural network

The two most common meta-learning methods are bagging and boosting. Stacking is less widely used, yet it is powerful because it can combine models of different types. All three methods create a stronger estimator from a set of not-so-strong estimators. We tried the stacking procedure in Chapter 9, Tree Algorithms and Ensembles. Here, we try it with a neural network mixed with other models.
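For a quick feel of the idea, scikit-learn (0.22 and later) ships a built-in `StackingClassifier`. Note that it is a cross-validated variant of stacking: the higher-level learner is trained on out-of-fold predictions rather than on a single held-out part of the training set. The sketch below is illustrative only; the breast cancer dataset, the choice of an `MLPClassifier` alongside a random forest and k-NN, and all hyperparameters are assumptions, not the recipe's own setup.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed example data: a small built-in binary classification dataset.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners of different types, including a neural network.
# Scale-sensitive models (MLP, k-NN) are wrapped with a StandardScaler.
base_learners = [
    ("mlp", make_pipeline(StandardScaler(),
                          MLPClassifier(hidden_layer_sizes=(50,),
                                        max_iter=1000, random_state=0))),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
]

# The final (higher-level) estimator is fit on 5-fold out-of-fold
# predictions of the base learners.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

accuracy = stack.score(X_test, y_test)
print(f"Stacked test accuracy: {accuracy:.3f}")
```

The cross-validated variant uses every training row for both levels, at the cost of fitting each base learner several times.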

The process for stacking is as follows:

  1. Split the dataset into training and testing sets.
  2. Split the training set into two sets.
  3. Train base learners on the first part of the training set.
  4. Make predictions using the base learners on the second part of the training set. Store these prediction vectors.
  5. Take the stored prediction vectors as inputs and the target variable as output. Train a higher-level learner (note that we are still on the second part of the training...
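The holdout steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the book's implementation: the dataset, the base learners, the 50/50 split of the training set, the use of predicted probabilities (rather than hard labels) as the stored prediction vectors, and the final test-set evaluation are all illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed example data.
X, y = load_breast_cancer(return_X_y=True)

# Step 1: split the dataset into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Step 2: split the training set into two parts (50/50 is an assumption).
X_first, X_second, y_first, y_second = train_test_split(
    X_train, y_train, test_size=0.5, stratify=y_train, random_state=0)

# Illustrative base learners: a neural network mixed with other models.
base_learners = [
    make_pipeline(StandardScaler(),
                  MLPClassifier(hidden_layer_sizes=(50,), max_iter=1000,
                                random_state=0)),
    RandomForestClassifier(n_estimators=100, random_state=0),
    make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
]

# Step 3: train the base learners on the first part only.
for model in base_learners:
    model.fit(X_first, y_first)


def meta_features(models, X):
    """One column per base learner: its predicted probability of class 1."""
    return np.column_stack([m.predict_proba(X)[:, 1] for m in models])


# Step 4: predict on the second part and store the prediction vectors.
Z_second = meta_features(base_learners, X_second)

# Step 5: train the higher-level learner on the stored predictions,
# still using only the second part of the training set.
meta_learner = LogisticRegression()
meta_learner.fit(Z_second, y_second)

# Assumed final check: score the whole stack on the untouched test set.
Z_test = meta_features(base_learners, X_test)
stacked_accuracy = meta_learner.score(Z_test, y_test)
print(f"Holdout-stacked test accuracy: {stacked_accuracy:.3f}")
```

Keeping the base learners away from the second part is what lets the higher-level learner see honest, out-of-sample predictions; the trade-off is that each level sees only half of the training data.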