Regression Analysis with Python

By: Luca Massaron, Alberto Boschetti

Overview of this book

Regression is the process of learning relationships between inputs and continuous outputs from example data, which enables predictions for novel inputs. There are many kinds of regression algorithms, and the aim of this book is to explain which one is right for each kind of problem and how to prepare real-world data for it. With this book you will learn to define a simple regression problem and evaluate its performance. The book will help you understand how to properly parse a dataset, clean it, and create an output matrix optimally built for regression. You will begin with a simple regression algorithm to solve some data science problems and then progress to more complex algorithms. The book will enable you to use regression models to predict outcomes and make critical business decisions. Through the book, you will learn to use Python to build fast, better linear models and to apply the results in Python or in any other programming language you prefer.

Bagging and boosting


Bagging and boosting are two techniques for combining learners. They are grouped under the generic name of ensembles (or meta-algorithms) because the ultimate goal is to combine weak learners into a model that is more complex but also more accurate. There is no formal definition of a weak learner, but ideally it is a fast, sometimes linear, model that does not necessarily produce excellent results (it suffices that they are better than a random guess). The final ensemble is typically a non-linear learner whose performance increases with the number of weak learners in the model (note that this relation is itself strictly non-linear). Let's now see how they work.

Bagging

Bagging stands for Bootstrap Aggregating, and its ultimate goal is to reduce variance by averaging the results of weak learners. Let's look at the code first and then explain how it works. As a dataset, we will reuse the Boston dataset (and its validation split) from the previous example:

In:
from sklearn...
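
To make the idea of bootstrap aggregating concrete, here is a minimal standard-library sketch that is independent of the book's scikit-learn listing. It rests on several assumptions that are not in the original text: the data is synthetic (a noisy sine curve, not the Boston dataset), the weak learner is a hand-written one-split regression stump, and the names fit_stump, predict_stump, fit_bagging and predict_bagging are hypothetical helpers introduced only for illustration. Each weak learner is trained on a bootstrap sample (drawn with replacement, same size as the original data), and the ensemble prediction is the plain average of the individual predictions.

```python
import math
import random
from statistics import fmean


def fit_stump(xs, ys):
    """Weak learner: one-split regression stump on a single feature.

    Returns (threshold, left_mean, right_mean), chosen to minimise the
    sum of squared errors over all candidate split points.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no valid threshold between two equal x values
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        lm, rm = fmean(left), fmean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[k - 1][0] + pairs[k][0]) / 2, lm, rm)
    if best is None:  # every x identical: fall back to a constant model
        m = fmean(ys)
        return (pairs[0][0], m, m)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_bagging(xs, ys, n_learners=25, seed=0):
    """Fit one stump per bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_bagging(models, x):
    """Aggregate step: average the predictions of all weak learners."""
    return fmean([predict_stump(m, x) for m in models])


# Synthetic data (an assumption for this sketch, not the book's dataset).
data_rng = random.Random(42)
xs = [i / 10 for i in range(60)]
ys = [math.sin(x) + data_rng.gauss(0, 0.2) for x in xs]

single = fit_stump(xs, ys)                      # one weak learner on all data
models = fit_bagging(xs, ys, n_learners=25, seed=0)

for x in (1.0, 3.0, 5.0):
    print(x, round(predict_stump(single, x), 3), round(predict_bagging(models, x), 3))
```

A single stump can only output two distinct values, whereas the average of many stumps fitted on different bootstrap samples is a finer step function. With scikit-learn, the same pattern is usually obtained by wrapping a base regressor in sklearn.ensemble.BaggingRegressor rather than writing the loop by hand.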