Book Image

Regression Analysis with Python

By : Luca Massaron, Alberto Boschetti
4 (1)
Book Image

Regression Analysis with Python

4 (1)
By: Luca Massaron, Alberto Boschetti

Overview of this book

Regression is the process of learning relationships between inputs and continuous outputs from example data, which enables predictions for novel inputs. There are many kinds of regression algorithms, and the aim of this book is to explain which is the right one to use for each set of problems and how to prepare real-world data for it. With this book you will learn to define a simple regression problem and evaluate its performance. The book will help you understand how to properly parse a dataset, clean it, and create an output matrix optimally built for regression. You will begin with a simple regression algorithm to solve some data science problems and then progress to more complex algorithms. The book will enable you to use regression models to predict outcomes and take critical business decisions. Through the book, you will gain knowledge to use Python for building fast better linear models and to apply the results in Python or in any computer language you prefer.
Table of Contents (16 chapters)
Regression Analysis with Python
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Preface
Index

Starting from the basics


We will start exploring the first dataset, the Boston dataset, but before delving into numbers, we will upload a series of helpful packages that will be used during the rest of the chapter:

In: import numpy as np
  import pandas as pd
  import matplotlib.pyplot as plt
  import matplotlib as mpl

If you are working from an IPython Notebook, running the following command in a cell will instruct the Notebook to represent any graphic output in the Notebook itself (otherwise, if you are not working on IPython, just ignore the command because it won't work in IDEs such as Python's IDLE or Spyder):

In: %matplotlib inline
  # If you are using IPython, this will make the images available in the Notebook

To immediately select the variables that we need, we just frame all the data available into a Pandas data structure, DataFrame.

Inspired by a similar data structure present in the R statistical language, a DataFrame renders data vectors of different types easy to handle under...