Regression Analysis with Python

By: Luca Massaron, Alberto Boschetti

Overview of this book

Regression is the process of learning relationships between inputs and continuous outputs from example data, which enables predictions for novel inputs. There are many kinds of regression algorithms, and the aim of this book is to explain which is the right one to use for each set of problems and how to prepare real-world data for it. With this book you will learn to define a simple regression problem and evaluate its performance. The book will help you understand how to properly parse a dataset, clean it, and create an output matrix optimally built for regression. You will begin with a simple regression algorithm to solve some data science problems and then progress to more complex algorithms. The book will enable you to use regression models to predict outcomes and make critical business decisions. Throughout the book, you will learn to use Python to build fast, better linear models and to apply the results in Python or in any computer language you prefer.

A time series problem


The last problem in this chapter is about prediction over time. Such problems are usually called time series analysis: the prediction is based on descriptors extracted from the past, so the outcome at the current time becomes a feature for predicting the next point in time. In this exercise, we're using the closing values for several stocks composing the Dow Jones index in 2011.
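
To make the idea of "past outcomes becoming features" concrete, here is a minimal sketch of how an ordered series can be turned into a supervised regression dataset. The function name `make_lagged_dataset`, the parameter `n_lags`, and the price values are assumptions introduced only for illustration; they are not taken from the book's code or from the real dataset. The sketch uses only the standard library so that it runs anywhere.

```python
def make_lagged_dataset(series, n_lags):
    """Build (X, y): each row of X holds the n_lags values that precede its target in y.

    Hypothetical helper, written for illustration only.
    """
    if not 1 <= n_lags < len(series):
        raise ValueError("n_lags must be between 1 and len(series) - 1")
    # Row i contains series[i : i + n_lags]; its target is series[i + n_lags].
    X = [list(series[i:i + n_lags]) for i in range(len(series) - n_lags)]
    y = list(series[n_lags:])
    return X, y


# Invented weekly closing prices (not real market data), ordered in time.
closing = [10.0, 10.5, 10.2, 10.8, 11.0, 10.9]

X, y = make_lagged_dataset(closing, n_lags=2)

for features, target in zip(X, y):
    print(features, "->", target)
```

With two lags, each observation uses the two previous closing values to predict the next one, so a series of six points yields four training examples.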

The dataset is composed of several features, but to keep the exercise short and complete we're using only the weekly closing values of each of the 30 measured stocks, ordered in time. The dataset spans six months: we train our algorithm on the first half (the first quarter of the year under observation, 12 weeks) and test the predictions on the second half (the second quarter, 13 weeks).
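
The split described above is chronological rather than random: every training week precedes every test week. A minimal sketch of such a split follows, assuming one list of 25 weekly closing values per stock. The stock names, the prices, and the helper `chronological_split` are placeholders invented for this example, not the actual Dow Jones data or the book's code.

```python
N_WEEKS = 25         # 12 training weeks + 13 test weeks, as described in the text
N_TRAIN_WEEKS = 12
N_STOCKS = 30

# Synthetic placeholder prices (NOT real market data):
# one time-ordered list of weekly closing values per stock.
closes = {
    f"STOCK_{s:02d}": [100.0 + s + 0.5 * w for w in range(N_WEEKS)]
    for s in range(N_STOCKS)
}


def chronological_split(series, n_train):
    """Split a time-ordered series into past (train) and future (test), without shuffling."""
    return series[:n_train], series[n_train:]


train, test = {}, {}
for name, values in closes.items():
    train[name], test[name] = chronological_split(values, N_TRAIN_WEEKS)

print(len(train["STOCK_00"]), "training weeks,", len(test["STOCK_00"]), "test weeks")
```

Shuffling before splitting would let the model see values from the second quarter while training, which is why the order of the weeks is preserved here.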

Moreover, since we don't expect readers to have a background in economics...