Book Image

Regression Analysis with R

By : Giuseppe Ciaburro
Book Image

Regression Analysis with R

By: Giuseppe Ciaburro

Overview of this book

Regression analysis is a statistical process which enables prediction of relationships between variables. The predictions are based on the casual effect of one variable upon another. Regression techniques for modeling and analyzing are employed on large set of data in order to reveal hidden relationship among the variables. This book will give you a rundown explaining what regression analysis is, explaining you the process from scratch. The first few chapters give an understanding of what the different types of learning are – supervised and unsupervised, how these learnings differ from each other. We then move to covering the supervised learning in details covering the various aspects of regression analysis. The outline of chapters are arranged in a way that gives a feel of all the steps covered in a data science process – loading the training dataset, handling missing values, EDA on the dataset, transformations and feature engineering, model building, assessing the model fitting and performance, and finally making predictions on unseen datasets. Each chapter starts with explaining the theoretical concepts and once the reader gets comfortable with the theory, we move to the practical examples to support the understanding. The practical examples are illustrated using R code including the different packages in R such as R Stats, Caret and so on. Each chapter is a mix of theory and practical examples. By the end of this book you will know all the concepts and pain-points related to regression analysis, and you will be able to implement your learning in your projects.
Table of Contents (15 chapters)
Title Page
Packt Upsell
Contributors
Preface
Index

Chapter 1. Getting Started with Regression

Regression analysis is the starting point in data science. This is because regression models represent the most well-understood models in numerical simulation. Once we experience the workings of regression models, we will be able to understand all other machine learning algorithms. Regression models are easily interpretable as they are based on solid mathematical bases (such as matrix algebra for example). We will see in the following sections that linear regression allows us to derive a mathematical formula representative of the corresponding model. Perhaps this is why such techniques are extremely easy to understand.

Regression analysis is a statistical process done to study the relationship between a set of independent variables (explanatory variables) and the dependent variable (response variable). Through this technique, it will be possible to understand how the value of the response variable changes when the explanatory variable is varied.

Consider some data that is collected about a group of students, on: number of study hours per day, attendance at school, and scores on the final exam obtained. Through regression techniques, we can quantify the average increase in the final exam score when we add one more hour of study. Lower attendance in school (decreasing the student's experience) lowers the scores in the final exam.

A regression analysis can have two objectives:

  • Explanatory analysis: To understand and weigh the effects of the independent variable on the dependent variable according to a particular theoretical model
  • Predictive analysis: To locate a linear combination of the independent variable to predict the value assumed by the dependent variable optimally

In this chapter, we will be introduced to the basic concepts of regression analysis, and then we'll take a tour of the different types of statistical processes. In addition to this, we will also introduce the R language and cover the basics of the R programming environment. Finally we will explore the essential tools that R provides for understanding the amazing world of regression.

We will cover the following topics:

  • The origin of regression
  • Types of algorithms
  • How to quickly set up R for data science
  • R packages used throughout the book

At the end of this chapter, we will provide you with a working environment that is able to run all the examples contained in the following chapters. You will also get a clear idea about why regression analysis is not just an underrated technique taken from statistics, but a powerful and effective data science algorithm.