Regression Analysis with R

By: Giuseppe Ciaburro

Discovering different types of regression


As mentioned before, regression analysis is a statistical process for studying the relationship between a set of independent variables (explanatory variables) and a dependent variable (response variable). Through this technique, we can understand how the value of the response variable changes as the explanatory variables are varied.

The power of regression techniques is due to the quality of their algorithms, which have been improved and updated over the years. These techniques are divided into several main types, depending on the nature of the dependent and independent variables involved or on the shape of the regression line.

The reason for such a wide range of regression techniques is the variety of cases to be analyzed. Each case is based on data with specific characteristics, and each analysis is characterized by specific objectives. These specifications require the use of different types of regression techniques to obtain the best results.

How do we distinguish between different types of regression techniques? Previously, we said that a first distinction can be made based on the shape of the regression line. Based on this feature, regression analysis is divided into linear regression and nonlinear regression, as shown in the following figure (linear regression on the left and nonlinear quadratic regression on the right):

It's clear that the shape of the regression line depends on the distribution of the data. In some cases, a straight line is the regression line that best approximates the data, while in other cases, you need to resort to a curve to get the best approximation. For this reason, it is good practice to visually analyze the distribution of the data in advance. Based on the shape of the distribution, we can distinguish between the following types of regression (a short sketch of both cases follows the list):

  • Linear regression
  • Nonlinear regression
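
To see the difference in practice, here is a minimal sketch with simulated data; the variable names and coefficients are illustrative, not taken from the book's datasets. The lm() function fits a straight line in the first case and a quadratic curve (via poly()) in the second:

# Simulated data: one roughly linear pattern, one curved pattern
set.seed(1)
x <- seq(1, 10, length.out = 50)
y_lin  <- 2 + 3 * x + rnorm(50, sd = 2)
y_quad <- 2 + 3 * x - 0.5 * x^2 + rnorm(50, sd = 2)

fit_lin  <- lm(y_lin ~ x)            # straight regression line
fit_quad <- lm(y_quad ~ poly(x, 2))  # quadratic regression curve

# Plot both fits side by side
par(mfrow = c(1, 2))
plot(x, y_lin, main = "Linear")
abline(fit_lin, col = "red")
plot(x, y_quad, main = "Nonlinear (quadratic)")
lines(x, fitted(fit_quad), col = "red")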

Let us now analyze the nature of the variables involved. In this regard, a natural question arises: can the number of explanatory variables affect the choice of regression technique? The answer is definitely yes. For example, in the case of linear regression, if there is only one input variable, we will perform simple linear regression. If, instead, there are two or more input variables, we will need to perform multiple linear regression.
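
As a minimal sketch of this distinction (using R's built-in mtcars dataset as stand-in data, not one of the book's datasets), both models can be fitted with the lm() function from the R Stats package:

# Simple linear regression: a single explanatory variable
simple_fit <- lm(mpg ~ wt, data = mtcars)

# Multiple linear regression: two or more explanatory variables
multiple_fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

summary(simple_fit)     # one intercept and one slope
summary(multiple_fit)   # one coefficient per explanatory variable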

To summarize, a simple linear regression shows the relationship between a dependent variable Y and a single independent variable X, while a multiple regression model shows the relationship between a dependent variable Y and multiple independent variables X1, X2, ..., Xn. The following figure shows the types of regression determined by the number of explanatory variables:

What if we have multiple response variables rather than multiple explanatory variables? In that case, we move from univariate models to multivariate models. As the name suggests, multivariate regression is a technique that estimates a single regression model with more than one response variable. When there is also more than one explanatory variable in a multivariate regression model, the model is called a multivariate multiple regression.
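
In R, lm() handles this case directly: binding several response columns together with cbind() on the left-hand side of the formula estimates one set of coefficients per response. A minimal sketch, again with mtcars as stand-in data:

# Multivariate multiple regression: two response variables (mpg, qsec)
# and two explanatory variables (wt, hp)
mv_fit <- lm(cbind(mpg, qsec) ~ wt + hp, data = mtcars)
summary(mv_fit)   # prints a separate coefficient table for each response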

Finally, let's see what happens when we consider the type of the variables involved. Usually, regression analysis is used when you want to predict a continuous response variable from a number of explanatory variables that are also continuous. However, this is not a limitation of regression: such analysis is also applicable when categorical variables are involved.

In the case of a dichotomous explanatory variable (one that takes a value of zero or one), the solution is immediate: there are already two numbers (zero and one) associated with the variable, so the regression is directly applicable. Categorical explanatory variables with more than two values can also be used in regression analysis; however, before they can be used, they need to be converted into variables that have only two levels (such as zero and one). This conversion is called dummy coding, and the resulting variables are called indicator (or dummy) variables.
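
The following sketch (with an illustrative variable, not from the book) shows how R handles this: model.matrix() exposes the zero/one indicator columns that lm() builds automatically whenever a factor appears in a formula:

# A categorical variable with three levels
color <- factor(c("red", "green", "blue", "green", "red"))

# R expands the factor into 0/1 indicator columns; the first level
# ("blue", alphabetically) becomes the baseline absorbed by the intercept
model.matrix(~ color)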

Logistic regression should be used if the response variable is dichotomous.
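
A minimal sketch of logistic regression in R uses glm() with a binomial family; here the zero/one variable am (transmission type) from the built-in mtcars dataset stands in for a generic dichotomous response:

# Logistic regression for a dichotomous (0/1) response variable
logit_fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
summary(logit_fit)

# Predicted probabilities that the response equals one
head(predict(logit_fit, type = "response"))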