Applied Supervised Learning with R

By: Karthik Ramasubramanian, Jojo Moolayil

Overview of this book

R provides excellent visualization features that are essential for exploring data before using it in automated learning. Applied Supervised Learning with R covers the complete process of using R to develop applications with supervised machine learning algorithms that meet your business needs. The book starts by helping you develop the analytical thinking needed to create a problem statement from business inputs and domain research. You will then learn different evaluation metrics for comparing algorithms, and later use these metrics to select the best algorithm for your problem. After settling on an algorithm, you will study hyperparameter optimization techniques to fine-tune its parameters. The book also demonstrates how to add different regularization terms to avoid overfitting your model. By the end of this book, you will have gained the advanced skills you need to build a supervised machine learning model that fulfills your business needs.

Linear Regression


Let's revisit the multiple linear regression from Chapter 3, Introduction to Supervised Learning. The following equation is the mathematical representation of a linear equation, or linear predictor function, with p explanatory variables and n observations:
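
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}, \qquad i = 1, \ldots, n$$
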
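Here, $x_{ij}$ is the value of the j-th explanatory variable for the i-th observation, so each explanatory variable is a vector of column values $(x_{1j}, \ldots, x_{nj})$, and $\beta_0, \beta_1, \ldots, \beta_p$ are the unknown parameters, or coefficients. Setting $p = 1$ makes this equation suitable for simple linear regression. There are many algorithms to fit this function to the data; the most popular one is Ordinary Least Squares (OLS).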
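As a rough illustration of what OLS computes, the sketch below uses simulated data; the variables `x1`, `x2`, `y`, `beta_hat`, and `fit`, and the true coefficients 3, 1.5, and -0.8, are hypothetical and are not taken from the Beijing PM2.5 dataset. OLS chooses the coefficients that minimize the sum of squared residuals, $\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$. When the design matrix $X$ (a column of 1s followed by the explanatory variables) has full column rank, the minimizer is $\hat{\beta} = (X^\top X)^{-1} X^\top y$. The code solves this system directly and compares the result with the coefficients reported by `lm()`.

```r
# Minimal sketch, assuming simulated (hypothetical) data rather than the
# Beijing PM2.5 dataset: fit a multiple linear regression by OLS using the
# normal equations, then compare with lm().
set.seed(42)

n  <- 200                      # number of observations
x1 <- rnorm(n)                 # hypothetical explanatory variable 1
x2 <- runif(n, 0, 10)          # hypothetical explanatory variable 2
# Hypothetical true coefficients: beta_0 = 3, beta_1 = 1.5, beta_2 = -0.8
y  <- 3 + 1.5 * x1 - 0.8 * x2 + rnorm(n, sd = 0.5)

# Design matrix: a leading column of 1s for the intercept beta_0
X <- cbind("(Intercept)" = 1, x1, x2)

# OLS estimate beta_hat = (X'X)^{-1} X'y, obtained by solving the linear
# system (X'X) beta = X'y instead of explicitly inverting X'X
beta_hat <- solve(crossprod(X), crossprod(X, y))

# The same model fitted with lm()
fit <- lm(y ~ x1 + x2)

# Side-by-side comparison of the two sets of estimates
print(cbind(normal_equations = drop(beta_hat), lm = coef(fit)))
```

Both columns should agree to within numerical precision, and both should lie close to the simulated coefficients. Note that `lm()` itself fits the model with a QR decomposition rather than by forming $X^\top X$, which is numerically more stable; the normal equations are used here only because they mirror the formula.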

Before going into the details of OLS, let's first interpret the equation we obtained when fitting the Beijing PM2.5 data in the model-building section on simple and multiple linear regression in Chapter 3, Introduction to Supervised Learning.

If we substitute the values of the regression coefficients from the output of the lm() function, we get:

The preceding equation attempts to answer the question "Are the factors DEWP, TEMP...