Book Image

Mastering Machine Learning with R, Second Edition - Second Edition

Book Image

Mastering Machine Learning with R, Second Edition - Second Edition

Overview of this book

This book will teach you advanced techniques in machine learning with the latest code in R 3.3.2. You will delve into statistical learning theory and supervised learning; design efficient algorithms; learn about creating Recommendation Engines; use multi-class classification and deep learning; and more. You will explore, in depth, topics such as data mining, classification, clustering, regression, predictive modeling, anomaly detection, boosted trees with XGBOOST, and more. More than just knowing the outcome, you’ll understand how these concepts work and what they do. With a slow learning curve on topics such as neural networks, you will explore deep learning, and more. By the end of this book, you will be able to perform machine learning with R in the cloud using AWS in various scenarios with different datasets.
Table of Contents (23 chapters)
Title Page
Credits
About the Author
About the Reviewers
Packt Upsell
Customer Feedback
Preface
16
Sources

Modeling and evaluation


With the data prepared, we will begin the modeling process. For comparison purposes, we will create a model using best subsets regression like the previous two chapters and then utilize the regularization techniques.

Best subsets

The following code is, for the most part, a rehash of what we developed in Chapter 2, Linear Regression - The Blocking and Tackling of Machine Learning. We will create the best subset object using the regsubsets() command and specify the train portion of data. The variables that are selected will then be used in a model on the test set, which we will evaluate with a mean squared error calculation.

The model that we are building is written out as lpsa ~ . with the tilde and period stating that we want to use all the remaining variables in our data frame, with the exception of the response:

> subfit <- regsubsets(lpsa ~ ., data = train)

With the model built, you can produce the best subset with two lines of code. The first one turns the summary...