Machine Learning with R Quick Start Guide

By: Iván Pastor Sanz

Overview of this book

Machine Learning with R Quick Start Guide takes you on a data-driven journey that starts with the very basics of R and machine learning. It gradually builds upon core concepts so you can handle the varied complexities of data and understand each stage of the machine learning pipeline. From data collection to implementing Natural Language Processing (NLP), this book covers it all. You will implement key machine learning algorithms to understand how they are used to build smart models. You will cover tasks such as clustering, logistic regressions, random forests, support vector machines, and more. Furthermore, you will also look at more advanced aspects such as training neural networks and topic modeling. By the end of the book, you will be able to apply the concepts of machine learning, deal with data-related problems, and solve them using the powerful yet simple language that is R.

Embedded methods

Embedded methods differ from filter and wrapper approaches in that the learning and feature selection parts cannot be separated: variables are selected as part of fitting the model itself.

Regularization methods are the most common type of embedded feature selection method.
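To make the idea concrete, here is a minimal base R sketch of one regularization method, an L1 (lasso) penalty on logistic regression, fitted by proximal gradient descent. The choice of the lasso, the simulated data, the penalty value `lambda = 0.05`, and the helper names `soft_threshold` and `fit_lasso_logit` are all assumptions made for illustration; none of them come from the text above.

```r
# Lasso-penalised logistic regression via proximal gradient descent (ISTA).
# All data are simulated; nothing here uses the book's dataset.
set.seed(42)
n <- 500
p <- 10
X <- scale(matrix(rnorm(n * p), n, p))       # standardised predictors
true_beta <- c(2, -1.5, 1, rep(0, p - 3))    # only the first 3 variables matter
y <- rbinom(n, 1, plogis(as.vector(X %*% true_beta)))

# Soft-thresholding operator: shrinks towards zero and sets small values to 0
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Hypothetical helper: minimises mean logistic loss + lambda * sum(|beta|)
fit_lasso_logit <- function(X, y, lambda, n_iter = 2000) {
  n <- nrow(X)
  # Step size 1/L, where L bounds the curvature of the mean logistic loss
  L <- max(eigen(crossprod(cbind(1, X)) / (4 * n),
                 symmetric = TRUE, only.values = TRUE)$values)
  step <- 1 / L
  b0 <- 0
  beta <- rep(0, ncol(X))
  for (i in seq_len(n_iter)) {
    resid <- plogis(b0 + as.vector(X %*% beta)) - y
    b0 <- b0 - step * mean(resid)            # the intercept is not penalised
    grad <- as.vector(crossprod(X, resid)) / n
    beta <- soft_threshold(beta - step * grad, step * lambda)
  }
  list(intercept = b0, beta = beta)
}

fit <- fit_lasso_logit(X, y, lambda = 0.05)
selected <- which(fit$beta != 0)             # variables kept by the penalty
print(round(fit$beta, 3))
print(selected)
```

The selection happens inside the fitting loop: coefficients that the penalty drives to exactly zero are dropped, so there is no separate selection step. In practice a dedicated package such as glmnet would typically be used instead of a hand-written loop, and the penalty strength would be chosen by cross-validation.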

In classification problems such as this one, logistic regression does not cope well with multicollinearity, which occurs when variables are highly correlated. When the number of observations is not much larger than the number of covariates, p, the estimates can be highly variable. In that situation the likelihood can be increased simply by adding more parameters, which results in overfitting.
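Both effects can be seen in a small simulation. The sketch below uses only base R and made-up data (the sample size, the near-duplicate variable `x2`, and the ten noise columns `z1` to `z10` are assumptions for illustration): it compares the standard error of a coefficient with and without a collinear companion, and the log-likelihood of a model before and after adding covariates unrelated to the response.

```r
# Simulated illustration of collinearity and of likelihood growth with p.
set.seed(1)
n <- 40                                   # few observations on purpose
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)            # almost a copy of x1
y <- rbinom(n, 1, plogis(0.8 * x1))       # y depends on x1 only

# (a) Collinearity: standard error of the x1 coefficient
fit_alone <- glm(y ~ x1, family = binomial)
fit_collinear <- glm(y ~ x1 + x2, family = binomial)
se_alone <- summary(fit_alone)$coefficients["x1", "Std. Error"]
se_collinear <- summary(fit_collinear)$coefficients["x1", "Std. Error"]
print(c(alone = se_alone, with_x2 = se_collinear))

# (b) Overfitting: add 10 covariates that carry no information about y
noise <- matrix(rnorm(n * 10), n, 10)
colnames(noise) <- paste0("z", 1:10)
dat <- data.frame(y = y, x1 = x1, noise)
fit_small <- glm(y ~ x1, data = dat, family = binomial)
fit_big <- glm(y ~ ., data = dat, family = binomial)
ll_small <- as.numeric(logLik(fit_small))
ll_big <- as.numeric(logLik(fit_big))
print(c(small = ll_small, big = ll_big))
```

Because the small model is nested in the big one, the maximized log-likelihood of the big model cannot be lower, even though the extra columns are pure noise. With so few observations `glm` may also warn about fitted probabilities of 0 or 1, which is itself a symptom of the overfitting described above.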

If variables are highly correlated or if collinearity exists, we expect the model parameters and variance to be inflated. The high variance is because of the wrongly specified...