Machine Learning with R Quick Start Guide

By: Iván Pastor Sanz

Overview of this book

Machine Learning with R Quick Start Guide takes you on a data-driven journey that starts with the very basics of R and machine learning. It gradually builds upon core concepts so you can handle the varied complexities of data and understand each stage of the machine learning pipeline. From data collection to implementing Natural Language Processing (NLP), this book covers it all. You will implement key machine learning algorithms to understand how they are used to build smart models. You will cover tasks such as clustering, logistic regressions, random forests, support vector machines, and more. Furthermore, you will also look at more advanced aspects such as training neural networks and topic modeling. By the end of the book, you will be able to apply the concepts of machine learning, deal with data-related problems, and solve them using the powerful yet simple language that is R.
Table of Contents (9 chapters)

Implementing decision trees

Decision trees were briefly introduced when we looked at random forests in the Testing a random forest model section of Chapter 5 (Predicting the Failures of Banks - Multivariate Analysis). A decision tree splits the training sample into two or more homogeneous sets based on the most significant independent variables. At each node, the algorithm searches for the variable that best separates the data into the different categories; information gain and the Gini index are the most common criteria for choosing it. The data is then split recursively, expanding the leaf nodes of the tree until a stopping criterion is reached.
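
To make the two splitting criteria concrete, the following is a minimal sketch in base R of how the Gini index and information gain could score a candidate split. The function names (gini_impurity, entropy, split_gain) and the toy vectors (rating, debt, noise) are hypothetical and chosen only for illustration; they are not part of the book's dataset or of any package.

```r
# Gini impurity of a vector of class labels: 1 - sum(p_k^2)
gini_impurity <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Shannon entropy in bits: -sum(p_k * log2(p_k)), skipping empty classes
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Impurity reduction obtained by splitting `labels` with the logical vector
# `left` (TRUE = left child, FALSE = right child). With impurity = entropy,
# this quantity is the information gain.
split_gain <- function(labels, left, impurity = gini_impurity) {
  n <- length(labels)
  n_left <- sum(left)
  n_right <- n - n_left
  if (n_left == 0 || n_right == 0) return(0)  # degenerate split: no gain
  impurity(labels) -
    (n_left / n) * impurity(labels[left]) -
    (n_right / n) * impurity(labels[!left])
}

# Hypothetical toy data: a two-class rating and two candidate predictors
rating <- c("A", "A", "A", "B", "B", "B", "B", "A")
debt   <- c(10, 20, 30, 60, 70, 80, 90, 40)  # separates the classes well
noise  <- c(1, 2, 1, 2, 1, 2, 1, 2)          # unrelated to the rating

gain_debt_gini  <- split_gain(rating, debt <= 50)
gain_noise_gini <- split_gain(rating, noise == 1)
gain_debt_info  <- split_gain(rating, debt <= 50, impurity = entropy)

print(c(debt_gini = gain_debt_gini,
        noise_gini = gain_noise_gini,
        debt_info = gain_debt_info))
```

In this sketch, a tree-growing procedure would prefer the split on debt, because it reduces impurity the most; the same comparison is repeated inside each resulting node until the stopping criterion is met.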

Let's see how a decision tree can be implemented in R and how this algorithm is able to predict credit ratings.

Decision trees are implemented in the rpart package. Moreover, the rpart.plot package will be useful to visualize...
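
As a rough illustration of the workflow with these two packages, the sketch below fits a classification tree with rpart and plots it with rpart.plot when that package is installed. The credit rating data used in the book is not reproduced here, so the built-in iris dataset stands in for it as an assumption, with Species playing the role of the rating class; the 70/30 split, the seed, and the control values are likewise illustrative choices rather than the book's settings.

```r
library(rpart)  # recommended package, included in standard R distributions

set.seed(42)  # arbitrary seed so the train/test split is reproducible

# Stand-in data (assumption): iris replaces the credit rating dataset
train_idx <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# method = "class" requests a classification tree;
# parms = list(split = "gini") states the splitting criterion explicitly
# (use split = "information" for information gain instead)
tree_fit <- rpart(Species ~ ., data = train, method = "class",
                  parms = list(split = "gini"),
                  control = rpart.control(cp = 0.01, minsplit = 20))

# Predict the class of each held-out observation and compare with the truth
pred <- predict(tree_fit, newdata = test, type = "class")
confusion <- table(actual = test$Species, predicted = pred)
accuracy <- sum(diag(confusion)) / sum(confusion)
print(confusion)
print(accuracy)

# Draw the fitted tree only if rpart.plot is available
if (requireNamespace("rpart.plot", quietly = TRUE)) {
  rpart.plot::rpart.plot(tree_fit)
}
```

The cp and minsplit arguments of rpart.control act as stopping criteria: a split is kept only if it improves the fit by at least cp, and a node with fewer than minsplit observations is not split further.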