Machine Learning with R Quick Start Guide

By: Iván Pastor Sanz

Overview of this book

Machine Learning with R Quick Start Guide takes you on a data-driven journey that starts with the very basics of R and machine learning. It gradually builds upon core concepts so you can handle the varied complexities of data and understand each stage of the machine learning pipeline. From data collection to implementing Natural Language Processing (NLP), this book covers it all. You will implement key machine learning algorithms to understand how they are used to build smart models. You will cover tasks such as clustering, logistic regressions, random forests, support vector machines, and more. Furthermore, you will also look at more advanced aspects such as training neural networks and topic modeling. By the end of the book, you will be able to apply the concepts of machine learning, deal with data-related problems, and solve them using the powerful yet simple language that is R.

Gradient boosting

Gradient boosting combines many weak predictors into a single strong predictor, which tends to make the final model more robust. Like a random forest, it is usually built from decision trees. The difference is that the training sample is not redrawn from one tree to the next; only the weights of the different observations are modified.

Boosting trains trees sequentially, with each tree using information from the trees trained before it. We first fit a decision tree to the training dataset. Then we fit another model whose only job is to correct the errors made by the previous model. This process is repeated until the specified number of trees, or some other stopping rule, is reached.
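
The sequential procedure can be illustrated with a minimal sketch in base R. It assumes a regression problem with squared-error loss, a single numeric feature, and one-split trees (stumps) as the weak learners. The function names (fit_stump, predict_stump, boost, predict_boost), the learning rate, and the simulated data are hypothetical choices made for illustration; this is not how the h2o package implements the algorithm.

```r
# Illustrative gradient boosting for regression (squared-error loss).
# All function names below are hypothetical helpers, not part of h2o.

# Fit a one-split regression tree (a "stump") of r on a numeric feature x.
fit_stump <- function(x, r) {
  xs <- sort(unique(x))
  # Candidate thresholds: midpoints between consecutive distinct values
  cuts <- (head(xs, -1) + tail(xs, -1)) / 2
  best <- list(sse = Inf)
  for (thr in cuts) {
    left <- x <= thr
    pl <- mean(r[left])
    pr <- mean(r[!left])
    sse <- sum((r[left] - pl)^2) + sum((r[!left] - pr)^2)
    if (sse < best$sse) {
      best <- list(sse = sse, thr = thr, left = pl, right = pr)
    }
  }
  best
}

predict_stump <- function(stump, x) {
  ifelse(x <= stump$thr, stump$left, stump$right)
}

boost <- function(x, y, n_trees = 50, learn_rate = 0.1) {
  init <- mean(y)                       # start from a constant prediction
  pred <- rep(init, length(y))
  stumps <- vector("list", n_trees)
  train_mse <- numeric(n_trees)
  for (m in seq_len(n_trees)) {
    resid <- y - pred                   # errors of the current ensemble
    stumps[[m]] <- fit_stump(x, resid)  # next tree is trained on those errors
    pred <- pred + learn_rate * predict_stump(stumps[[m]], x)
    train_mse[m] <- mean((y - pred)^2)
  }
  list(init = init, stumps = stumps,
       learn_rate = learn_rate, train_mse = train_mse)
}

predict_boost <- function(model, x) {
  pred <- rep(model$init, length(x))
  for (s in model$stumps) {
    pred <- pred + model$learn_rate * predict_stump(s, x)
  }
  pred
}

# Simulated data, used only to exercise the sketch
set.seed(1)
x <- runif(200, 0, 10)
y <- sin(x) + rnorm(200, sd = 0.2)
model <- boost(x, y)
```

With squared-error loss and a learning rate between 0 and 1, the training error stored in model$train_mse cannot increase from one tree to the next, which is the "rectify the errors" behaviour described above. Here the stopping rule is simply the fixed number of trees.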

More specific details about the algorithm can be found in the documentation of the h2o package. While training the algorithm, we will need to define...