R Data Science Essentials

Overview of this book

With organizations increasingly embedding data science across their enterprise and management becoming more data-driven, analysts and managers urgently need to understand the key concepts of data science. The data science concepts discussed in this book will help you make key decisions and solve the complex problems you will inevitably face in this new world. R Data Science Essentials will introduce you to various important concepts in the field of data science using R. We start by reading data from multiple sources, then move on to processing the data, extracting hidden patterns, building predictive and forecasting models, building a recommendation engine, and communicating to the user through stunning visualizations and dashboards. By the end of this book, you will understand some very important techniques in data science, be able to implement them using R, understand and interpret the outcomes, and know how they help businesses make decisions.

Ensemble models


We will build multiple models using different algorithms and combine their results using different weights. The weights can be chosen by trial and error: as discussed previously, the dataset can be divided into a training set and a testing set, and we can evaluate the performance of each set of weights on the testing set.
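The trial-and-error weighting can be sketched as follows. This is a minimal illustration, not the book's own listing: the `mtcars` dataset, the choice of a linear model and a regression tree as the two algorithms, the 70/30 split, and the weight grid are all assumptions made for the example.

```r
# Minimal sketch: weighted ensemble of two regression models.
# Dataset (mtcars), predictors, and weight grid are illustrative assumptions.
library(rpart)  # recommended package shipped with standard R installations

set.seed(42)
n <- nrow(mtcars)
train_idx <- sample(n, size = round(0.7 * n))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# Two different algorithms fitted on the same training set
fit_lm   <- lm(mpg ~ wt + hp, data = train)
fit_tree <- rpart(mpg ~ wt + hp, data = train,
                  control = rpart.control(minsplit = 5))

pred_lm   <- predict(fit_lm, newdata = test)
pred_tree <- predict(fit_tree, newdata = test)

# Root mean squared error on the testing set
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Try weights from 0 to 1 for the linear model; the tree gets the remainder
weights <- seq(0, 1, by = 0.1)
scores <- sapply(weights, function(w) {
  rmse(test$mpg, w * pred_lm + (1 - w) * pred_tree)
})

results <- data.frame(weight_lm = weights,
                      weight_tree = 1 - weights,
                      rmse = scores)
best <- results[which.min(results$rmse), ]
print(results)
print(best)
```

Because the weights are picked on the same testing set used to report the error, the best score is somewhat optimistic; holding out a separate validation set for choosing the weights gives a fairer estimate.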

Other popular regression models that can be implemented are Support Vector Machine (SVM) and Random Forest. SVM builds the model by constructing a hyperplane, whereas random forest builds the model from a number of decision trees.
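Both models can be fitted in a few lines. The sketch below assumes that the `e1071` and `randomForest` packages are installed (neither is part of base R) and uses `mtcars` with two predictors purely for illustration; it is not the book's own example.

```r
# Sketch only: assumes e1071 and randomForest are installed, e.g. via
# install.packages(c("e1071", "randomForest")). mtcars is an illustrative dataset.
if (requireNamespace("e1071", quietly = TRUE) &&
    requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(42)
  train_idx <- sample(nrow(mtcars), size = 22)
  train <- mtcars[train_idx, ]
  test  <- mtcars[-train_idx, ]

  # Support vector regression: svm() uses eps-regression for a numeric response
  fit_svm <- e1071::svm(mpg ~ wt + hp, data = train)

  # Random forest built from 500 regression trees
  fit_rf <- randomForest::randomForest(mpg ~ wt + hp, data = train, ntree = 500)

  pred_svm <- predict(fit_svm, newdata = test)
  pred_rf  <- predict(fit_rf, newdata = test)

  # Compare both models on the held-out testing set
  rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
  model_rmse <- c(svm = rmse(test$mpg, pred_svm),
                  random_forest = rmse(test$mpg, pred_rf))
  print(model_rmse)
}
```

The predictions `pred_svm` and `pred_rf` can then be combined with weights in the same way as any other pair of models in an ensemble.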

Replacing NA with mean or median

When only a few records have blank or NA values, we can also consider replacing them with the average value. This may or may not improve the accuracy, so it is important to test the performance on sample test data. This can be implemented using the following code.
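A minimal sketch of this replacement in base R, assuming a hypothetical data frame `df` whose column names and values are made up for illustration:

```r
# Hypothetical data frame with a few missing values (illustrative only)
df <- data.frame(
  age    = c(23, 35, NA, 41, 29, NA, 52),
  income = c(42000, NA, 58000, 61000, 39000, 47000, 150000)
)

# Replace NA with the column mean; na.rm = TRUE ignores the missing entries
df$age_mean <- ifelse(is.na(df$age), mean(df$age, na.rm = TRUE), df$age)

# Replace NA with the column median, which is less affected by extreme values
df$income_median <- ifelse(is.na(df$income),
                           median(df$income, na.rm = TRUE),
                           df$income)

print(df)
```

Writing the imputed values to new columns keeps the original data intact, so model performance with and without the replacement can be compared on the test data.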

It is preferred to replace the missing values with the...