R Data Science Essentials

Overview of this book

With organizations increasingly embedding data science across the enterprise and management becoming more data-driven, analysts and managers urgently need to understand the key concepts of data science. The data science concepts discussed in this book will help you make key decisions and solve the complex problems you will inevitably face in this new world. R Data Science Essentials will introduce you to various important concepts in the field of data science using R. We start by reading data from multiple sources, then move on to processing the data, extracting hidden patterns, building predictive and forecasting models, building a recommendation engine, and communicating to the user through stunning visualizations and dashboards. By the end of this book, you will understand some very important techniques in data science, be able to implement them using R, understand and interpret the outcomes, and know how they help businesses make decisions.

Descriptive statistics


Descriptive statistics is a method of summarizing a dataset quantitatively. These summaries can be simple quantitative statements about the data or visual representations that are sufficient to form part of the initial description of the dataset.

To get a basic understanding of the dataset, we can use the built-in summary function. This function quickly scans the dataset and reports a summary of each variable, which gives a first-cut understanding of the data. It works for numerical as well as categorical data.

summary(tdata)

The output is as follows:
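
As an illustration of the kind of result to expect, here is a minimal sketch that assumes a small made-up data frame in place of tdata; the column names age, income, and gender and all of their values are hypothetical and are not taken from the book's dataset.

```r
# Hypothetical stand-in for tdata; columns and values are made up for illustration.
tdata <- data.frame(
  age    = c(23, 35, NA, 41, 29),
  income = c(42000, 58000, 61000, NA, 39000),
  gender = factor(c("F", "M", "M", "F", "F"))
)

# One call summarizes every column:
# numeric columns get the minimum, quartiles, median, mean, and maximum
# (plus a count of NA values when any are present),
# factor columns get a count per level.
print(summary(tdata))

# summary() can also be applied to a single column.
age_summary <- summary(tdata$age)
print(age_summary)
```

Note that the per-level counts appear only for columns stored as factors. A plain character column is reported with just its length, class, and mode, so it can help to convert categorical columns to factors before calling summary.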

The summary function gives us high-level details about the variables in the dataset. To learn more about the dataset, such as the missing values, the distribution of numerical variables, and the distinct values of categorical variables, we need an additional package called Hmisc. (The implementation of this is given here.) The package can be installed...