Descriptive statistics summarize a dataset quantitatively. These summaries can be simple numerical statements about the data or visual representations that are sufficient to form part of the initial description of the dataset.
To get a basic understanding of the dataset, we can use the built-in function summary. This function quickly scans the dataset and reports summary information for each variable, which gives a first-cut understanding of the data. It is useful for numerical as well as categorical data.
summary(tdata)
The output is as follows:
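As a rough sketch of what to expect, the following applies summary to a small, made-up data frame. The name df and its columns score and grade are hypothetical and are not the tdata used above.

```r
# Hypothetical data frame with one numerical and one categorical column
df <- data.frame(
  score = c(10, 20, 30, 40, 50),
  grade = factor(c("a", "b", "a", "c", "a"))
)

# summary() returns one block of statistics per column
s <- summary(df)
print(s)
```

For a numerical column, summary reports the minimum, first quartile, median, mean, third quartile, and maximum. For a factor column, it reports the count of each level.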
The summary function provides a high-level view of the variables in the dataset. To learn more about the dataset, such as the missing values, the distribution of numerical variables, and the distinct values of categorical variables, we need an additional package called Hmisc. (The implementation of this is given here.) The package can be installed...
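A minimal sketch of this step, assuming Hmisc is installed from CRAN and that its describe function is what is used for the richer summary (the text above does not name the function). The data frame df and its columns are hypothetical.

```r
# One-time install from CRAN (commented out so the sketch runs offline)
# install.packages("Hmisc")

# Hypothetical data frame with missing values in both columns
df <- data.frame(
  age   = c(23, 35, NA, 41, 29),
  group = factor(c("a", "b", "a", NA, "b"))
)

# Run describe() only if Hmisc is available on this machine
if (requireNamespace("Hmisc", quietly = TRUE)) {
  desc <- Hmisc::describe(df)
  print(desc)
}
```

For each variable, describe reports the number of non-missing values, the number of missing values, and the number of distinct values, along with further detail that depends on the variable type.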