Univariate analysis is the simplest form of analysis: we consider only one variable at a time to understand the data. Some univariate measures, such as the mean and median, have already been covered under descriptive statistics.
We will perform one more univariate analysis: the distribution of the data. We will take the ages of the people who travelled on the Titanic and find out how many people fall into each age group:
age <- na.omit(tdata$Age)
First, we store the ages in the age vector, excluding the cases where the age is missing. Because we want the distribution over a fixed set of intervals, we first find the ages of the youngest and the oldest person who travelled on the ship, from the available dataset, using the range function. We then build the interval boundaries with the seq function, setting the starting value to 0, the last value to 80, and the interval to 10:
range(age)
breaks <- seq(0, 80, by = 10)
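The steps so far can be tried on a small, made-up vector before running them on the real data. The following is only a sketch: the values in toy_age are hypothetical and are not taken from the Titanic dataset, the names toy_age, age_clean, toy_breaks, and age_counts are invented for illustration, and the use of cut and table to count people per interval is one possible approach, assumed here rather than taken from the text.

```r
# Hypothetical toy ages standing in for tdata$Age (not real Titanic values)
toy_age <- c(22, 38, NA, 35, 2, NA, 54, 80, 0.42, 14)

# Drop the missing values, as done for the age vector in the text
age_clean <- na.omit(toy_age)

# Youngest and oldest ages in the cleaned vector
age_range <- range(age_clean)

# Interval boundaries from 0 to 80 in steps of 10
toy_breaks <- seq(0, 80, by = 10)

# Assumed approach: cut() assigns each age to an interval,
# table() counts how many ages fall into each interval
age_groups <- cut(age_clean, breaks = toy_breaks, include.lowest = TRUE)
age_counts <- table(age_groups)

print(age_range)
print(age_counts)
```

With include.lowest = TRUE, an age of exactly 0 would fall into the first interval instead of being dropped; all other intervals are open on the left and closed on the right, which is the default behaviour of cut.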
We created the intervals and stored them in the variable breaks. Using...