Sometimes, you will need to calculate statistical summaries, such as the mean, median, or a quartile of a variable, and view changes with respect to another variable. This can be done by using grouping commands.
Let's plot Genre versus AudienceScore for the HollywoodMovies dataset. Rotate the x-axis label text to make it less cluttered, using the following command:
ggplot(HollywoodMovies, aes(Genre, AudienceScore)) + geom_point() + theme(axis.text.x = element_text(angle = 40))
You'll get the following output:
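To try the same kind of plot without the HollywoodMovies dataset at hand, here is a minimal sketch on a small made-up data frame. The `movies` data frame and its values are hypothetical stand-ins, and `hjust = 1` is an optional addition (not part of the command above) that aligns the rotated labels with their tick marks.

```r
library(ggplot2)

# Hypothetical toy data standing in for HollywoodMovies (values are made up)
movies <- data.frame(
  Genre = c("Action", "Action", "Comedy", "Comedy", "Drama", "Drama"),
  AudienceScore = c(60, 80, 50, 60, 70, 90)
)

# Scatter plot of score by genre, with the x-axis labels rotated by 40 degrees
p <- ggplot(movies, aes(Genre, AudienceScore)) +
  geom_point() +
  theme(axis.text.x = element_text(angle = 40, hjust = 1))

print(p)
```

Each genre appears as a vertical strip of points, which is why a per-genre summary is a natural next step.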
In this section, we'll use grouping to summarize multiple y values for a given x value. Let's begin by implementing the following steps:
- Group the data by genre and remove rows with missing (NA) values:
gp_scr <- group_by(HollywoodMovies, Genre)
gp_scr <- na.omit(gp_scr)
- Calculate the mean and standard deviation using the summarise function and make a new dataset, as follows:
dfnew <- dplyr::summarise(gp_scr...
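The grouping and summarising steps above can be sketched end to end on a small made-up data frame. This is only an illustration under stated assumptions: the `movies` data and the column names `mean_score` and `sd_score` are hypothetical, and the sketch drops missing rows before grouping rather than after.

```r
library(dplyr)

# Hypothetical toy data standing in for HollywoodMovies (values are made up)
movies <- data.frame(
  Genre = c("Action", "Action", "Comedy", "Comedy", "Comedy", "Drama", "Drama"),
  AudienceScore = c(60, 80, 50, NA, 60, 70, 90)
)

# Drop rows with missing values, then group the remaining rows by genre
by_genre <- group_by(na.omit(movies), Genre)

# Collapse each group to one row holding the mean and standard deviation
# (mean_score and sd_score are illustrative column names)
score_summary <- dplyr::summarise(
  by_genre,
  mean_score = mean(AudienceScore),
  sd_score = sd(AudienceScore)
)

print(score_summary)
```

The result has one row per genre, so the several y values plotted for each x value are reduced to a single mean with its spread.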