Chapter 2: Grammar of Graphics and Visual Components
The following are the activity solutions for this chapter.
Activity: Applying Grammar of Graphics to Create a Complex Visualization
Steps for Completion:
- Use the commands that we just explored to create the scatterplot.
- For this activity, you will use the gapminder dataset.
- You can use the help command to explore the options.
- To change scales, you will have to use one of the preceding label formats.
- Use labels=scales::unit_format(unit="K", scale=1e-3) for labeling.
Outcome:
The output code is as follows:
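# Scatterplot with both axes labeled in thousands (K).
# unit and scale are passed by name so the call does not depend on argument order.
ggplot(df, aes(x=gdp_per_capita, y=Electricity_consumption_per_capita)) +
  geom_point() +
  scale_x_continuous(name="GDP", breaks=seq(0,50000,5000),
                     labels=scales::unit_format(unit="K", scale=1e-3)) +
  scale_y_continuous(name="Electricity Consumption", breaks=seq(0,20000,2000),
                     labels=scales::unit_format(unit="K", scale=1e-3))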
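To see what the labeling function does on its own, it can be called directly on a few axis values. This is a minimal sketch, assuming the scales package is installed; the names to_k and axis_labels are illustrative and not part of the activity. The arguments are named because the positional order of unit_format's arguments has not been the same across scales releases.

```r
library(scales)

# Build a labeller that divides by 1,000 and appends the unit "K".
to_k <- unit_format(unit = "K", scale = 1e-3)

# Apply it to a few break values, as the axis scale would.
axis_labels <- to_k(c(5000, 10000, 50000))
print(axis_labels)
```

The same function object is what scale_x_continuous() and scale_y_continuous() receive through their labels argument.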
Activity: Using Faceting to Understand Data
Steps for Completion:
- Use the loan data and plot a histogram (use fill color=cadetblue4 and bins=10).
- Use facet_wrap() to plot the loan data for the different credit grades.
- Now, you will need to change the default options for facet_wrap, in order to produce the following plots. Use ?facet_wrap on the command line to view the options that can be changed.
Outcome:
Refer to the complete code at the following path: https://goo.gl/RheL2G. The answers to the questions are given here:
- scales="free_y".
- A, B, and C have maximum loan amounts below 10,000. (A, B, C, and D is also an acceptable answer.)
- F and G show uniform distributions.
- No, none of the distributions are normally distributed.
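Since the complete code lives at the link above, the following is only a sketch of the faceting steps, run on synthetic data. The data frame loans and its random values are hypothetical stand-ins for the loan data; the column names loan_amnt and grade follow the ones used elsewhere in this chapter.

```r
library(ggplot2)

# Hypothetical stand-in for the loan data: 100 random loans per credit grade.
set.seed(1)
loans <- data.frame(
  loan_amnt = runif(700, min = 1000, max = 35000),
  grade     = rep(LETTERS[1:7], each = 100)
)

# Histogram of loan amounts, one panel per grade.
# scales = "free_y" lets each panel choose its own y-axis range.
p <- ggplot(loans, aes(x = loan_amnt)) +
  geom_histogram(fill = "cadetblue4", bins = 10) +
  facet_wrap(~grade, scales = "free_y")

print(p)
```

With the default scales = "fixed", all panels share one y-axis, which can flatten the grades that have few loans.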
Activity: Using Color Differentiation in Plots
Steps for Completion:
- Use the LoanStats dataset and make a subset using the following variables:
dfn <- df3[,c("home_ownership","loan_amnt","grade")]
- Clean the dataset (removing the NONE and NA cases), using the following code:
dfn <- na.omit(dfn)
dfn <- subset(dfn, !dfn$home_ownership %in% c("NONE"))
- Create a boxplot showing the loan amount versus home ownership.
- Color differentiate by credit grade.
Outcome:
Refer to the following URL for the output: https://goo.gl/RheL2G.
The answers to question 5 are as follows:
- Credit grades F and G are the highest. Credit grades A and B are the lowest.
- They are higher for a person who has a mortgage.
- The median value for A is 2,000, and the median value for G is 20,000, so the difference is 18,000.
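The subsetting, cleaning, and plotting steps of this activity can be sketched end to end as follows. This is an illustration under stated assumptions, not the linked solution: df3 is built here from random values as a stand-in for the LoanStats data, and mapping grade to the fill aesthetic is one possible reading of "color differentiate" (mapping it to color would also work).

```r
library(ggplot2)

# Hypothetical stand-in for the LoanStats data frame df3.
set.seed(42)
df3 <- data.frame(
  home_ownership = sample(c("MORTGAGE", "OWN", "RENT", "NONE"), 300, replace = TRUE),
  loan_amnt      = round(runif(300, min = 1000, max = 35000)),
  grade          = sample(LETTERS[1:7], 300, replace = TRUE),
  stringsAsFactors = FALSE
)
df3$loan_amnt[1:5] <- NA  # inject a few missing values to clean up

# Subset and clean, as in the activity steps.
dfn <- df3[, c("home_ownership", "loan_amnt", "grade")]
dfn <- na.omit(dfn)
dfn <- subset(dfn, !home_ownership %in% c("NONE"))

# Boxplot of loan amount versus home ownership, differentiated by credit grade.
p <- ggplot(dfn, aes(x = home_ownership, y = loan_amnt)) +
  geom_boxplot(aes(fill = grade))

print(p)
```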
Activity: Using Themes and Color Differentiation in a Plot
Steps for Completion:
- Make a scatterplot of female versus male BMIs.
- Build your plot in layers, to avoid creating three separate plots.
- Create the default plot. Store this plot as p1.
- Points should be differentiated by color. Differentiate the two BMIs by country using color. The size of the points should be 2.
- Change the color scheme by using scale_color_brewer. The palette used is Dark2. Store this plot as p2.
- Add a plot title: BMI female vs BMI Male.
- Change more of the theme's aspects to produce plot p3. The theme aspects to be changed, and their values, are as follows:
  - Panel Background: azure; Color: black
  - No grid lines
  - Axis Title Size: 15; Axis Title Color: cadetblue4
  - Change x and y titles: BMI female and BMI Male
  - Legend: Position bottom, Left justified, No Legend Title, legend key (fill: gray97, color of the line=3)
  - Plot Title Color: cadetblue4; Size: 18; Face: bold.italic
Outcome:
The output code is as follows:
# Base layer: data and axis mappings only
pd1 <- ggplot(df,aes(x=BMI_male,y=BMI_female))
# Default scatterplot
pd2 <- pd1+geom_point()
# Points colored by country, size 2, with the Dark2 palette
pd3 <- pd1+geom_point(aes(color=Country),size=2)+
  scale_colour_brewer(palette="Dark2")
# Theme changes, axis titles, and plot title
pd4 <- pd3+theme(axis.title=element_text(size=15,color="cadetblue4", face="bold"),
    plot.title=element_text(color="cadetblue4", size=18, face="bold.italic"),
    panel.background = element_rect(fill="azure",color="black"),
    panel.grid=element_blank(),
    legend.position="bottom",
    legend.justification="left",
    legend.title = element_blank(),
    legend.key = element_rect(color=3,fill="gray97")
  )+
  xlab("BMI Male")+
  ylab("BMI female")+
  ggtitle("BMI female vs BMI Male")