Applied Data Visualization with R and ggplot2

By: Dr. Tania Moulik

Overview of this book

Applied Data Visualization with R and ggplot2 introduces you to the world of data visualization by taking you through the basic features of ggplot2. To start with, you’ll learn how to set up the R environment, followed by getting insights into the grammar of graphics and geometric objects before you explore the plotting techniques. You’ll discover what layers, scales, coordinates, and themes are, and study how you can use them to transform your data into aesthetically pleasing graphs. Once you’ve grasped the basics, you’ll move on to studying simple plots such as histograms and advanced plots such as superimposed and density plots. You’ll also get to grips with plotting trends, correlations, and statistical summaries. By the end of this book, you’ll have created data visualizations that will impress your clients.

Chapter 2: Grammar of Graphics and Visual Components


The following are the activity solutions for this chapter.

Activity: Applying Grammar of Graphics to Create a Complex Visualization

Steps for Completion:

  1. Use the commands that we just explored to create the scatterplot.
  2. For this activity, you will use the gapminder dataset.
  3. You can use the help command to explore the options.
  4. To change scales, you will have to use one of the preceding label formats.
  5. Use labels=scales::unit_format(unit="K", scale=1e-3) for labeling.

Outcome:

The output code is as follows:

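# unit and scale are passed by name because the positional order of
# unit_format() arguments differs between versions of the scales package.
ggplot(df, aes(x=gdp_per_capita,y=Electricity_consumption_per_capita))+
    geom_point()+
    scale_x_continuous(name="GDP",breaks = seq(0,50000,5000),
                       labels=scales::unit_format(unit="K", scale=1e-3)) +
    scale_y_continuous(name="Electricity Consumption",
                       breaks = seq(0,20000,2000),
                       labels=scales::unit_format(unit="K", scale=1e-3))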
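
To see what the label formatter does on its own, the following sketch applies it to a few tick values outside of any plot. The tick values and the name to_thousands are chosen here for illustration only; the arguments are passed by name on the assumption that the installed scales version accepts unit and scale as named arguments.

```r
library(scales)

# Build a labeller that multiplies values by 1e-3 and appends the unit "K".
to_thousands <- unit_format(unit = "K", scale = 1e-3)

# Illustrative tick positions, taken from the lower end of the x-axis breaks.
ticks <- seq(0, 20000, 5000)

# The labeller is an ordinary function: numeric vector in, character vector out.
tick_labels <- to_thousands(ticks)
print(tick_labels)
```

Passing the same function to the labels argument of scale_x_continuous() or scale_y_continuous() applies it to every break on that axis.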

Activity: Using Faceting to Understand Data

Steps for Completion:

  1. Use the loan data and plot a histogram (use fill color=cadetblue4 and bins=10).
  2. Use facet_wrap() to plot the loan data for the different credit grades.
  3. Now, you will need to change the default options for facet_wrap, in order to produce the following plots. Use ?facet_wrap on the command line to view the options that can be changed.

Outcome:

Refer to the complete code at the following path: https://goo.gl/RheL2G. The answers to the questions are given here:

  1. scales="free_y".
  2. A, B, and C have maximum loan amounts below 10,000. (A, B, C, and D is also an acceptable answer.)
  3. F and G show uniform distributions.
  4. No, none of the distributions are normally distributed.
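
The faceting steps above can be tried without the loan data. The sketch below is not the linked solution: it uses a simulated data frame (named loans here, with invented values) that only borrows the column names loan_amnt and grade from this chapter's activities.

```r
library(ggplot2)

# Simulated stand-in for the loan data; the values are invented.
set.seed(1)
loans <- data.frame(
  loan_amnt = round(runif(700, min = 1000, max = 35000)),
  grade = rep(LETTERS[1:7], each = 100)
)

# One histogram panel per credit grade. scales = "free_y" lets each
# panel choose its own y-axis range instead of sharing a single one.
p <- ggplot(loans, aes(x = loan_amnt)) +
  geom_histogram(fill = "cadetblue4", bins = 10) +
  facet_wrap(~grade, scales = "free_y")
print(p)
```

A free y-axis makes the shape of each distribution easier to read when some grades have far fewer loans than others, at the cost of making bar heights harder to compare across panels.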

Activity: Using Color Differentiation in Plots

Steps for Completion:

  1. Use the LoanStats dataset and make a subset using the following variables:
     dfn <- df3[,c("home_ownership","loan_amnt","grade")]
  2. Clean the dataset (removing the NONE and NA cases), using the following code:
     dfn <- na.omit(dfn)
     dfn <- subset(dfn, !dfn$home_ownership %in% c("NONE"))
  3. Create a boxplot showing the loan amount versus home ownership.
  4. Color differentiate by credit grade.

Outcome:

Refer to the following URL for the output: https://goo.gl/RheL2G.
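
For readers without access to the linked code, the steps above can be sketched as follows. This is an illustration under stated assumptions, not the book's solution: df3 is simulated here with invented values and invented home ownership categories, and mapping grade to fill is one possible way to color differentiate the boxes.

```r
library(ggplot2)

# Simulated stand-in for df3; only the column names come from the activity.
set.seed(42)
df3 <- data.frame(
  home_ownership = sample(c("MORTGAGE", "OWN", "RENT", "NONE", NA),
                          500, replace = TRUE),
  loan_amnt = round(runif(500, min = 1000, max = 35000)),
  grade = sample(LETTERS[1:7], 500, replace = TRUE)
)

# Subset and clean, as in the steps above.
dfn <- df3[, c("home_ownership", "loan_amnt", "grade")]
dfn <- na.omit(dfn)
dfn <- subset(dfn, !home_ownership %in% c("NONE"))

# Loan amount by home ownership, with one box per credit grade.
p <- ggplot(dfn, aes(x = home_ownership, y = loan_amnt)) +
  geom_boxplot(aes(fill = grade))
print(p)
```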

The answers to question 5 are as follows:

  1. Credit grades F and G are the highest. Credit grades A and B are the lowest.
  2. They are higher for a person who has a mortgage.
  3. The median value for A is 2,000, and the median value for G is 20,000, so the difference is 18,000.
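
The difference between the two medians can be checked directly in R. The sketch below uses a toy data frame with invented loan amounts, chosen so that the per-grade medians equal the 2,000 and 20,000 quoted above; with the real data, the same tapply() call would be run on dfn.

```r
# Toy data: invented amounts whose medians are 2,000 (A) and 20,000 (G).
toy <- data.frame(
  grade = c("A", "A", "A", "G", "G", "G"),
  loan_amnt = c(1000, 2000, 3000, 15000, 20000, 25000)
)

# Median loan amount per credit grade.
medians <- tapply(toy$loan_amnt, toy$grade, median)

# Difference between the G and A medians.
diff_AG <- unname(medians["G"] - medians["A"])
print(diff_AG)  # 18000
```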

Activity: Using Themes and Color Differentiation in a Plot

Steps for Completion:

  1. Make a scatterplot of female versus male BMIs.
  2. Build your plot in layers, to avoid creating three separate plots.
    1. Create the default plot. Store this plot as p1.
    2. Differentiate the points by country using color. The size of the points should be 2.
    3. Change the color scheme by using scale_color_brewer. The palette used is Dark2. Store this plot as p2.
    4. Add a plot title: BMI female vs BMI Male.
    5. Change more of the theme's aspects to produce plot p3. The theme aspects to be changed, and their values, are as follows:
      • Panel Background: azure; Color: black
      • No grid lines
      • Axis Title Size: 15; Axis Title Color: cadetblue4
      • Change x and y titles: BMI Male and BMI female
      • Legend: Position bottom, Left justified, No Legend Title, legend key (fill: gray97, color of the line: 3)
      • Plot Title Color: cadetblue4; Size: 18; Face: bold.italic

Outcome:

The output code is as follows:

pd1 <- ggplot(df,aes(x=BMI_male,y=BMI_female))
pd2 <- pd1+geom_point()
pd3 <- pd1+geom_point(aes(color=Country),size=2)+
    scale_colour_brewer(palette="Dark2")
pd4 <- pd3+theme(axis.title=element_text(size=15,color="cadetblue4",
                 face="bold"),
                 plot.title=element_text(color="cadetblue4", size=18,
                 face="bold.italic"),
                 panel.background = element_rect(fill="azure",color="black"),
                 panel.grid=element_blank(),
                 legend.position="bottom",
                 legend.justification="left",
                 legend.title = element_blank(),
                 legend.key = element_rect(color=3,fill="gray97")
)+
    xlab("BMI Male")+
    ylab("BMI female")+
    ggtitle("BMI female vs BMI Male")