R Data Visualization Recipes

By: Vitor Bianchi Lanzetta

Overview of this book

R is an open source language for data analysis and graphics that allows users to load various packages for effective and better data interpretation. Its popularity has soared in recent years because of its powerful capabilities when it comes to turning different kinds of data into intuitive visualization solutions. This book is an update to our earlier R data visualization cookbook with 100 percent fresh content, covering all the cutting-edge R data visualization tools. It is packed with practical recipes designed to provide you with all the guidance needed to get to grips with data visualization using R. It starts off with the basics of the ggplot2, ggvis, and plotly visualization packages, along with an introduction to creating maps and customizing them, before progressively taking you through various ggplot2 extensions, such as ggforce, ggrepel, and gganimate. Using real-world datasets, you will analyze and visualize your data as histograms, bar graphs, and scatterplots, and customize your plots with various themes and coloring options. The book also covers advanced visualization aspects such as creating interactive dashboards using Shiny. By the end of the book, you will be equipped with key techniques to create impressive data visualizations with professional efficiency and precision.

Making plots using primitives


Previously, the frameworks behind the ggplot2, ggvis, and plotly packages were briefly introduced. Next, we get started with ggplot2's graphical primitives, using them in a series of recipes alongside related examples made with ggvis and plotly.

ggplot2 has a total of eight graphical primitives, one of which (geom_point()) has already been covered in this chapter. It's important to know the primitives well: what they do and when to use them. As fundamental building blocks, they play an essential role in the drawing process. Primitives can handle a task when no dedicated function exists for it; sometimes, even when one does exist, primitives handle the task much better.

Dot plots are a good example. There is a dedicated geom_dotplot() function, but sometimes it is much easier to draw dot plots using geom_point(). Now, let's see how ggplot2 can build figures using primitives and how to create related ones using ggvis and plotly.

How to do it...

  1. After loading the package, the primitives geom_point() and geom_path() can be stacked to plot lines with markers:
> library(ggplot2)
> plot1 <- ggplot( cars, aes(x = speed, y = dist))
> plot1 + geom_point() + geom_path()

The resulting output is shown in the following figure:

Figure 1.4 - Lines with markers plot made by ggplot2's primitives.

  2. The same task can be accomplished with the ggvis package, using the following code:
> library(ggvis)
> ggvis(cars, x = ~speed, y = ~dist) %>% layer_points() %>% layer_paths()

Figure 1.5 shows the resulting graphic (only the default theme looks different):

Figure 1.5 - Similar lines and markers plot done by ggvis.

  3. Without using plotly's translation function (ggplotly()), it's also possible to code a similar graphic from scratch relying only on plotly:
> library(plotly)
> plot_ly(cars, x = ~speed, y = ~dist, type = 'scatter', mode = 'lines+markers')

Figure 1.6 shows a snapshot of the graphic produced by this last piece of code:

Figure 1.6 - Similar lines and markers plot done by plotly.

 

Let's understand how these steps unfold.

How it works...

The complete list of ggplot2's primitives is geom_blank(), geom_path(), geom_ribbon(), geom_polygon(), geom_segment(), geom_rect(), geom_text(), and geom_point(). Every primitive's name starts with geom_, but not every geom_* function is a primitive. In fact, most of them are not.

geom_blank() is arguably the simplest of the primitives. Calling it right after ggplot() displays a blank plot with the axes already adjusted to the data. It's mostly used to check the axis limits implied by the data itself. You may find it useful for other tasks as well.
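As a minimal sketch of that use, the snippet below draws an empty plot from the cars data and then reads back the limits that ggplot2 derived from the data. layer_scales() is a ggplot2 helper; the object names are only illustrative.

```r
library(ggplot2)

# geom_blank() draws nothing, but the scales are still trained on the data
blank_plot <- ggplot(cars, aes(x = speed, y = dist)) + geom_blank()

# Read back the limits ggplot2 computed for each axis
panel_scales <- layer_scales(blank_plot)
x_limits <- panel_scales$x$get_limits()
y_limits <- panel_scales$y$get_limits()
```

Printing blank_plot shows the empty panel, while x_limits and y_limits hold the data ranges behind the axes.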

Other primitives work in a similar way. That is the case for geom_path(), geom_ribbon(), and geom_polygon(). The first draws lines between coordinates; the second resembles the first but draws a band instead of a line, so it requires two additional aes() arguments (ymin and ymax). The last draws filled polygons.
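The following sketch shows these three primitives side by side. The data frames (band and triangle) are made up for illustration and are not part of the recipe.

```r
library(ggplot2)

# Hypothetical data: a curve with a lower and an upper bound around it
band <- data.frame(x = 1:10)
band$y <- sqrt(band$x)
band$lower <- band$y - 0.5
band$upper <- band$y + 0.5

# geom_ribbon() fills the area between ymin and ymax; geom_path() draws the line
ribbon_plot <- ggplot(band, aes(x = x, y = y)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.3) +
  geom_path()

# geom_polygon() connects the vertices in row order and fills the shape
triangle <- data.frame(x = c(0, 2, 1), y = c(0, 0, 2))
polygon_plot <- ggplot(triangle, aes(x = x, y = y)) + geom_polygon()
```

Printing ribbon_plot or polygon_plot renders each figure.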

Given only starting and ending points, geom_segment() adds a line segment. geom_rect() adds a rectangle to the plot, which requires four bounds (xmin, xmax, ymin, and ymax). geom_text() adds text at the given coordinates. Some graphics display a text label for each observation instead of a point, which is also a good way to show additional information.
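A short sketch of these three primitives used as annotations on the cars scatterplot follows. The coordinates and the label are hypothetical values picked only to make the shapes visible.

```r
library(ggplot2)

# Hypothetical annotation coordinates chosen only for illustration
box <- data.frame(xmin = 10, xmax = 15, ymin = 10, ymax = 60)
pointer <- data.frame(x = 5, y = 100, xend = 12, yend = 62)
note <- data.frame(x = 5, y = 105, label = "highlighted region")

annotated_plot <- ggplot(cars, aes(x = speed, y = dist)) +
  geom_point() +
  # rectangle defined by its four bounds
  geom_rect(data = box, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
            inherit.aes = FALSE, alpha = 0.2) +
  # straight segment from (x, y) to (xend, yend)
  geom_segment(data = pointer, aes(x = x, y = y, xend = xend, yend = yend)) +
  # text placed at the given coordinates
  geom_text(data = note, aes(x = x, y = y, label = label), hjust = 0)
```

Each annotation layer gets its own small data frame, so only one rectangle, one segment, and one label are drawn instead of one per row of cars.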

The remaining primitive is geom_point(). It's the only primitive called directly so far, and it plots points at the given coordinates. Two important points must be highlighted here. First, getting to know the primitives might give you an idea of which functions you will need the most and which the least, but primitives are not all that ggplot2 is capable of. They are simply the building blocks used by other functions.

Second, as the previous recipe stated, you can stack as many layers as you like. That is just as true for primitive functions, but it's good to know how they interact with one another. For example, calling geom_blank() after geom_point() does not replace the points with a blank space.
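A quick way to see this interaction is to stack the two layers and inspect the first one with ggplot2's layer_data() helper. This is only a sketch; the object names are illustrative.

```r
library(ggplot2)

# geom_blank() adds an empty layer; it does not erase what is already drawn
stacked <- ggplot(cars, aes(x = speed, y = dist)) + geom_point() + geom_blank()

# The point layer still holds one row per observation
points_drawn <- nrow(layer_data(stacked, 1))
```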

After loading ggplot2 and setting the base aes(), step 1 creates a simple plot with lines and markers. While geom_point() displays the markers, geom_path() draws the lines between them. Note that geom_path() draws lines following the order of the rows in the data set, so we can call this function order-sensitive.

Note

In many situations, reordering the data will improve a visualization. This may be the case for dot, box, violin, and bar plots, among others. If you want paths to be ordered along the x variable, geom_line() does that by itself, though it is not a primitive.
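The difference between the two functions shows up once the rows are no longer sorted. The sketch below shuffles the cars rows (the seed and object names are arbitrary choices) and then compares the order in which each layer connects the points.

```r
library(ggplot2)

# Shuffle the rows so the data is no longer sorted by speed
set.seed(1)
shuffled <- cars[sample(nrow(cars)), ]
base_plot <- ggplot(shuffled, aes(x = speed, y = dist)) + geom_point()

path_plot <- base_plot + geom_path()  # connects points in row order
line_plot <- base_plot + geom_line()  # connects points in order of x

# Inspect the x values in the order each layer will connect them
path_x <- layer_data(path_plot, 2)$x
line_x <- layer_data(line_plot, 2)$x
```

Printing path_plot gives a tangle of crossing lines, while line_plot runs from left to right.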

In this particular plot, the lines carry no meaning; they actually mislead. Lines are better suited to indicating some sort of order within the data, such as chronological order. The only reason they were used here was to demonstrate how primitives can be stacked to produce a visualization different from the one made before.

Step 2 draws a plot similar to the one from step 1, but using ggvis instead. library() loads the package, while the ggvis() function maps the basic aesthetics. The next function, layer_points(), sets up the points that work as our markers, and layer_paths() draws the lines between them.

An earlier section argued that ggvis is very similar to ggplot2 in the way graphics are coded, and this section demonstrates it. First, the ggvis() function receives the data set, and the variables are passed as arguments. Pipe operators (%>%) are used instead of the plus sign to stack layers, and layer_* functions work very much like geom_* functions do.

Step 3 builds a similar plotly graphic. The same function that sets the basic aesthetic mapping (plot_ly()) also handles the geometries. The type and mode arguments set the geometries; both take strings and are meant to work together.

Setting type = 'scatter' enables the lines and markers modes. Each type has its own set of modes; consult the reference manual for the full list. We wanted markers and lines at the same time, so we built a string containing those two elements separated by a plus sign ('lines+markers') and assigned it to the mode argument.

Note

mode = 'lines+markers' works just as well as mode = 'markers+lines'. Modes can be combined, and their order does not matter.
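To make the role of mode concrete, here is a small sketch that draws the same scatter trace with three mode strings. plotly_build() is used only to inspect the settings of the last one; the object names are illustrative.

```r
library(plotly)

# The same scatter trace drawn with three different mode strings
markers_only <- plot_ly(cars, x = ~speed, y = ~dist, type = 'scatter', mode = 'markers')
lines_only <- plot_ly(cars, x = ~speed, y = ~dist, type = 'scatter', mode = 'lines')
both <- plot_ly(cars, x = ~speed, y = ~dist, type = 'scatter', mode = 'markers+lines')

# plotly_build() evaluates the plot so the trace settings can be inspected
built_mode <- plotly_build(both)$x$data[[1]]$mode
```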

Figures 1.4 to 1.6 look a lot like time series, but they aren't, and that may give the wrong intuition. There are observations for two variables, and neither one is time. Notice how some speed values have up to five different stopping distances. Note also that the cars data frame is ordered first by speed and then by distance; paths obey the row order of the data, while for the point geometry the order doesn't really matter.

Adding the path geometry was misleading; geom_point() would have been enough. The goal here was to demonstrate how primitives interact, not to produce a meaningful figure. Next, let's build fictional data and draw a graphic that tells the story the right way. Picture a small classroom with only seven students. The teacher builds a data frame with the studying hours and grades of each student.
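These properties of the cars data frame can be checked directly with base R. The snippet below is a small sketch that counts the observations recorded at each speed and tests whether the rows are already sorted by speed and then by distance.

```r
# Number of observations recorded at each speed
obs_per_speed <- table(cars$speed)
max_repeats <- max(obs_per_speed)

# TRUE when the rows are already sorted by speed, then by dist
already_sorted <- identical(order(cars$speed, cars$dist), seq_len(nrow(cars)))
```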

Data can be created like this:

> allnames <- c('Phill','Ross','Kate','Patrice','Peter','James','Monica')
> classr <- data.frame(names = allnames)
> classr$hours <- c(4, 16, 8, 11, 6, 14, 8)
> classr$grades <- c(4, 9.5, 6, 4, 6, 9, 7.5)

The geom_text() primitive can be used to produce a meaningful graphic:

> library(ggplot2)
> plot2 <- ggplot(classr, aes(x = hours, y = grades))
> plot2 + geom_text(aes(label = names))

The result is shown in figure 1.7:

Figure 1.7 - Plotting grades and hours as texts using ggplot2's primitive.

The related ggvis and plotly code is shown next:

> library(ggvis)
> ggvis(classr, x = ~hours, y = ~grades, text := ~names) %>% layer_text()
> library(plotly)
> plot_ly(classr, x = ~hours, y = ~grades, type = 'scatter', mode = 'text', text = ~names)

This last brief example illustrates how to build graphics using only primitives in a more meaningful way. It's very important to think about it: the better graphic is the one that tells the right story objectively, not the one with the most layers.

There's more...

Did you know that both ggvis and plotly can guess which geometry you are looking for? Based on the basic aesthetics defined, they look at how many variables of each kind (discrete or continuous) were supplied, and for some combinations they are able to guess and adopt a geometry. For the most recent example, with two continuous variables, they would have guessed the points geometry.

Figures produced by both packages are displayed in the Viewer tab if you're using RStudio. They are interactive; try hovering the mouse over a plotly figure. Figures can be exported as web pages. They can also be exported as PNG, JPEG, and BMP images, but then they lose their interactivity.
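The guessing can be observed in plotly by leaving out type and mode altogether. In this sketch, plotly_build() evaluates the plot so that the chosen trace type can be read back; messages are suppressed because plotly announces its choice when building.

```r
library(plotly)

# No type or mode given: plotly picks a trace type from the variables supplied
guessed <- plot_ly(cars, x = ~speed, y = ~dist)

# Build the plot (plotly reports its choice in a message) and inspect it
guessed_type <- suppressMessages(plotly_build(guessed))$x$data[[1]]$type
```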
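One possible way to export a plotly figure as a web page from code, rather than from the Viewer tab, is the saveWidget() function from the htmlwidgets package. This is a sketch under that assumption, not necessarily the route the recipe has in mind; the output path is a temporary file chosen for illustration.

```r
library(plotly)
library(htmlwidgets)

fig <- plot_ly(cars, x = ~speed, y = ~dist, type = 'scatter', mode = 'markers')

# Hypothetical output path; a temporary file keeps the sketch self-contained
out_file <- tempfile(fileext = ".html")

# selfcontained = FALSE writes supporting files next to the page and avoids
# the pandoc dependency of a single-file export
saveWidget(as_widget(fig), file = out_file, selfcontained = FALSE)
```

Opening out_file in a browser shows the same interactive figure as the Viewer tab.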

This recipe aimed to introduce you to the graphical primitives of ggplot2, draw simple graphics using only primitives, and build related graphics using the ggvis and plotly packages. A question you should always ask yourself is whether the geometry adopted suits the data used; in other words, whether the graphic tells the story you want to tell.

The next chapters dive deeper; each one tackles a family of graphics, highlighting the nuts and bolts of building high-quality plots. As the book advances, so does the complexity involved. At some point, we are going to plot interactive globes and 3D surfaces and develop web applications. I find it pretty cool, and I hope you enjoy it.

Chapter 2, Plotting Two Continuous Variables, takes care of scatterplots. It's a very popular and very useful kind of plot, but it has a big problem: over-plotting. The following chapter not only teaches how to craft scatterplots but also how to deal with this problem and how to improve scatterplots by adding marginal plots. Let it rip!