ggplot2, ggvis, and plotly have proven to be very useful graphics packages in the R universe. Each of them has gained considerable popularity among R users and is known for handling a wide range of graphical tasks elegantly.
The purpose of this section is to give a brief introduction to the general framework of ggplot2 via some basic examples, and to show how to tackle similar tasks using ggvis and plotly. Along the way, some pros and cons of each package will be highlighted.
Note
Whenever you need to choose between packages (and base R), it's important to weigh the tasks each one was designed to handle, the amount of work required to achieve your goal (learning time included), and the time you actually have. It's also good to consider the gains from reuse in the future. For example, mastering ggplot2 may not seem a smart choice for a one-off task, but it might pay off if you expect many graphical challenges in the future.
Keep in mind that all three packages are suitable for a wide range of tasks. Some jobs suit a specific package better, and some tasks can be considered almost impracticable for the others. This point will become clearer as the book goes on.
The only requirement for this section is to have the ggplot2, ggvis, and plotly packages properly installed. Go back to the Installing and loading graphics packages recipe if that is not the case. Once the installation is checked, it's time to get to know the ggplot2 framework.
First things first: in order to plot using ggplot2, data must come from a data frame object. Data can come from more than one data frame, but it must be arranged into objects of the data frame class.
- We use the cars data set for this first graphic. It's good to get to know the data before plotting, so let's do it using the ?, class(), and head() functions:
> ?cars
> class(cars)
> head(cars)
- Plots created by ggplot2 can be stored in objects. Such objects belong to two classes at the same time, gg and ggplot:
> library(ggplot2)
> plot1 <- ggplot(cars, aes(x = speed, y = dist))
Note
Objects created by the ggplot() function belong to the classes gg and ggplot at the same time. That said, you can refer to a plot crafted by ggplot2 as a ggplot.
- The three packages work more or less in a layered way. To add what we call layers to a ggplot, we can use the + operator:
> plot1 + geom_point()
The result is shown in the following figure:
Figure 1.2 - Simple ggplot2 scatterplot.
- Once you learn this framework, getting to know how ggvis works becomes much easier, and vice versa. A similar graphic can be crafted with the following code:
> library(ggvis)
> ggvis(data = cars, x = ~speed, y = ~dist) %>% layer_points()
- plotly feels a little bit different, but it's not difficult at all to grasp how it works:
> library(plotly)
> plot_ly(data = cars, x = ~speed, y = ~dist, type = 'scatter', mode = 'markers')
Let's give these nuts and bolts some explanations.
To briefly introduce the data, step 1 starts by calling ?cars. This is a very useful way to learn about the variables and background of almost every data set that comes with a package. Since ggplot2 requires data to come from data frames, the class() function checks whether that is the case, and the answer is affirmative. At the end of this step, the head() function displays the first six observations.
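The checks from step 1 can be sketched in script form as follows. This is a minimal sketch that assumes only base R and the built-in cars data set; is.data.frame() and str() are additions that the recipe itself does not use.

```r
# cars ships with the base 'datasets' package, so nothing needs downloading.
class(cars)          # the class of the object we intend to plot
is.data.frame(cars)  # a direct yes/no check that it is a data frame
str(cars)            # variable names, types, and number of observations

# head() returns the first six rows unless told otherwise.
first_rows <- head(cars)
first_rows
```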
Moving on to step 2, after loading ggplot2, it demonstrates how to store the basic coordinate mapping and aesthetics in an object called plot1 (try calling class() on it). To set the basics, it uses the ggplot() function, which initializes every single ggplot.
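As a hedged illustration of what step 2 stores, the sketch below inspects plot1 before any layer is added. The exact vector returned by class() may differ between ggplot2 versions, so inherits() is used here as the more robust check; that choice is ours, not the recipe's.

```r
library(ggplot2)

# Store the data and the aesthetic mapping; no layer is attached yet.
plot1 <- ggplot(data = cars, mapping = aes(x = speed, y = dist))

class(plot1)               # lists the classes of the stored plot
inherits(plot1, "ggplot")  # checks membership of the ggplot class
length(plot1$layers)       # number of layers added so far
```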
Note
Storing a plot created by the ggplot2, ggvis, or plotly package in an object is optional, though it is a very useful way to proceed.
To properly set up ggplot(), start by declaring the data set using the data argument. After that, some basic aesthetics and coordinates are assigned. Different figures can require and work with different aesthetics; in the majority of cases, those are named inside the aes() function.
Note
As the book goes on, you're going to get used to the ways aesthetics can be declared, inside or outside the aes() function. For now, let's acknowledge that inside aes() it's possible to refer to data frame variables by name, and that they may be displayed in legends.
Checking ?aes shows "..." as an argument, popularly known as three dots but technically named ellipsis. It allows the user to pass an arbitrary number and variety of arguments. Because ggplot2 relies on lazy evaluation (arguments are only evaluated when they are requested), you could make up arguments and pass them into the aes() function with little or no trouble. Consider the following:
> plot1 <- ggplot(cars, aes(x = speed,y = dist, gorillaTroubleShooter = T, sight = 'Legolas'))
It would work as well as the earlier version. Just don't forget to name the arguments, and you've got yourself a good way to hide some Easter eggs in your code (also a good way to confuse unaware developers). Both aes() and ggplot() play core roles in building graphics with this package.
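To see what happens to such made-up arguments, the following sketch looks at the mapping that aes() returns. The aesthetic name sight is purely illustrative, and the behavior shown is what we would expect from a recent ggplot2 release rather than a documented guarantee.

```r
library(ggplot2)

# 'sight' is a made-up aesthetic, used purely for illustration.
mapping <- aes(x = speed, y = dist, sight = "Legolas")
names(mapping)  # the made-up name is stored next to x and y

# The plot object is still created; geom_point() has no use for 'sight'.
plot_extra <- ggplot(cars, mapping) + geom_point()
plot_extra
```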
Up to step 2, only the coordinate mapping was set in the object named plot1; calling it alone displays an empty graphic. Step 3 uses + to add a layer; that layer, geom_point(), took care of attaching a geometry to the graphic. Besides the plus sign, ggplots are usually constructed with two families of functions (layers): geom_* and stat_*. While the first family comes with a fixed geometry and a default statistical transformation, the second one comes with a fixed statistical transformation and a default geometry (this is the grammar of graphics for real). The defaults can be tweaked.
Note
plot1 + stat_identity(geom = 'point') works just the same as step 3. The geom argument defaults to 'point' for stat_identity(), so it's fine to skip it. The reason I declared it was to reinforce that if you call a statistical transformation you can pick the geometry, and the other way round (if you call a geometry you can change the statistical transformation).
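One way to check this equivalence is to compare the data each layer computes. The sketch below assumes ggplot2's layer_data() helper is available in the installed version; the object names are ours.

```r
library(ggplot2)

plot1 <- ggplot(cars, aes(x = speed, y = dist))

# The same layer, reached from the geometry side and from the statistic side.
by_geom <- plot1 + geom_point()                   # stat defaults to "identity"
by_stat <- plot1 + stat_identity(geom = "point")  # geom named explicitly

# layer_data() returns the data frame computed for a given layer.
data_geom <- layer_data(by_geom, 1)
data_stat <- layer_data(by_stat, 1)
identical(data_geom[c("x", "y")], data_stat[c("x", "y")])
```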
Behind the scenes, geom_point() called the layer() function, which set a couple of arguments that culminated in the creation of a scatterplot. One may want to modify the axis labels and add a regression line. This can be done by simply adding more layers to the plot using the plus sign. One can stack as many layers as desired, as shown next:
> plot1 + geom_point() +
    labs(x = "Speed (mph)", y = "Distance (ft)") +
    geom_smooth(method = "lm", se = FALSE) +
    scale_y_continuous(breaks = seq(0, 125, 25))
The result is shown in figure 1.3:
Figure 1.3 - Adding up several layers to a ggplot.
Combining ggplot2's + operator (which is actually a function) with layer functions allows the user to make plots in a layered, iterative way. This splits the construction of complex graphics into several simple steps. It's also very intuitive and does not get any harder with practice.
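The remark that + is a function can be illustrated by calling it in function form. This is a small sketch, not a recommended coding style; the object names are hypothetical.

```r
library(ggplot2)

plot1 <- ggplot(cars, aes(x = speed, y = dist))

# Operator form and function-call form of the same addition.
with_operator <- plot1 + geom_point()
with_call     <- `+`(plot1, geom_point())

# Each addition returns a new plot, so layers can be stacked step by step.
stacked <- with_call + geom_smooth(method = "lm", se = FALSE)
length(with_operator$layers)
length(stacked$layers)
```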
Yet, there are limitations. The difficulty of making interactive graphics on its own may be one. In the majority of cases, these tasks are handled very well by both ggvis and plotly as standalone packages. This leads us to steps 4 and 5.
Note
Calling plotly::ggplotly() after creating a ggplot will convert it into an interactive plot. It may fail sometimes. Do not forget to have plotly installed.
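A minimal sketch of that conversion, assuming both ggplot2 and plotly are installed. Passing the stored plot explicitly to ggplotly() is our choice here.

```r
library(ggplot2)
library(plotly)

static_plot <- ggplot(cars, aes(x = speed, y = dist)) + geom_point()

# ggplotly() translates the ggplot into an interactive htmlwidget.
interactive_plot <- ggplotly(static_plot)
class(interactive_plot)
```

Printing interactive_plot in an interactive session opens it in the viewer or browser.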
Step 4 loads the ggvis package using library() and then creates an interactive plot. It has many similarities with ggplot2. The ggvis() function handles the basic coordinate mapping, while the pipe operator (%>%) is used to add a layer created by the layer_points() function. Remember: the pipe operator, not the plus sign.
Note
ggvis distinguishes between arguments declared using = (always scaled) and := (never scaled). Also, ~ must come before variable names.
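The difference between the two forms can be sketched with the fill property. This assumes ggvis is installed; the object names are hypothetical, and the plots are stored rather than printed so nothing is opened in a browser.

```r
library(ggvis)

# fill = ~speed is scaled: speed values are mapped onto a colour scale.
scaled_fill <- ggvis(cars, x = ~speed, y = ~dist, fill = ~speed) %>%
  layer_points()

# fill := "red" is unscaled: the literal colour is used as given.
raw_fill <- ggvis(cars, x = ~speed, y = ~dist, fill := "red") %>%
  layer_points()
```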
Function names change from ggplot2 to ggvis, and so does the operator used to add layers, but essentially the underlying logic stays the same. Layers from ggvis have several counterparts in ggplot2; refer to the See also section to track some. Compared with ggplot2, ggvis is much younger and some utilities may be yet to come. Also, data does not need to come from a data frame object.
Step 5 draws an interactive plotly graph. A single function, plot_ly(), takes care of coordinate mapping and geometry. It can be designed in a more layered way using the add_trace() function, but there is no real need for that when the plot is this simple. Instead of having many functions for statistical transformations and geometries, those are declared by arguments inside the main function.
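For comparison, a more layered version of step 5 might look like the sketch below. It assumes plotly's add_trace() function and should produce a scatterplot equivalent to the single-call version.

```r
library(plotly)

# plot_ly() sets the data and mapping; add_trace() then adds the markers.
layered <- plot_ly(data = cars, x = ~speed, y = ~dist) %>%
  add_trace(type = "scatter", mode = "markers")
class(layered)
```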
These three packages, ggplot2, ggvis, and plotly, are well-coded and powerful graphics packages. Before picking one of them to handle a task, always consider points like:
- What the package is able to do
- Time needed to master the skill set required
- Time required to handle the task
- Amount of time available
- Time saved later thanks to what you learned
Base R is also a feasible possibility. Whenever you face new challenges, it is a good thing to think through these points.
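As a point of comparison, a scatterplot with a regression line in base R can be sketched as follows. It uses only base graphics and the stats package; the axis labels are our own wording.

```r
# Scatterplot with a fitted regression line, using base graphics only.
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Distance (ft)")
fit <- lm(dist ~ speed, data = cars)
abline(fit)  # draws the fitted line on the existing plot
```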
Requiring data to come solely from data frames is a strong restriction, but it obliges the user to be explicit about the data and also draws a very clear line between what is ggplot2's concern (data visualization) and what is not (model visualization). To avoid the headaches that come from downloading spreadsheets, setting up working directories, and loading data from files, we're taking an alternative route: getting data from packages instead.
By doing this, we ensure that readers only need the R console to reproduce the recipes; we want nothing to do with web browsers (we're too cool for school, school meaning web browsers). We shall follow this approach to the end of the book. This recipe relies on the base datasets package to do so. ggplot2 has some data frames of its own.
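The data frames bundled with ggplot2 can be listed from the console. The sketch below assumes ggplot2 is installed; mpg is one of its bundled data sets.

```r
# List the data sets bundled with ggplot2 (the package must be installed).
available <- data(package = "ggplot2")$results[, "Item"]
available

# Bundled data frames can be reached without attaching the package.
head(ggplot2::mpg)
is.data.frame(ggplot2::mpg)
```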
It's also important to point out that the gg in ggplot2 and ggvis refers to the Grammar of Graphics. That's a very important and inspiring theory that has influenced ggplot2, ggvis, and plotly. The layered, iterative way these packages handle plots might come from the Grammar of Graphics, and it makes building graphics much easier and more reasonable. Learning this theory may give you a head start in learning these packages, while learning these packages may give you a head start when it comes to learning the Grammar of Graphics.
- The ggplot2 cheatsheet made by RStudio can be found at https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf
- Learn layers from ggplot2's author and a real R-universe star at http://rpubs.com/hadley/ggplot2-layers
- Did you know that gg's + is actually a shortcut for a function? A clue on that and some exercises are hidden at http://rpubs.com/hadley/97970
- Learn more about ggvis layers and how they can be translated into ggplot2 ones at http://ggvis.rstudio.com/layers.html
- Learn more about ggvis scaled and unscaled arguments at http://ggvis.rstudio.com/properties-scales.html