R Data Visualization Recipes

By : Vitor Bianchi Lanzetta

Overview of this book

R is an open source language for data analysis and graphics that allows users to load various packages for effective and better data interpretation. Its popularity has soared in recent years because of its powerful capabilities when it comes to turning different kinds of data into intuitive visualization solutions. This book is an update to our earlier R data visualization cookbook with 100 percent fresh content and covering all the cutting edge R data visualization tools. This book is packed with practical recipes, designed to provide you with all the guidance needed to get to grips with data visualization using R. It starts off with the basics of the ggplot2, ggvis, and plotly visualization packages, along with an introduction to creating maps and customizing them, before progressively taking you through various ggplot2 extensions, such as ggforce, ggrepel, and gganimate. Using real-world datasets, you will analyze and visualize your data as histograms, bar graphs, and scatterplots, and customize your plots with various themes and coloring options. The book also covers advanced visualization aspects such as creating interactive dashboards using Shiny. By the end of the book, you will be equipped with key techniques to create impressive data visualizations with professional efficiency and precision.

Using ggplot2, plotly, and ggvis

ggplot2, ggvis, and plotly have proven to be very useful graphical packages in the R universe. Each of them has gained a respectable amount of popularity among R users and is known for the many graphical tasks it can handle in a very elegant manner.

The purpose of this section is to give a brief introduction to the general framework of ggplot2 via some basic examples, and to show how to tackle similar tasks using ggvis and plotly. Along the way, some pros and cons of each package will be highlighted.

Note

Whenever you need to choose between packages (and base R), it's important to weigh the tasks each one was designed to handle, the amount of work it will take to achieve your goal (learning time included), and the time you actually have. It's also good to consider gains of scale in future uses. For example, mastering ggplot2 may not seem a smart choice for a one-off task but might pay off if you're expecting lots of graphical challenges in the future.

Keep in mind that all three packages are suited to a wide range of tasks. There are some jobs that a specific package is more suitable for and even some tasks that can be considered almost impracticable for others. This point will become clearer as the book goes on.

Getting ready

The only requirement for this section is to have the ggplot2, ggvis, and plotly packages properly installed. Go back to the Installing and loading graphics packages recipe if that is not the case. Once the installation is checked, it's time to get to know the ggplot2 framework.

How to do it...

First things first: in order to plot using ggplot2, data must come from a data frame object. Data can come from more than one data frame, but it's mandatory to have it arranged into objects of the data frame class.

We took the cars data set to fit this first graphic. It's good to actually get to know the data before plotting, so let's do it using the ?, class(), and head() functions:

> ?cars
> class(cars)
> head(cars)

Plots coming from ggplot2 can be stored in objects. They belong to two classes at the same time, gg and ggplot:

> library(ggplot2)
> plot1 <- ggplot(cars, aes(x = speed, y = dist))

Note

Objects created by the ggplot() function belong to the classes gg and ggplot at the same time. That said, you can refer to a plot crafted by ggplot2 as a ggplot.
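
As a quick check of the two classes, here is a minimal sketch. It assumes ggplot2 is installed and uses inherits() rather than comparing the output of class() directly, since the exact class vector may differ between ggplot2 versions:

```r
library(ggplot2)

# Store the coordinate mapping, as in the recipe
plot1 <- ggplot(cars, aes(x = speed, y = dist))

# inherits() asks whether an object carries a given class
is_gg     <- inherits(plot1, "gg")
is_ggplot <- inherits(plot1, "ggplot")
print(c(gg = is_gg, ggplot = is_ggplot))
```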

The three packages work more or less in a layered way. To add what we call layers to a ggplot, we can use the + operator:

> plot1 + geom_point()

Note

The + operator is in reality a function.
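
To see what that means in plain R, the short sketch below calls the operator with ordinary function syntax. It uses base R only; ggplot2 supplies its own behavior for + when the left operand is a ggplot, which is not shown here:

```r
# Backticks let us call an operator with ordinary function syntax
is.function(`+`)

sum_infix  <- 2 + 3
sum_prefix <- `+`(2, 3)
identical(sum_infix, sum_prefix)

# Operators can be passed around like any other function
total <- Reduce(`+`, 1:4)
```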

The result is shown in the following figure:

Figure 1.2 - Simple ggplot2 scatterplot.

Once you learn this framework, getting to know how ggvis works becomes much easier, and vice-versa. A similar graphic can be crafted with the following code:

> library(ggvis)
> ggvis(data = cars, x = ~speed, y = ~dist) %>% layer_points()

plotly would feel a little bit different, but it's not difficult at all to grasp how it works:

> library(plotly)
> plot_ly(data = cars, x = ~speed, y = ~dist, type = 'scatter', mode = 'markers')

Let's give these nuts and bolts some explanations.

How it works...

In order to have a brief introduction to the data, step 1 starts by calling ?cars. This is a very useful way to get to know the variables and background of almost every data set that comes from a package. Since ggplot2 requires data coming from data frames, the class() function checks whether that is the case; the answer is affirmative. At the end of this step, the head() function shows the first six observations.
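
The same inspection can be scripted rather than typed interactively. The following is only a sketch using base R functions; the variable names (is_df, var_names, n_obs) are made up for illustration:

```r
# cars ships with the datasets package, which R attaches by default
is_df     <- is.data.frame(cars)  # the check ggplot2 cares about
var_names <- names(cars)
n_obs     <- nrow(cars)

str(cars)          # compact overview of both columns
head(cars, n = 3)  # head() shows six rows unless n says otherwise
```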

Moving on to step 2, after loading ggplot2, it demonstrates how to store the basic coordinate mapping and aesthetics into an object called plot1 (try it on the class() function). In order to set the basics, it uses a function (ggplot()) that initializes every single ggplot.

Note

Storing a plot coming from the ggplot2, ggvis, or plotly package in an object is optional, though it is a very useful way to proceed.

To properly set up ggplot(), start by declaring the data set using the data argument. After that, some basic aesthetics and coordinates are assigned. Different figures can require and work with different aesthetics; in the majority of cases those are named inside the aes() function.

Note

As the book goes on, you're going to get used to the ways aesthetics can be declared, inside or outside the aes() function. For now, let's acknowledge that inside aes() it's possible to call data frame variables by name and that they may be displayed in legends.

Checking ?aes shows "..." as an argument, popularly known as three dots but technically named ellipsis. It allows the user to pass an arbitrary number and variety of arguments. Since ggplot2 relies on lazy evaluation (arguments are only evaluated when they are requested), you could make up arguments and pass them into the aes() function with little or no trouble to the function. Consider the following:

> plot1 <- ggplot(cars, aes(x = speed, y = dist, gorillaTroubleShooter = TRUE, sight = 'Legolas'))

It would work as well as the earlier version. Just don't forget to name the arguments and you've got yourself a good way to plant some Easter eggs in your code (also a good way to confuse unaware developers). Both aes() and ggplot() play core roles in building graphics within this package.

Up to step 2, only the coordinate mapping was set in the object named plot1; calling it alone displays an empty graphic. Step 3 uses + to add a layer; the layer called (geom_point()) took care of assigning a geometry to the graphic. Besides the plus sign, ggplots are usually constructed with two families of functions (layers): geom_* and stat_*. While the first family comes with a fixed geometry and a default statistical transformation, the second comes with a fixed statistical transformation and a default geometry (this is the grammar of graphics for real); the defaults can be tweaked.

Note

plot1 + stat_identity(geom = 'point') works just the same as step 3. The geom argument defaults to 'point' in stat_identity(), so it's fine to skip it. The reason I declared it was to reinforce that if you call for a statistical transformation you can pick the geometry, and it goes the other way round too (if you call for a geometry you can change the statistical transformation).
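
The equivalence can be checked rather than taken on faith. The sketch below assumes ggplot2 is installed and uses layer_data() to pull out what each layer hands over for drawing; comparing only the x and y columns is a simplification chosen for this illustration:

```r
library(ggplot2)

plot1 <- ggplot(cars, aes(x = speed, y = dist))

# Geometry first: stat = "identity" is already geom_point()'s default
by_geom <- plot1 + geom_point(stat = "identity")

# Statistic first: geom = "point" is already stat_identity()'s default
by_stat <- plot1 + stat_identity(geom = "point")

# layer_data() returns the data a layer hands over for drawing
d_geom <- layer_data(by_geom, 1)
d_stat <- layer_data(by_stat, 1)
same_points <- isTRUE(all.equal(d_geom[c("x", "y")], d_stat[c("x", "y")]))
print(same_points)
```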

Behind the scenes, geom_point() called the layer() function, which set a couple of arguments that culminated in the creation of a scatterplot. One may want to modify the axis labels and add a regression line. This can be done by simply adding more layers to the plot using the plus sign. One can stack as many layers as desired, as shown next:

> plot1 + geom_point() +
    labs(x = "Speed (mph)", y = "Distance (ft)") +
    geom_smooth(method = "lm", se = FALSE) +
    scale_y_continuous(breaks = seq(0, 125, 25))

The result is shown in figure 1.3:

Figure 1.3 - Adding up several layers to a ggplot.

Combining ggplot2's plus operator (which is actually a function) with layer functions allows the user to make plots in a layered, iterative way. It splits the construction of complex graphics into several simple steps. It's also very intuitive and does not get any harder as you practice.

Yet, there are limitations. The difficulty of making interactive graphics by itself may be one. These tasks, in the majority of cases, are very well handled by both ggvis and plotly as standalone packages. This leads us to steps 4 and 5.
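
One way to picture the iterative side of this is to store each intermediate stage in its own object. This is only a sketch, assuming ggplot2 is installed; the object names (base_plot, with_points, with_trend) are made up for illustration:

```r
library(ggplot2)

base_plot   <- ggplot(cars, aes(x = speed, y = dist))
with_points <- base_plot + geom_point()
with_trend  <- with_points + geom_smooth(method = "lm", se = FALSE)

# Printing any of the intermediate objects draws the plot built so far
print(with_points)
print(with_trend)
```

Because each stage is an ordinary object, an earlier stage can be reused as the starting point for a different variation of the plot.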

Note

Calling plotly::ggplotly() after bringing a ggplot up will coerce it into an interactive plot. It may fail sometimes. Do not forget to have plotly installed.
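
A minimal sketch of that conversion follows, assuming both ggplot2 and plotly are installed. Here the stored plot is passed to ggplotly() explicitly instead of relying on the last plot displayed:

```r
library(ggplot2)
library(plotly)

static_plot <- ggplot(cars, aes(x = speed, y = dist)) + geom_point()

# ggplotly() converts the finished ggplot into an interactive htmlwidget
interactive_plot <- ggplotly(static_plot)

# Type interactive_plot at the console to display it
```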

Step 4 loads the ggvis package using library() and then gives birth to an interactive plot. It holds many similarities with ggplot2. The ggvis() function handles the basic coordinate mapping while the pipe operator (%>%) is used to add a layer called by the layer_points() function. Remember: pipe operator, not plus sign.

Note

ggvis understands different arguments declared using = (always scaled) and := (never scaled). Also, ~ must come before variable names.
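
The difference is easier to see side by side. The sketch below assumes ggvis is installed; the choice of fill and size as the properties to demonstrate, and the value 80, are arbitrary picks for this illustration:

```r
library(ggvis)

# fill = ~dist : mapped, so dist values pass through a fill scale
# size := 80   : set, so every point gets exactly this raw size
# ~ marks speed and dist as variables looked up in the data
scaled_and_set <- cars %>%
  ggvis(x = ~speed, y = ~dist, fill = ~dist, size := 80) %>%
  layer_points()

# Type scaled_and_set at the console to display it
```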

Function names may change, and so does the operator used to add layers, from ggplot2 to ggvis, but essentially the underlying logic stays the same. Layers coming from ggvis have several counterparts among ggplot2's; refer to the See also section to track some. In comparison with ggplot2, ggvis is much younger and some utilities may be yet to come; also, data doesn't need to come from a data frame object.

Step 5 draws an interactive plotly graph. A single function (plot_ly()) takes care of coordinate mapping and geometry. It can be designed in a somewhat more layered way using the add_trace() function, but there is no real need for that when the plot is this simple. Instead of having many functions handle statistical transformations and geometries, those are declared by arguments inside the main function.

These three packages, ggplot2, ggvis, and plotly, are well coded and powerful graphics packages. Right before picking one of them to handle a task, always consider points like:
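
For comparison, a more layered version of the same plot might look like the sketch below. It assumes plotly is installed and splits the work between plot_ly() and plotly's add_trace(); the object name layered_plot is made up for illustration:

```r
library(plotly)

# plot_ly() only declares the data and coordinates here...
layered_plot <- plot_ly(data = cars, x = ~speed, y = ~dist) %>%
  # ...and add_trace() supplies the geometry as a separate step
  add_trace(type = "scatter", mode = "markers")

# Type layered_plot at the console to display it
```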

What the package is able to do
Time needed to master the skill set required
Time required to handle the task
Amount of time available
Time to be saved later by the thing that you learned

Base R is also a feasible possibility. Whenever you face new challenges, it is a good thing to think through these points.

There's more

Having data come solely from data frames is a strong restriction, but it does oblige the user to be explicit about the data and also draws a very clear line between what is ggplot2's concern (data visualization) and what is not (model visualization). In order to avoid the headaches that come from downloading spreadsheets, setting up working directories, and loading data from files, we're taking an alternative route: getting data from packages instead.

Note

data.frame() may be the most convenient function to coerce vectors into data frames in R.
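
As a small illustration, the sketch below builds a data frame from two plain vectors. The values and the names speed_mph, distance_ft, and toy_cars are made up for this example and are not taken from any real data set:

```r
# Two hypothetical vectors standing in for data that is not yet a data frame
speed_mph   <- c(4, 7, 8, 9, 10)
distance_ft <- c(2, 4, 16, 10, 18)

# Each vector becomes one named column
toy_cars <- data.frame(speed = speed_mph, dist = distance_ft)
class(toy_cars)
```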

By doing this, we ensure that readers only need to reach the R console to reproduce the recipes; we want nothing to do with web browsers (we're too cool for school, school meaning web browsers). We shall follow this approach to the end of the book. This recipe draws on the base datasets package to do so. ggplot2 has some data frames of its own.

Note

Enter library(help = 'datasets') to get general information on the other data sets.

It's also important to point out that the gg in ggplot2 and ggvis refers to the Grammar of Graphics. That's a very important and inspiring theory that has influenced ggplot2, ggvis, and plotly. The layered, iterative way these packages handle plots might come from the Grammar of Graphics, and it makes building graphics much easier and more reasonable. Learning this theory may give you a head start in learning these packages, while learning these packages may give you a head start when it comes to learning the Grammar of Graphics.
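
If you prefer to work with the listing programmatically, a sketch along these lines may help. It relies on base R's data() function, whose return value carries a results matrix; the names listing, item_names, and has_cars are made up for illustration:

```r
# data() returns a listing object; its results matrix has one row per data set
listing    <- data(package = "datasets")$results
item_names <- listing[, "Item"]

head(listing[, c("Item", "Title")])

# Check that the data set used in this recipe is part of the listing
has_cars <- "cars" %in% item_names
```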
