
Web Application Development with R Using Shiny - Second Edition

By: Chris Beeley

Overview of this book

R is a highly flexible and powerful tool for analyzing and visualizing data. Most of the applications built using various libraries with R are desktop-based. But what if you want to go on the web? Here comes Shiny to your rescue! Shiny allows you to create interactive web applications using the excellent analytical and graphical capabilities of R. This book will guide you from basic data management and analysis with R, through your first Shiny application, and then show you how to integrate Shiny applications with your own web pages. Finally, you will learn how to finely control the inputs and outputs of your application, along with using other packages to build state-of-the-art applications, including dashboards.

Advanced tasks with dplyr and ggvis


To give you a taste of how easy it can be to do some really cool things with R, and to start building the foundation of the Shiny application that we will develop over the course of this book, let's build a few graphics using some Google Analytics data and present them in an interactive document. We are going to use two contributed packages, dplyr and ggvis. The dplyr package provides very powerful functions for selecting, filtering, combining, and summarizing datasets. As you will see throughout this book, dplyr allows you to process data to your exact specifications very rapidly. The ggvis package provides very simple functions to make your visualizations interactive.

We're going to run through some of the code very quickly indeed, so you can get a feeling for some of the tasks and structures involved, but we'll return to this application later in the book where everything will be explained in detail. Just relax and enjoy the ride for now. If you want to browse or run all the code, it is available at chrisbeeley.net/website/index.html.

The Google Analytics code is not included because it requires a login for the Google Analytics API; instead, you can download the actual data from the previously mentioned link. Getting your own Google Analytics account and downloading data from the API is covered in Chapter 5, Advanced Applications I – Dashboards. I am indebted to the examples at goo.gl/rPFpF9 and goo.gl/eL4Lrl, which show helpful ways of displaying data on maps within R.

Preparing the data

In order to prepare the data for plotting, we will make use of dplyr. As with all packages that are included on the CRAN repository of packages (cran.r-project.org/web/packages/), it can be installed using the package management functions in RStudio or other GUIs, or by typing install.packages("dplyr") at the console. It's worth noting that there are even more packages available elsewhere (for example, on GitHub), which can be compiled from the source.

The first job is to prepare the data that will demonstrate some of the power of the dplyr package using the following code:

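library(dplyr)

# gadf holds the Google Analytics data; topThree (computed earlier,
# not shown) holds the three most common ISPs
groupByDate =
  filter(gadf, networkDomain %in% topThree$networkDomain) %>%
  group_by(YearMonth, networkDomain) %>%
  summarise(meanSession = mean(sessionDuration, na.rm = TRUE),
            users = sum(users),
            newUsers = sum(newUsers),
            sessions = sum(sessions))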

This single block of code, executed as one statement, produces a dataframe suitable for plotting and uses chaining to keep the code simple. Three separate data operations, filter(), group_by(), and summarise(), are used, with the result of each being passed to the next using the %>% operator. The three instructions carry out the following tasks:

  • filter(): This is similar to subset(). This operation keeps only the rows that meet certain requirements; in this case, rows for which networkDomain (the originating ISP of the page view) is one of the three most common ISPs. These have already been calculated and stored within topThree$networkDomain (this step is omitted here for brevity).

  • group_by(): This allows operations to be carried out on subsets of data points, in this case, data points subsetted by the year and month and by the originating ISP.

  • summarise(): This applies summary functions such as sum() or mean() to the data points within each group, returning one row per group.

So, to summarize, the preceding code filters the data to select only the ISPs with the most users overall, groups it by year and month and by ISP, and finds the sum or mean of several of the metrics within it (sessionDuration, users, and so on).
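Because the Google Analytics data has to be downloaded separately, here is a small, self-contained sketch of the same filter/group_by/summarise chain on a made-up data frame. The name toyGa and all of its values are hypothetical; only the column names mirror the code above. The way the busiest ISPs are chosen (summing users per ISP and keeping the top two) is an assumption, since the book omits that step. The last part shows that %>% simply feeds its left-hand side in as the first argument of the next call, so the chain is equivalent to nested calls read from the inside out.

```r
library(dplyr)

# Hypothetical toy data standing in for gadf; every value is made up
toyGa = data.frame(
  YearMonth = c("201501", "201501", "201501", "201502", "201502", "201502", "201502"),
  networkDomain = c("isp-a", "isp-a", "isp-c", "isp-a", "isp-b", "isp-b", "isp-c"),
  sessionDuration = c(NA, 200, 50, 300, 80, 40, 60),
  users = c(1, 2, 1, 3, 1, 1, 2),
  newUsers = c(1, 1, 0, 2, 1, 0, 1),
  sessions = c(1, 3, 1, 4, 1, 2, 2),
  stringsAsFactors = FALSE
)

# One possible way to pick the busiest ISPs (assumed; not shown in the book).
# With only three ISPs in the toy data, we keep the top two.
topTwo = toyGa %>%
  group_by(networkDomain) %>%
  summarise(users = sum(users)) %>%
  arrange(desc(users)) %>%
  head(2)

# The same filter -> group_by -> summarise chain as in the text
groupByDate = toyGa %>%
  filter(networkDomain %in% topTwo$networkDomain) %>%
  group_by(YearMonth, networkDomain) %>%
  summarise(meanSession = mean(sessionDuration, na.rm = TRUE),
            users = sum(users),
            newUsers = sum(newUsers),
            sessions = sum(sessions))

# The equivalent nested form: each %>% becomes the first argument
nestedVersion = summarise(
  group_by(filter(toyGa, networkDomain %in% topTwo$networkDomain),
           YearMonth, networkDomain),
  meanSession = mean(sessionDuration, na.rm = TRUE),
  users = sum(users),
  newUsers = sum(newUsers),
  sessions = sum(sessions))

groupByDate
```

The result has one row per month and ISP combination (four rows here). The missing sessionDuration value is ignored thanks to na.rm = TRUE, so the January mean for isp-a is 200.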

A simple interactive line plot

We already saw how easy it is to draw line plots in ggplot2. Let's add some Shiny magic to a line plot now. This can be achieved very easily in RStudio by navigating to File | New | R Markdown | New Shiny document and installing the dependencies when prompted. This will create a new R Markdown document with interactive Shiny elements. R Markdown is an extension of Markdown (daringfireball.net/projects/markdown/), which is itself a markup language, like HTML or LaTeX, but one designed to be easy to read and write. R Markdown allows R code chunks to be run within a Markdown document, which makes its contents dynamic. There is more information about Markdown and R Markdown in Chapter 2, Building Your First Application. This section gives a very rapid introduction to the kind of results that are possible using Shiny-enabled R Markdown documents.

For more details on how to run interactive documents outside RStudio, refer to goo.gl/NGubdo. Once the document is set up, the code is as follows:

library(ggplot2)

# add interactive UI element
inputPanel(
  checkboxInput("smooth", label = "Add smoother?", value = FALSE)
)

# draw the plot
renderPlot({
  thePlot = ggplot(groupByDate, aes(x = Date, y = meanSession,
                                    group = networkDomain, colour = networkDomain)) +
    geom_line() + ylim(0, max(groupByDate$meanSession))
  # overlay a smoothed trend line only when the checkbox is ticked
  if(input$smooth){
    thePlot = thePlot + geom_smooth()
  }
  print(thePlot)
})

That's it! You'll have an interactive graphic once you run the document (click on Run document in RStudio or use the run() command from the rmarkdown package), as shown in the following screenshot:

As you can see, Shiny allows us to turn on or off a smoothing line courtesy of geom_smooth() from the ggplot2 package.
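The R Markdown document hides the split between user interface and server logic that Shiny uses underneath. As a hedged sketch, the same checkbox-and-plot behaviour could be written as a standalone app. Here toyByDate is a hypothetical stand-in for groupByDate with made-up values, and the output ID "trend" is an arbitrary name chosen for illustration. shinyApp() only builds the app object; nothing is launched until runApp() is called.

```r
library(shiny)
library(ggplot2)

# Hypothetical stand-in for groupByDate: monthly mean session length
# for two ISPs (made-up numbers, for illustration only)
toyByDate = data.frame(
  Date = rep(seq(as.Date("2015-01-01"), by = "month", length.out = 6), 2),
  networkDomain = rep(c("isp-a", "isp-b"), each = 6),
  meanSession = c(120, 135, 128, 150, 160, 155, 90, 95, 110, 105, 118, 125)
)

# UI: the same checkbox as in the R Markdown version, plus a plot slot
ui = fluidPage(
  checkboxInput("smooth", label = "Add smoother?", value = FALSE),
  plotOutput("trend")
)

# Server: the plot is re-drawn whenever input$smooth changes
server = function(input, output) {
  output$trend = renderPlot({
    thePlot = ggplot(toyByDate, aes(x = Date, y = meanSession,
                                    group = networkDomain, colour = networkDomain)) +
      geom_line() + ylim(0, max(toyByDate$meanSession))
    # add a smoothed trend line only when the checkbox is ticked
    if (input$smooth) {
      thePlot = thePlot + geom_smooth()
    }
    thePlot
  })
}

app = shinyApp(ui = ui, server = server)
# runApp(app)  # uncomment to launch the app in a browser
```

The design is the same as in the interactive document: reading input$smooth inside renderPlot() is what makes the plot react to the checkbox.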

Producing an interactive map (click to examine the value associated with each country) using the ggvis package is as simple as the following:

library(dplyr)
library(ggvis)

# tooltip callback: receives the data for the clicked shape (NULL if none)
getUsers = function(x){
  if(is.null(x)) return(NULL)
  # the first row of the clicked shape gives its country name
  theCountry = head(filter(map.df, id == x$id), 1)$CNTRY_NAME
  return(filter(groupByCountry, country == theCountry)$users)
}

map.df %>%
  group_by(group, id) %>%
  ggvis(~long, ~lat) %>%
  layer_paths(fill = ~ users) %>%
  scale_numeric("fill", trans = "log", label = "log(users)") %>%
  add_tooltip(getUsers, "click") %>%
  hide_axis("x") %>% hide_axis("y")

The final result looks like the following screenshot:

As you can see, the number of users is shown for the USA. This has been achieved simply by clicking on this country. Don't worry if you can't follow all of this code; this section is just designed to show you how quick and easy it is to produce effective and interactive visualizations.
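The click handler getUsers() is probably the least obvious part of the map code. The sketch below exercises it outside ggvis using tiny, made-up versions of map.df and groupByCountry. Their columns are assumptions modelled on the code above: map.df holds several polygon vertices per country, and groupByCountry holds one user total per country. A click is simulated by passing a list with an id element.

```r
library(dplyr)

# Hypothetical miniature map data: three vertices per country (made up)
map.df = data.frame(
  id = c("0", "0", "0", "1", "1", "1"),
  CNTRY_NAME = c(rep("France", 3), rep("Japan", 3)),
  long = c(2, 3, 2.5, 139, 140, 139.5),
  lat = c(46, 46, 47, 35, 35, 36),
  stringsAsFactors = FALSE
)

# Hypothetical per-country user totals (made up)
groupByCountry = data.frame(
  country = c("France", "Japan"),
  users = c(42, 17),
  stringsAsFactors = FALSE
)

getUsers = function(x){
  # nothing was clicked: show no tooltip
  if(is.null(x)) return(NULL)
  # any vertex of the clicked shape carries the country name; take the first
  theCountry = head(filter(map.df, id == x$id), 1)$CNTRY_NAME
  # look up that country's total in the summary table
  return(filter(groupByCountry, country == theCountry)$users)
}

# Simulate a click on the shape whose id is "1"
getUsers(list(id = "1"))  # returns 17
```

Taking only the first matching row matters because map.df repeats the country name once per vertex, while the tooltip needs a single value.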