Book Image

Instant Heat Maps in R How-to

By : Sebastian Raschka
Book Image

Instant Heat Maps in R How-to

By: Sebastian Raschka

Overview of this book

R has grown rapidly over the years to become one of the most versatile and valuable tools for data analysis and graphing. One of its many useful features is the heat map representation of numerical data, which is an invaluable tool to discover patterns in data quickly and efficiently. Instant Heat Maps in R How-to provides you with practical recipes to create heat maps of all difficulty levels by yourself right from the start. At the end of each recipe, you will find an in-depth analysis that will equip you with everything you need to know to frame the code to your own needs. Instant Heat Maps in R will present you with all the different heat map plotting functions that exist in R. You will start by creating simple heat maps before moving on to learn how to add more features to them. While you advance step-by-step through the well-connected recipes, you will find out which tool suits the given situation best. You will learn how to read data from popular file formats and how to format the data to create heat maps as well as the ways to export them for presentation.
Table of Contents (7 chapters)

Drawing choropleth maps and contour plots (Intermediate)


We have constructed many heat maps so far, including popular applications like gene expression and stock analyses. There are many more fields and disciplines where heat maps are an invaluable tool for intuitive and data analyses and representations. In this recipe, we will learn how to construct the closely related choropleth maps and contour plots, which are very useful to visualize geographic data.

Although choropleth maps get a huge boost of popularity during the election years for sure, there are many more widely used applications, such as population census, disease factors by regions, or income comparisons. Basically, choropleth maps are the representation of choice whenever you want to show and compare statistics between geographic regions on a cartographic map. Those geographic regions are usually separated by county, state, or even country borders.

The following choropleth map shows the annual average temperatures of the US from 1971-2001:

While contour plots are represented less in the media, they are extensively used by engineers and scientists. Common examples are the 2D projections of 3D surfaces, such as geographic terrains. However, contour plots have many more applications and can be used to plot gradients, depths, thicknesses, and other distributions as 2D projections.

Imagine a mountain from a bird's-eye perspective, where the contour lines delineate the different levels of altitude. The regions that are enclosed by the contour lines can be filled with colors or color gradients to better distinguish the different heights.

In the second part of this recipe, we will apply this concept by creating a contour plot of a volcano.

Getting ready

Download the 5644OS_04_01.r script and the data files usa_temp.csv and south_am_pop.csv from your account at http://www.packtpub.com.

I recommend that you download and save the script and data file to the same folder on your hard drive. If you execute the script from a different location to the data file, you have to change the current R working directory accordingly.

Please read the Getting ready section of the Reading data from different file formats recipe for more information on how to change the working directory of your current R session.

For more information on how to view the current working directory of your current R session and an explanation on how to run scripts in R, please read the Getting ready section of the Creating your first heat map in R recipe.

The script will check automatically if any additional packages need to be installed in R. You can find more information about the installation of packages in the Getting ready section of the Creating your first heat map in R recipe.

How to do it...

Execute the following code in R invoking the script 5644OS_04_01.r and take a look at the PDF files chloropleths_maps.pdf and contour_plot.pdf that will be created in the current working directory:

### loading packages

if (!require("ggplot2")) {
install.packages("ggplot2", dependencies = TRUE)
library(ggplot2)
}
if (!require("maps")) {
install.packages("maps", dependencies = TRUE)
library(maps)
}
if (!require("mapdata")) {
install.packages("mapdata", dependencies = TRUE)
library(mapdata)
}

pdf("chloropleth_maps.pdf", height = 7,width = 10)

### 1) average temperature USA

# 1.1) reading in and processing data
usa_map <- map_data(map = "state")

usa_temp <- read.csv("usa_temp.csv", comment.char = "#")

usa_data <- merge(usa_temp, usa_map, 
by.x ="state", by.y =  "region") # case sensitive

usa_sorted <- usa_data[order(usa_data["order"]),]

# 1.2) plotting USA chloropleth maps
usa_map1 <- ggplot(data = usa_sorted) + 
  geom_polygon(aes(x = long, y = lat, 
group = group, fill = fahrenheit)) + 
  ggtitle("USA Map 1")
print(usa_map1)

usa_map2 <- usa_map1 + coord_map("polyconic") + 
  ggtitle("USA Map 2 - polyconic")
print(usa_map2)

usa_map3 <- usa_map2 + 
  geom_path(aes(x = long, y = lat, group = group), 
color = "black") + 
  ggtitle("USA Map 3 - black contours")
print(usa_map3)

usa_map4 <- usa_map3 + 
  scale_fill_gradient(low = "yellow", high = "red") + 
  ggtitle("USA Map 4 - gradient 1")
print(usa_map4)

usa_map5 <- usa_map3 + 
scale_fill_gradient2(low = "steelblue", mid = "yellow",
  high = "red",  midpoint = colMeans(usa_sorted["fahrenheit"])) + 
  ggtitle("USA Map 5 - gradient 2")
print(usa_map5)


### 2) South American population count

# 2.1) reading in and processing data
south_am_map <- map_data("worldHires", 
region = c("Argentina", "Bolivia", "Brazil",
  "Chile", "Colombia", "Ecuador", "Falkland Islands",
"French Guiana", "Guyana", "Paraguay", "Peru",
"Suriname", "Uruguay", "Venezuela"))

south_am_pop <- read.csv("south_america_pop.csv", 
comment.char = "#")

south_am_data <- merge(south_am_pop, south_am_map,  by.x = "country", by.y = "region")

south_am_sorted <- south_am_data[order(
south_am_data["order"]),]

# 2.2) creating chloropleth maps
south_am_map1 <- ggplot(data = south_am_sorted) + 
  geom_polygon(aes(x = long, y = lat, 
group = group, fill = population)) +
  geom_path(aes(x = long, y = lat, group = group), 
color = "black") + 
  coord_map("polyconic") + 
scale_fill_gradient(low = "lightyellow",
  high = "red", guide = "legend")
print(south_am_map1)

south_am_map2 = south_am_map1 + 
  theme(panel.background = element_blank(),
  axis.text = element_blank(), 
  axis.title = element_blank(), 
  axis.ticks = element_blank())
print(south_am_map2)

dev.off()


### 3) Volcano contour plot

pdf("contour_plot.pdf", height = 7,width = 10)

data(volcano)
image.plot(volcano)
contour(volcano, add = TRUE)

dev.off()

How it works...

In this recipe we are using three new packages that we have not encountered before. One of them is Hadley Wickham's ggplot2 package, which is a relatively new and powerful plotting system. It has become a very popular alternative to R's basic plotting functions, because it provides both great versatility and a uniform syntax. The complete documentation can be found at http://docs.ggplot2.org/current/.

The other two packages, maps and mapdata, contain the data for our maps that we are going to use as templates for our choropleth maps. You can find a table with all the maps that are available in both maps and mapdata at the end of this section.

The difference between those two map packages is that they contain different maps with different levels of detail.

The following image compares the levels of detail between the map of Japan extracted from the world map of maps, the high resolution world map of mapdata, and the single high resolution mapdata map of Japan.

  1. Annual average temperatures of the United States: The first data file, usa_temp.csv, contains the annual average temperatures of the USA from 1971 to 2001. The data was made available by the National Climatic Data Center of the United States, National Oceanic and Atmospheric Administration, and has been compiled by current results.

  2. Reading in map and temperature data: First, we use the read.csv() function, known from the Reading data from different file formats recipe, to read our data and assign it to the variable usa_temp. Next, we use the map_data() function from the ggplot2 package to read in the United States map from maps as a data frame.

    Note that in contrast to the heat map functions that we have been using in the previous recipes, the gplot() function takes a data frame as input instead of data in a numerical matrix format. And as we remember from the Reading data from different file formats recipe, data that we read via read.csv() is converted into a data frame automatically.

    Note

    Matrices in R can only contain data of the same type, for example, numeric. However, data frames can contain different types of data as variables (columns), such as numeric, factor, logical, or character.

  3. Merging and sorting the data frames: Now we will merge the map and temperature data:

    usa_data <- merge(usa_temp, usa_map, 
    by.x = "state", by.y = "region") 
    usa_sorted <- usa_data[order(usa_data["order"]),]

    The following flowchart illustrates this process and shows how the data looks like before and after the merging and conversion:

    As we can see in the preceding chart, the map data in the usa_map data frame contains multiple longitudinal (long) and latitudinal (lat) coordinates for each state in the column labeled with region. The numbers in the order column specify the order in which the single dots are connected when we draw the map as a polygon.

    After we merged both map data and temperature data, we can see that the values in the order column have been shuffled. This would result in quite a mess when we wanted to construct the choropleth map from this data frame. Therefore, we use the order() function to restore the original order before we merged the data sets.

    Note

    The merge() function is case sensitive. We have to make sure that the character strings in the two columns that we want to merge match.

  4. Constructing our first choropleth map: Now that we have our data ready in a nice format, we can proceed to the main plotting functions to create our first choropleth map:

    usa_map1 <- ggplot(data = usa_sorted) + 
      geom_polygon(aes(x = long, y = lat, group = group, 
    fill = fahrenheit)) + 
      ggtitle("USA Map 1")

    Basically, how the ggplot2 package works is that we have to create our plots incrementally. First, we use the ggplot() function to assign our data to a new plot object, and then we can add further layers to it. The basic shape of ggplot2 plots is determined by so-called geoms, or geometric objects. Here, we are using geom_polygon(), which will connect our coordinate points to a map. The aes() function nested inside is there for aesthetics and defines the visual properties of the geoms. We provide the variable fahrenheit as an argument for the fill parameter so that we can shade the areas of our choropleth maps according to the Fahrenheit temperatures in our data set.

    Note

    Similar to the plotting functions of the lattice package, which we have used in the previous recipes, ggplot() creates plot graphics as objects. Therefore, we have to use the print() function on the objects to save them as image files. Note that in an interactive R session, you can omit the print() function and just need to type the name of the graphics object, for example, usa_map1, to create the map on the screen.

  5. Customizing choropleth maps: One of the advantages of ggplot2 is that we do not have to retype the whole heat map function with its arguments each time we want to modify the map, since we store our plots as objects. So if we want to convert our map into a polyconic projection, we can simple reuse our old graphic object and make a small modification:

     usa_map2 <- usa_map1 + coord_map("polyconic") + 
      ggtitle("USA Map 2 - polyconic")

    Now, let us add another geometric object so that the state borders will be drawn as black lines:

    usa_map3 <- usa_map2 + 
      geom_path(aes(x = long, y = lat, group = group), 
    color = "black") + 
      ggtitle("USA Map 3 - black contours")

    Since we are showing temperatures on our United States map, it would be more appropriate to replace the blue default color gradient by a color gradient from yellow to red. We can modify the color gradient by adding the scale_fill_gradient() function with two color arguments for our previous graphic object:

    usa_map4 <- usa_map3 + 
      scale_fill_gradient(low = "yellow", high = "red") + 
      ggtitle("USA Map 4 - gradient 1")

    Note

    Instead of discrete color names, we can also use the hex color codes that we have seen in the Customizing heat maps recipe.

    If we want to add a third color to our gradient, we can use the scale_fill_gradient2() function instead. For this function, we need an additional argument that specifies the position (mid-point) of the second color. We can simply use the temperature mean such as a midpoint measure:

    usa_map5 <- usa_map3 + 
    scale_fill_gradient2(low = "steelblue", mid = "yellow",
    high = "red",  midpoint =   
    colMeans(usa_sorted["fahrenheit"])) + 
      ggtitle("USA Map 5 - gradient 2")
  6. Extracting regions from the world map: Now that we have seen how we can use ggplot2 to create simple choropleth maps, let us explore some advanced features.

    As you can see in the table at the end of this section, the number of maps in the maps and mapdata packages is quite limited. However, we are able to extract individual countries from the world map.

    Our second data set, south_am_pop.csv, contains recent population counts of all the countries of South America. Unfortunately, neither maps nor mapdata contains a map of this continent, so we extract all the South American countries from the high resolution world map of mapdata.

    south_am_map <- map_data("worldHires", region = c("Argentina",
    "Bolivia", "Brazil","Chile", "Colombia", "Ecuador", 
    "Falkland Islands", "French Guiana", "Guyana", 
    "Paraguay", "Peru", "Suriname", "Uruguay", "Venezuela"))

    After we read the data from south_am_pop.csv and merge and sort the data frames, we can create the choropleth map:

    south_am_map1 <- ggplot(data = south_am_sorted) + 
      geom_polygon(aes(x = long, y = lat, 
    group = group, fill = population)) +
      geom_path(aes(x = long, y = lat, group = group), 
    color = "black") + 
      coord_map("polyconic") + 
    scale_fill_gradient(low = "lightyellow",
      high = "red", guide = "legend")

    As we have seen it for the USA map before, we use the ggplot() function to create a new plot object and geom_polygon to construct the map graphic. With the color parameter in geom_path, we add a black border around the individual countries, and with the coord_map function, we convert our map into a polyconic projection. We also change the color gradient from light-yellow to red and make the legend categorical by providing the argument legend for the guide parameter.

    By default, ggplot2 creates plots on a gray background with a white grid. To get a nicer layout, let us create a map on a clean, white background and remove axes tick marks, labels, and titles altogether:

    south_am_map2 = south_am_map1 + 
      theme(panel.background = element_blank(),
      axis.text = element_blank(), 
      axis.title = element_blank(), 
      axis.ticks = element_blank())

    In the following image, you can see the resulting choropleth map of South America's population count on a white background:

  7. Volcano contour plot: As mentioned in the introduction of this recipe, we want to take a look at another type of map representation, contour plots:

    data(volcano)
    image.plot(volcano)

    Using the image.plot() function from the fields library, we construct a color grid with the topographical data of a volcano from R's data package. We could also use the image() base function in R, but the image.plot() function has some nice additional features, such as placing a legend for the color grid to the right of the plot.

    After we created the underlying color grid, we use the contour() function to place the contour lines on top of the color grid.

    contour(volcano, add = TRUE)

    The following image shows the final contour plot that we have created from the volcano data set:

The following table summarizes the different maps that are included in the maps and mapdata packages:

Package

Database

Description

maps

usa

Map of the United States

county

Map of the US Counties

state

Map of the states in the US

Italy

Map of Italy

france

Map of France

nz

Map of New Zealand

world

World map

world2

Pacific centric world map

mapdata

china

High resolution map of China

japan

High resolution map of Japan

nzHires

High resolution map of New Zealand

worldHires

High resolution world map

world2Hires

High resolution Pacific centric world map