
Mastering Scientific Computing with R


Basic plots and the ggplot2 package


This section reviews how to make basic plots with R's built-in functions and with the ggplot2 package.

Basic plots in R include histograms and scatterplots. To plot a histogram, we use the hist() function:

> x <- c(5, 7, 12, 15, 35, 9, 5, 17, 24, 27, 16, 32)
> hist(x) 

The output is shown in the following plot:
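To inspect the bins that hist() builds without drawing anything, you can pass plot = FALSE; hist() then returns an object holding the break points and the number of values in each bin. The sketch below also sets the breaks explicitly; the choice of 10-unit bins is only an illustration:

```r
x <- c(5, 7, 12, 15, 35, 9, 5, 17, 24, 27, 16, 32)
# plot = FALSE returns a "histogram" object instead of drawing the plot
h <- hist(x, breaks = seq(0, 40, by = 10), plot = FALSE)
h$breaks  # bin boundaries: 0 10 20 30 40
h$counts  # values per right-closed bin (0,10], (10,20], ...: 4 4 2 2
```

The counts always add up to the number of observations, which is a quick sanity check when you choose your own breaks.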

You can also plot the values of a mathematical formula with the plot() function by first computing y from a sequence of x values:

> x <- seq(2, 25, by=1)
> y <- x^2 + 3
> plot(x, y)

The output is shown in the following plot:
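By default, plot() draws only the points. Because y was computed from a formula, you may prefer to join the points; the type argument controls this ("l" for lines only, "b" for both points and lines). A minimal sketch, with illustrative axis labels:

```r
x <- seq(2, 25, by = 1)
y <- x^2 + 3
# type = "b" draws the points and the line segments that connect them
plot(x, y, type = "b", xlab = "x", ylab = "x^2 + 3")
```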

You can graph a univariate mathematical function over an interval with the curve() function, using the from and to arguments to set the left and right endpoints, respectively. The expr argument takes the name of a function, or an expression written in terms of x, that evaluates to a numeric vector the same length as x:

> # Arrange two plots side by side (one row, two columns)
> par(mfrow=c(1,2))
> curve(expr=cos(x), from=0, to=8*pi)
> curve(expr=x^2, from=0, to=32)

In the following figure, the plot on the left shows the curve for cos(x) and the plot on the right shows the curve for x^2. As you can see, the from and to arguments set the range of x values shown in each plot.
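curve() also accepts the name of a function instead of an expression, and it invisibly returns the x and y values it evaluated (101 points by default, set with n). Saving the value returned by par() lets you restore the one-plot layout afterwards. The damped cosine below is a hypothetical example function, not one used elsewhere in this section:

```r
# par() returns the old settings, so the layout can be restored later
old_par <- par(mfrow = c(1, 2))
# Pass a function name; curve() evaluates it at n equally spaced x values
pts <- curve(cos, from = 0, to = 8 * pi, n = 200)
# A user-defined function (hypothetical example) works the same way
damped <- function(x) exp(-x / 10) * cos(x)
curve(damped, from = 0, to = 8 * pi)
par(old_par)  # back to one plot per figure
```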

You can also draw scatterplots with the plot() function. For example, we can use the iris dataset that comes with R to plot Sepal.Length versus Sepal.Width as follows:

> plot(iris$Sepal.Length, iris$Sepal.Width, main="Iris sepal length vs width measurements", xlab="Length", ylab="Width")

The output is shown in the following plot:
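Because Species is a factor, its integer codes can serve as colors to distinguish the three species in the same scatterplot. A sketch, where the plotting symbol and legend position are arbitrary choices:

```r
# as.integer() maps setosa/versicolor/virginica to colors 1/2/3 of the palette
species_col <- as.integer(iris$Species)
plot(iris$Sepal.Length, iris$Sepal.Width, col = species_col, pch = 19,
     main = "Iris sepal length vs width measurements",
     xlab = "Length", ylab = "Width")
# Use the same color indices and symbol in the legend
legend("topright", legend = levels(iris$Species), col = 1:3, pch = 19)
```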

R also has built-in functions for other types of graphics, such as barplot(), dotchart(), pie(), and boxplot(). The following examples start with the VADeaths dataset, which ships with R:

> VADeaths
      Rural Male Rural Female Urban Male Urban Female
50-54       11.7          8.7       15.4          8.4
55-59       18.1         11.7       24.3         13.6
60-64       26.9         20.3       37.0         19.3
65-69       41.0         30.9       54.6         35.1
70-74       66.0         54.3       71.1         50.0
> barplot(VADeaths, beside=TRUE, legend=TRUE, ylim=c(0, 100), ylab="Deaths per 1000 population", main="Death rate in VA") #Requires that the data to plot be a vector or a matrix.

The output is shown in the following plot:
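barplot() invisibly returns the x coordinates of the bar midpoints; with beside = TRUE, this is a matrix with the same shape as VADeaths. One possible use, sketched below, is to print each value just above its bar with text():

```r
# Midpoints come back as a 5 x 4 matrix that lines up with VADeaths
mids <- barplot(VADeaths, beside = TRUE, ylim = c(0, 100),
                ylab = "Deaths per 1000 population", main = "Death rate in VA")
# pos = 3 places each label above its (x, y) position
text(mids, VADeaths, labels = VADeaths, pos = 3, cex = 0.6)
```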

When working with data frames, it is often simpler to make a bar plot with the ggplot2 package, because the data does not have to be converted to a vector or matrix first. Be aware, though, that ggplot2 generally expects a data frame in long format (one observation per row) rather than wide format.

The following is an example of data stored in wide format. In this example, we look at the expression levels of the MYC and BRCA2 genes in two different cell lines after the cells were treated with a vehicle control, drug1, or drug2 for 48 hours:

> geneExpdata.wide <- read.table(header=TRUE, text='
 cell_line  gene control drug1 drug2
       CL1   MYC    20.4  15.9   1.5
       CL2   MYC    26.9  18.1   6.7
       CL1 BRCA2   109.5  18.1  89.8
       CL2 BRCA2   121.3  24.4 120.2
 ')

The following is the data rewritten in long format:

> geneExpdata.long <- read.table(header=TRUE, text='
   cell_line  gene variable value
1        CL1   MYC  control  20.4
2        CL2   MYC  control  26.9
3        CL1 BRCA2  control 109.5
4        CL2 BRCA2  control 121.3
5        CL1   MYC    drug1  15.9
6        CL2   MYC    drug1  18.1
7        CL1 BRCA2    drug1  18.1
8        CL2 BRCA2    drug1  24.4
9        CL1   MYC    drug2   1.5
10       CL2   MYC    drug2   6.7
11       CL1 BRCA2    drug2  89.8
12       CL2 BRCA2    drug2 120.2
')

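Instead of rewriting the data frame by hand, you can automate the conversion with the melt() function from the reshape2 package. The id.vars columns are kept as identifiers, the measure.vars columns are stacked into rows, and the variable.name and value.name arguments rename melt()'s default variable and value columns to condition and gene_expr_value: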

> library("reshape2")
> geneExpdata.long <- melt(geneExpdata.wide, id.vars=c("cell_line", "gene"), measure.vars=c("control", "drug1", "drug2"), variable.name="condition", value.name="gene_expr_value")
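If reshape2 is not available, a similar long data frame can be built with base R's stack(), which concatenates the selected columns and records each value's source column in a factor named ind. The sketch below recreates geneExpdata.wide so that it runs on its own; long_base is a hypothetical name chosen for illustration:

```r
geneExpdata.wide <- read.table(header = TRUE, text = '
 cell_line  gene control drug1 drug2
       CL1   MYC    20.4  15.9   1.5
       CL2   MYC    26.9  18.1   6.7
       CL1 BRCA2   109.5  18.1  89.8
       CL2 BRCA2   121.3  24.4 120.2
')
# stack() returns all control values, then drug1, then drug2
stacked <- stack(geneExpdata.wide[c("control", "drug1", "drug2")])
# Repeat the identifier columns once per stacked measurement column
long_base <- data.frame(
  cell_line       = rep(geneExpdata.wide$cell_line, times = 3),
  gene            = rep(geneExpdata.wide$gene, times = 3),
  condition       = stacked$ind,
  gene_expr_value = stacked$values
)
```

The rows come out in the same order as the long table shown earlier.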

Now, we can plot the data using ggplot2 as follows:

> library("ggplot2")
> ggplot(geneExpdata.long, aes(x=gene, y= gene_expr_value)) + geom_bar(aes(fill=condition), colour="black", position=position_dodge(), stat="identity")

The output is shown in the following plot:

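Another useful trick is adding error bars to bar plots. Here, we have a summary data frame of the geneExpdata.long dataset that gives, for each gene and condition, the number of samples (N), the mean expression value (gene_expr_value), the standard deviation (sd), the standard error (se), and the half-width of the 95% confidence interval (ci):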

> geneExpdata.summary <- read.table(header=TRUE, text='
   gene condition N gene_expr_value        sd    se        ci
1 BRCA2   control 2          115.40  8.343860  5.90  74.96661
2 BRCA2     drug1 2           21.25  4.454773  3.15  40.02454
3 BRCA2     drug2 2          105.00 21.496046 15.20 193.13431
4   MYC   control 2           23.65  4.596194  3.25  41.29517
5   MYC     drug1 2           17.00  1.555635  1.10  13.97683
6   MYC     drug2 2            4.10  3.676955  2.60  33.03613
')
> # Store the plot in the p object so that layers can be added later
> p<- ggplot(geneExpdata.summary, aes(x=gene, y= gene_expr_value, fill=condition)) + geom_bar(aes(fill=condition), colour="black", position=position_dodge(), stat="identity")
> #Define the upper and lower limits for the error bars
> limits <- aes(ymax = gene_expr_value + se, ymin= gene_expr_value - se)
> #Add error bars to plot
> p + geom_errorbar(limits, position=position_dodge(0.9), linewidth=.3, width=.2)

The result is shown in the following plot:
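The summary table above was typed in by hand. The following is a sketch of computing it from the long data with base R's aggregate() instead. It assumes that ci is the half-width of a t-based 95% interval, se * qt(0.975, N - 1), which reproduces the ci column shown above. The long data is re-entered so the block runs on its own:

```r
geneExpdata.long <- data.frame(
  cell_line       = rep(c("CL1", "CL2"), times = 6),
  gene            = rep(rep(c("MYC", "BRCA2"), each = 2), times = 3),
  condition       = rep(c("control", "drug1", "drug2"), each = 4),
  gene_expr_value = c(20.4, 26.9, 109.5, 121.3, 15.9, 18.1,
                      18.1, 24.4, 1.5, 6.7, 89.8, 120.2)
)
# FUN returns three numbers per group, so the result column is a matrix
stats <- aggregate(gene_expr_value ~ gene + condition, data = geneExpdata.long,
                   FUN = function(v) c(N = length(v), mean = mean(v), sd = sd(v)))
m <- stats$gene_expr_value
geneExpdata.summary <- data.frame(gene = stats$gene, condition = stats$condition,
                                  N = m[, "N"], gene_expr_value = m[, "mean"],
                                  sd = m[, "sd"])
# Standard error and 95% confidence interval half-width
geneExpdata.summary$se <- geneExpdata.summary$sd / sqrt(geneExpdata.summary$N)
geneExpdata.summary$ci <- geneExpdata.summary$se *
  qt(0.975, df = geneExpdata.summary$N - 1)
# Sort by gene, then condition, to match the hand-typed table
geneExpdata.summary <- geneExpdata.summary[
  order(geneExpdata.summary$gene, geneExpdata.summary$condition), ]
```

With only two samples per group, qt(0.975, 1) is about 12.7, which is why the ci column is so much wider than se.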

Going back to the VADeaths example, we could also plot a Cleveland dot plot (dot chart) as follows:

> dotchart(VADeaths, xlim=c(0, 75), xlab="Deaths per 1000", main="Death rates in VA")

Note

Note that the built-in dotchart() function requires that the data be stored as a vector or matrix.

The result is shown in the following plot:
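Because dotchart() needs a vector or matrix, data held in a data frame has to be converted first, for example with as.matrix(). A sketch using the MYC values from geneExpdata.wide, re-entered with the cell lines as row names (myc and myc_mat are hypothetical names):

```r
myc <- data.frame(control = c(20.4, 26.9), drug1 = c(15.9, 18.1),
                  drug2 = c(1.5, 6.7), row.names = c("CL1", "CL2"))
# as.matrix() keeps the row and column names, which dotchart() uses as labels
myc_mat <- as.matrix(myc)
# For a matrix, dotchart() groups the dots by column
dotchart(myc_mat, xlab = "MYC expression value", main = "MYC expression by condition")
```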

The following are some other graphics you can generate with built-in R functions:

You can generate pie charts with the pie() function as follows:

> labels <- c("grp_A", "grp_B", "grp_C")
> pie_groups <- c(12, 26, 62) 
> pie(pie_groups, labels, col=c("white", "black", "grey")) #Fig. 3B
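pie() sizes each slice in proportion to its share of the total, so it can help to show that share in the labels. A sketch that builds percentage labels with paste0() (pct and pie_labels are illustrative names):

```r
labels <- c("grp_A", "grp_B", "grp_C")
pie_groups <- c(12, 26, 62)
# Each slice's share of the total, rounded to a whole percent
pct <- round(100 * pie_groups / sum(pie_groups))
pie_labels <- paste0(labels, " (", pct, "%)")
pie(pie_groups, labels = pie_labels, col = c("white", "black", "grey"))
```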

You can generate box-and-whisker plots with the boxplot() function as follows:

> boxplot(gene_expr_value ~ condition, data=geneExpdata.long, subset=gene == "MYC", ylab="expression value", main="MYC Expression by Condition", cex.lab=1.5, cex.main=1.5)

Note

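Note that, unlike the barplot() and dotchart() examples above, boxplot() is called here with a formula and a data frame (through its data argument), so the data does not need to be converted to a vector or matrix first.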

Using our cell line drug treatment experiment, we can graph MYC expression for all cell lines by condition. The result is shown in the following plot:

The following is another example using the iris dataset to plot Petal.Width by Species:

> boxplot(Petal.Width ~ Species, data=iris, ylab="petal width", cex.lab=1.5, cex.main=1.5)

The result is shown in the following plot:
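boxplot() also returns, invisibly, the statistics it draws, and with plot = FALSE it computes them without drawing. Row 3 of the returned stats matrix holds the medians, which can be checked against tapply(). A short sketch:

```r
b <- boxplot(Petal.Width ~ Species, data = iris, plot = FALSE)
b$names  # one box per species
b$n      # number of observations behind each box
# Rows: lower whisker, lower hinge, median, upper hinge, upper whisker
b$stats
# The medians in b$stats[3, ] should match these values
tapply(iris$Petal.Width, iris$Species, median)
```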