Learning RStudio for R Statistical Computing

Overview of this book

Data is coming at us faster, dirtier, and at an ever-increasing rate. The need to handle many complex statistical analysis projects is hitting statisticians and analysts across the globe. This book shows you how to deal with it, improving your productivity. "Learning RStudio for R Statistical Computing" will teach you how to quickly and efficiently create and manage statistical analysis projects, import data, develop R scripts, and generate reports and graphics. R developers will learn about package development, coding principles, and version control with RStudio. This book will help you to learn and understand the RStudio features for statistical analysis and reporting, code editing, and R development. The book starts with a quick introduction in which you learn to load data, perform a simple analysis, plot a graph, and generate automatic reports. You will then explore the available features for effective coding, graphical analysis, R project management, and report generation. "Learning RStudio for R Statistical Computing" is filled with easy-to-understand examples and step-by-step instructions that help you quickly master the most popular IDE for R development.
Overview: A first R session


Now that we have R and RStudio installed, we can start our first R session from within RStudio. It is good practice to use an RStudio project for all your data analysis with R, for reasons we will encounter later in this book.

We create an R project using the menu Project | New Project. Choose New Directory and name the project file Abalone.

Note

In this session, we download and manipulate the abalone file. This file will be used in examples throughout the book.

Abalones are a very common type of edible sea snail (sometimes called sea ear) occurring in waters around the world. The data in the file used in this book was compiled and published by Warwick J. Nash, Tracy L. Sellers, Simon R. Talbot, Andrew J. Cawthorn, and Wes B. Ford in 1994 [Sea fisheries division Technical Report No. 48 (ISSN 1034-3288)]. It was generously donated to the UCI Machine Learning Repository in 1995.

If you are a beginner in R programming, the RStudio menus facilitate many R commands. When you click on a menu item, RStudio generates and executes the corresponding R commands in the console window. It is a good (and a reproducible!) practice to put your R code in script files as much as possible; but for now we will use some menu commands.

Select Workspace | Import DataSet | From Web URL.

RStudio (and R) can import text files from disk as well as over the Internet, as shown in the following example:

Type (or paste) the following URL: http://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data.

RStudio downloads the file and shows the Import Dataset dialog:

The top left-hand side shows the name (abalone) of the resulting data.frame. On the bottom left-hand side are the settings for reading the data file that RStudio deduced from the data file. You can alter these; however, in this example they are fine. On the top right-hand side RStudio shows the first 25 lines of the data file. On the bottom right-hand side it shows the first 25 records of the resulting data.frame. Click on the Import button.

RStudio imports the data and creates a data.frame with the name abalone using the R command read.table and the options that you have set in the Import DataSet dialog. Also, it automatically runs View(abalone), which shows the data we just imported. Notice that the Workspace panel on the right-hand side now contains the variable abalone. Also, notice that the column names of the data are missing, so we need to add them.
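The menu import corresponds roughly to a read.table call typed by hand. The following is a minimal sketch, assuming the file is comma-separated and has no header row (as the missing column names suggest). The three sample rows are illustrative stand-ins in the file's layout so that the example runs offline, and the exact options RStudio passes may differ.

```r
# Illustrative rows in the layout of abalone.data (comma-separated, no header).
sample_rows <- c(
  "M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15",
  "M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7",
  "F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9"
)

# read.table() accepts literal text through its 'text' argument.
abalone_sample <- read.table(text = sample_rows, sep = ",", header = FALSE)

# Without a header row, R assigns the default names V1, V2, ..., V9.
print(names(abalone_sample))
str(abalone_sample)

# To read the real file, pass the URL instead of 'text' (needs network access):
# abalone <- read.table(
#   "http://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data",
#   sep = ",", header = FALSE)
```

The default names V1 to V9 are the reason the column names have to be set by hand in the next step.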

In the console panel we type the following:

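# Assign descriptive column names (the downloaded file has no header row).
names(abalone) <- c("Sex","Length","Diameter","Height","Whole weight"
                    ,"Shucked weight","Viscera weight","Shell weight"
                    ,"Rings")
# Store a local copy in the project directory, without row names.
write.csv(abalone, "abalone.csv", row.names=FALSE)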

This sets the correct names for the data set and stores the data in your project directory, so you don't have to download it again. This data file is part of your compendium.
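One detail worth knowing about the stored file: by default, read.csv() makes column names syntactically valid, so names containing spaces come back with dots. The following is a minimal sketch, using a one-row stand-in data frame (illustrative values) and a temporary file rather than the project directory.

```r
# One-row stand-in for the abalone data (values are illustrative only).
toy <- data.frame("M", 0.455, 0.365, 0.095, 0.514, 0.2245, 0.101, 0.15, 15)
names(toy) <- c("Sex", "Length", "Diameter", "Height", "Whole weight",
                "Shucked weight", "Viscera weight", "Shell weight", "Rings")

# Write to a temporary file so the project directory is left untouched.
csv_file <- tempfile(fileext = ".csv")
write.csv(toy, csv_file, row.names = FALSE)

# Default: names are made syntactically valid ("Whole weight" -> "Whole.weight").
default_names <- names(read.csv(csv_file))

# check.names = FALSE keeps the names exactly as written in the file.
exact_names <- names(read.csv(csv_file, check.names = FALSE))

print(default_names)
print(exact_names)
```

Either form works; the dotted names are easier to use with the $ operator, while the exact names need backticks.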

We will start our first data analysis within RStudio with an R script.

Follow the next few steps in order to start the data analysis:

  1. Create a new R script by navigating to File | New | R script (Ctrl+Shift+N or Command+Shift+N) and type the following:

    abalone <- read.csv("abalone.csv")
    table(abalone$Sex)
    plot(Length ~ Sex, data=abalone)

    These commands load the data, tabulate the frequencies of Sex, and draw a box plot of Length by Sex for the abalone data.

  2. Save your R script as abalone.R using File | Save (Ctrl+S or Command+S).

  3. Execute your R script with Ctrl+Shift+Enter or Command+Shift+Return.

Et voila! We have run a small R script from within RStudio. Notice that the panel on the bottom right-hand side shows the plot that we have created.
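A note on the plot call in the script: plot() with a formula draws a box plot when the right-hand variable is a factor. Since R 4.0.0, read.csv() leaves text columns as character by default, so converting Sex to a factor explicitly is a cautious choice. The following is a minimal sketch on a small made-up data frame (the values are illustrative, not taken from the abalone file).

```r
# Made-up stand-in data; only the column names match the abalone file.
toy <- data.frame(
  Sex    = c("M", "F", "I", "M", "F", "I", "M"),
  Length = c(0.46, 0.53, 0.33, 0.44, 0.55, 0.43, 0.35),
  stringsAsFactors = FALSE
)

# Convert the grouping column to a factor so the levels are explicit.
toy$Sex <- factor(toy$Sex)

# Frequency of each level, as table(abalone$Sex) does in the script.
sex_counts <- table(toy$Sex)
print(sex_counts)

# With a factor on the right-hand side, plot() draws one box per level.
plot(Length ~ Sex, data = toy)
```

In the real script, the same conversion would be abalone$Sex <- factor(abalone$Sex), placed right after the read.csv line.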

But we can do better than that. If you did not follow the previous instructions to install knitr, now is the time to do it after all. You may also install it by typing install.packages("knitr") in the console.
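To avoid reinstalling a package that is already present, the installation can be made conditional. The following is a minimal sketch; the helper name is_installed is hypothetical, and the install call is left commented out because it needs network access.

```r
# Hypothetical helper: TRUE when the package's namespace can be loaded.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

if (!is_installed("knitr")) {
  message("knitr is not installed; run install.packages(\"knitr\")")
  # install.packages("knitr")
}
```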

  1. Choose File | Compile Notebook.

  2. Close the Abalone project with Project | Close Project. Choose Save.

    We now have a new, empty RStudio session.

  3. Open your newly created Abalone project by navigating to Project | Recent Projects | Abalone.

Your environment is restored, including all the commands that you typed, thanks to R and RStudio.

Keyboard shortcuts

Besides the standard keyboard shortcuts that you likely use in everyday computer use (cut-copy-paste, or to undo an activity), RStudio supports many keyboard shortcuts specifically for R code editing, execution, and more. Although you are unlikely to learn or use all of them, it is useful to get used to at least a few. We will highlight a few of the most useful keyboard shortcuts in every chapter.

Panel            Windows & Linux         Mac                        Description
Source, console  Tab or Ctrl+space bar   Tab or Command+space bar   Command completion.
Source           Ctrl+Enter              Command+Return             Run current line or selection.
Source           Ctrl+Shift+Enter        Command+Shift+Return       Source with echo (run whole file).
Any              Ctrl+1                  Command+1                  Move cursor to source editor.
Any              Ctrl+2                  Command+2                  Move cursor to console.

Getting help

If you run into trouble with RStudio, there are several ways to get help online.

  • The developers of RStudio have proven to be amazingly responsive on the help forum at http://support.rstudio.org/. There are many people using R and RStudio, so chances are that someone has already posted the same question and had it answered. So, before posting a question, take a look at the troubleshooting guide on RStudio's support page.

  • Search the FAQs and the forum to check whether your question has been answered before.

  • Google your question. It may have been answered on another Q&A forum, such as Stack Exchange.

When you post a question, it helps a lot to include a small example that reproduces your problem. Also, you may want to attach the output of R's sessionInfo() command to show in what context the problem occurred. Finally, it can be helpful to attach RStudio's log file. You can find the folder where it is stored by navigating to Help | Diagnostics | Show log files. If RStudio fails to start, you can find the log in one of the following folders:

Operating system    Folder path
Windows XP          %USERPROFILE%\Local Settings\Application Data\RStudio-Desktop\log
Windows Vista, 7    %localappdata%\RStudio-Desktop\log
Linux, Mac OS X     ~/.rstudio-desktop/log/
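As for the sessionInfo() output suggested earlier for forum posts, it can be captured as text and saved to a file. The following is a minimal sketch; the file name sessionInfo.txt is an arbitrary choice, and a temporary directory is used so that nothing is written to the project.

```r
# Capture the printed output of sessionInfo() as a character vector.
session_lines <- capture.output(print(sessionInfo()))

# Write it to a text file that can be attached to a forum post.
info_file <- file.path(tempdir(), "sessionInfo.txt")
writeLines(session_lines, info_file)

# Show where the file was written and how many lines it contains.
cat("Wrote", length(session_lines), "lines to", info_file, "\n")
```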

What if I uninstall RStudio?

Although you may find this hard to believe, this is absolutely no problem. Each RStudio project is just a folder, containing your scripts, reports, and data in their original form. Additionally, there is an .Rproj file that holds some session information for RStudio and possibly an .RData file. So even if you wish to uninstall RStudio, your work is as accessible as before. You can still reopen your last-closed R session by starting the default Rgui and opening the .RData file in that folder. Scripts are stored as simple text files.
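Outside RStudio, a saved workspace can be restored with plain R. The following is a minimal sketch, using a temporary file and a stand-in object so that the example does not touch a real project; in a project folder the file would typically be named .RData.

```r
# Stand-in object and a temporary workspace file (illustrative names).
abalone_demo <- data.frame(Sex = c("M", "F"), Rings = c(15, 9))
workspace_file <- tempfile(fileext = ".RData")

# save() stores the named objects; save.image() would store the whole workspace.
save(abalone_demo, file = workspace_file)

# Restore into a fresh environment to show what the file contains.
restored <- new.env()
loaded_names <- load(workspace_file, envir = restored)
print(loaded_names)
print(restored$abalone_demo)
```

Calling load() without the envir argument restores the objects directly into the current workspace, which is what opening the file in Rgui does.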

It is important to note that RStudio does not alter the storage format of your data in any way. In contrast, many proprietary products force you to import your data and store it in some binary format that cannot be opened with other products.

Further reading

The paper Statistical Analyses and Reproducible Research by Robert Gentleman and Duncan Temple Lang offers a thorough description of methods for reproducible research. It can be downloaded for free from http://biostats.bepress.com/bioconductor/paper2/.

There are many books for learning about R, a lot of which are dedicated to specific subjects. Two recent books that discuss R in general and have quickly gained popularity are R in a Nutshell by Joseph Adler, 2010, O'Reilly, and The Art of R Programming by Norman Matloff, 2011, No Starch Press, Inc. The former discusses R as a language as well as many statistical features, while the latter thoroughly discusses R as a programming language.

Two books focusing on general statistics with R are worth mentioning here as well. The first is Introductory Statistics with R (2nd ed. 2008, Springer) by Peter Dalgaard. The second is Introductory Probability and Statistics Using R by G. Jay Kerns. The latter book is developed as an open source project and can be downloaded from http://ipsur.org/.

To keep up to date with what happens in the R community, we highly recommend frequent visits to Tal Galili's r-bloggers.com. This website collects a large number of R-related blogs in a convenient newspaper-like layout. Subscribing with an RSS reader for smartphone or PC is also possible.

Summary

In this chapter, we emphasized the importance of making your analyses reproducible and introduced the concepts of reproducible research and the compendium. We showed how to install R and RStudio in several environments. RStudio supports the concept of a compendium through projects. If you followed the first session carefully, you have learned to read, alter, and store a simple CSV file, perform some simple analyses, make a simple plot, and automatically generate an HTML report that you can share with your coworkers.

In the next chapter we will take a deeper dive into writing scripts with RStudio.