Learning RStudio for R Statistical Computing

Overview of this book

Data is coming at us faster, dirtier, and at an ever-increasing rate. The need to handle many complex statistical analysis projects is hitting statisticians and analysts across the globe. This book will show you how to deal with it, giving you an edge and improving your productivity. "Learning RStudio for R Statistical Computing" will teach you how to quickly and efficiently create and manage statistical analysis projects, import data, develop R scripts, and generate reports and graphics. R developers will learn about package development, coding principles, and version control with RStudio. This book will help you learn and understand the RStudio features for statistical analysis and reporting, code editing, and R development. The book starts with a quick introduction where you will learn to load data, perform a simple analysis, plot a graph, and generate automatic reports. You will then be able to explore the available features for effective coding, graphical analysis, R project management, and report generation. "Learning RStudio for R Statistical Computing" is packed with easy-to-understand examples and step-by-step instructions that help you quickly master the most popular IDE for R development.
Table of Contents (13 chapters)

Chapter 1. Getting Started

This chapter shows how to obtain R and RStudio and introduces the concepts of reproducible research. We will first show a simple RStudio session that already results in a simple, fully reproducible report.

If you have ever had to analyze data for work, study, or a research project, you have probably run into a situation where you ended up with a messy kludge of temporary files, scripts, and intermediate results that is almost impossible to untangle. If this sounds familiar, you probably also had to rewrite pieces of your report while debugging your analyses, or when receiving updates of your data sets. Re-running calculations and re-inserting figures, tables, and results can take a lot of time. Moreover, as a project turns more and more into a spaghetti of files and folders, reproducing exactly what you did becomes harder and harder. Needless to say, things can become even more difficult when several people collaborate on such a project.

RStudio™ is a free and open source tool that makes it easier for you to do the following:

  • Work with R and R's graphics interactively

  • Organize your code and maintain multiple projects

  • Make your research reproducible

  • Maintain the packages in your R installation

  • Create and share your reports

  • Share your code and collaborate with other users

RStudio runs on all the major operating systems, including Windows, Linux, and Mac OS X. Additionally, it can be used to run R on a remote web server. In that case, RStudio's interface will run in your browser.

This book is aimed at beginning and moderate R users who want to get the most out of R and RStudio. In the coming chapters we will cover most of RStudio's features, and emphasize some best practices in statistical data analysis.

A few words about R: R is a free software tool for statistical analysis, consisting of the R programming language and the R environment. Here, free means not only free of charge (as in free beer) but also free as in freedom. That is, you are allowed to download and use R, inspect or alter its source code, and redistribute it as you like. Note that this freedom is in fact a requirement for truly reproducible research, as it allows one, in principle, to check exactly how data is processed in a certain project, down to R's source code itself.

R is distributed via the Comprehensive R Archive Network (CRAN), a network of servers around the world from which you can download R and its extension packages. You can access it via www.r-project.org. There are a few other sites offering extension package repositories; the most noteworthy are Bioconductor (www.bioconductor.org) and the Omega project for statistical computing (www.omegahat.org).

The R environment is a so-called REPL, which stands for read-evaluate-print loop. That is, it offers a text-based interface where you can enter R commands. After a command is entered, the R engine processes it (evaluation) and possibly prints a result to the screen. Alternatively (and more commonly), the commands can be stored in a text file to be run by R.
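The read-evaluate-print cycle described above can be imitated in a few lines of R. This is only a minimal sketch to illustrate the idea, not how the R console is actually implemented; the example commands, the scratch environment `env`, and the temporary script file are all assumptions made for illustration.

```r
# A toy read-evaluate-print loop: each command is read, evaluated, and,
# if it yields a visible value, printed.
commands <- c("1 + 2", "x <- c(4, 8, 15)", "mean(x)")
env <- new.env()                               # scratch workspace for the demo

for (cmd in commands) {
  expr <- parse(text = cmd)[[1]]               # read
  res <- withVisible(eval(expr, envir = env))  # evaluate
  if (res$visible) print(res$value)            # print (assignments stay silent)
}

# The same commands stored in a text file and run in one go.
script <- tempfile(fileext = ".R")             # hypothetical script file
writeLines(commands, script)
source(script, local = env, echo = TRUE)       # echo = TRUE shows commands and results
```

In everyday use you would simply type the commands at the console, or keep them in a script file and run it with `source()`.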

Users who are accustomed to point-and-click interfaces for statistical software may find the first encounter with such an interface daunting, and to be honest, the learning curve for R can be steep at times. However, to make your work reproducible, you have to store the steps of your analyses as source code. Moreover, being a true programming language makes R a much more versatile and powerful tool than any point-and-click software that offers only predefined functionality.

Fortunately for us, writing code is nothing new, and over the past decades many good ideas have been developed in the software industry to make coding and code management a lot easier. RStudio implements many of those ideas for R users.

Some important tips for maintaining your R installation are as follows:

  • Always use the latest stable version. This is the version likely to have the fewest bugs in the older functionality. You can read about the latest features in the news file, for example by running View(news()) from the R command line. See the Installing R section for an easier way to install R.

  • Frequently update your installed packages. You can do this by running the update.packages() command from your R console.
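The two tips above might look as follows in an R session. This is a sketch under a few assumptions: the `interactive()` guard and the `ask = FALSE` argument are additions that are not part of the original tips, and updating packages needs network access to a package repository.

```r
# Which version of R is running? Compare this with the latest stable release.
print(R.version.string)

# Names and versions of the packages currently installed.
inst <- installed.packages()[, c("Package", "Version")]
print(head(inst))

# The steps below open a viewer or contact a repository, so this sketch
# only runs them in an interactive session.
if (interactive()) {
  View(news())                  # browse the news file of the running R version
  update.packages(ask = FALSE)  # update outdated packages without prompting
}
```

Leaving out `ask = FALSE` makes `update.packages()` ask for confirmation before each package is updated, which can be the safer choice in the middle of a project.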