In the words of the R Project, R is a language and environment for statistical computing and graphics, and it has emerged as one of the de facto languages for statistical and data analysis. For us, it will be the default tool in the first half of the book.
Getting ready
Make sure you have a good broadband connection to the Internet, as you may have to download up to 200 MB of software.
Installing R is easy; use the following steps:
- Go to the Comprehensive R Archive Network (CRAN) and download the latest release of R for your particular operating system:
- For Windows, go to http://cran.r-project.org/bin/windows/base/
- For Linux, go to http://cran.us.r-project.org/bin/linux/
- For Mac OS X, go to http://cran.us.r-project.org/bin/macosx/
As of June 2017, the latest release of R is Version 3.4.0 from April 2017.
- Once downloaded, follow the excellent instructions provided by CRAN to install the software on your respective platform. For both Windows and Mac, just double-click on the downloaded install packages.
- With R installed, go ahead and launch it. You should see a window similar to that shown in the following screenshot:
- An important variant of CRAN R is available at https://mran.microsoft.com/; it is Microsoft's contribution to the R software. The authors are fans of this variant and strongly recommend it: the MRAN version has been shown on multiple occasions to run much faster than the CRAN version, and code runs the same on both variants.
- You can stop at just downloading R, but you will miss out on the excellent Integrated Development Environment (IDE) built for R, called RStudio. Visit http://www.rstudio.com/ide/download/ to download RStudio, and follow the online installation instructions.
- Once installed, go ahead and run RStudio. The following screenshot shows one of our author's customized RStudio configurations with the Console panel in the upper-left corner, the editor in the upper-right corner, the current variable list in the lower-left corner, and the current directory in the lower-right corner:
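Once R or RStudio is running, a quick check at the console can confirm that the installation works. The following is a minimal sketch, not part of the original steps; the variable names are illustrative, and the 3.4.0 threshold simply mirrors the release mentioned above rather than any real requirement.

```r
# Print the version string of the running R installation.
version_string <- R.version.string
print(version_string)

# Compare the installed version against a minimum. The 3.4.0 threshold
# mirrors the release named in the text; it is an assumption, not a rule.
meets_minimum <- getRversion() >= "3.4.0"
print(meets_minimum)

# Summarize the platform, locale, and attached packages.
print(sessionInfo())
```

If `meets_minimum` prints `FALSE`, the installed R is older than the release discussed here, and some later examples may behave differently.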
R is an interpreted language that appeared in 1993 and is an implementation of the S statistical programming language that emerged from Bell Labs in the 1970s (S-PLUS is a commercial implementation of S). R, sometimes referred to as GNU S due to its open source license, is a domain-specific language (DSL) focused on statistical analysis and visualization. While you can do many things with R that are not obviously related to statistical analysis (including web scraping), it remains a domain-specific language and is not intended for general-purpose use.
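To give a flavor of that statistical focus, the short sketch below uses only functions and data that ship with base R. The choice of the built-in `mtcars` dataset and of a weight-versus-mileage regression is ours, purely for illustration.

```r
# mtcars is a data frame bundled with base R (in the datasets package).
mpg_summary <- summary(mtcars$mpg)
print(mpg_summary)

# Fit a simple linear regression of fuel economy on vehicle weight.
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))

# Draw a scatter plot with the fitted line on the active graphics device.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(fit)
```

Summary statistics, model fitting, and plotting each take a single call, which is the kind of task the language was designed around.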
R is also supported by CRAN, the Comprehensive R Archive Network ( http://cran.r-project.org/ ). CRAN contains an accessible archive of previous versions of R, allowing analyses that depend on older versions of the software to be reproduced. Further, CRAN contains thousands of freely downloadable software packages, greatly extending the capability of R. In fact, R has become the default development platform for multiple academic fields, including statistics, with the result that the latest statistical algorithms are often implemented first in R. Faster builds of R are available as the Microsoft variants at https://mran.microsoft.com/.
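Packages from CRAN are typically installed from within R itself. The sketch below shows one way this might look; `install_if_missing` is a hypothetical helper written for this example (it is not part of R), and the mirror URL is the main CRAN site from the text, written with https.

```r
# Point this session at a CRAN mirror.
options(repos = c(CRAN = "https://cran.r-project.org"))
cran_url <- getOption("repos")[["CRAN"]]
print(cran_url)

# Hypothetical helper: install a package only if it is not already present,
# then report whether it can be loaded.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call does not trigger a download.
has_stats <- install_if_missing("stats")
print(has_stats)

# Uncomment to fetch a contributed package from CRAN (needs network access):
# install_if_missing("ggplot2")
```

Checking with `requireNamespace()` first avoids downloading a package again on every run of a script.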
RStudio ( http://www.rstudio.com/ ) is available under the GNU Affero General Public License v3 and is open source and free to use. RStudio, Inc., the company, offers additional tools and services for R as well as commercial support.
You can also refer to the following:
- Refer to the Getting Started with R article at https://support.rstudio.com/hc/en-us/articles/201141096-Getting-Started-with-R
- Visit the home page for RStudio at http://www.rstudio.com/
- Refer to the Stages in the Evolution of S article at http://cm.bell-labs.com/cm/ms/departments/sia/S/history.html
- Refer to the A Brief History of S PS file at http://cm.bell-labs.com/stat/doc/94.11.ps