Practical Data Science Cookbook

By: Tony Ojeda, Sean Patrick Murphy, Benjamin Bengfort, Abhijit Dasgupta

Overview of this book

As increasing amounts of data are generated each year, the need to analyze and operationalize that data is more important than ever. Companies that know what to do with their data will have a competitive advantage over companies that don't, and this will drive a higher demand for knowledgeable and competent data professionals.

Starting with the basics, this book covers how to set up your numerical programming environment, introduces you to the data science pipeline (an iterative process by which data science projects are completed), and guides you through several data projects in a step-by-step format. By working through the steps in each chapter in sequence, you will quickly become familiar with the process and learn how to apply it to a variety of situations, with examples in the two most popular programming languages for data analysis: R and Python.

Installing R on Windows, Mac OS X, and Linux


According to the R project, "R is a language and environment for statistical computing and graphics." It has emerged as one of the de facto languages for statistical and data analysis. For us, it will be the default tool in the first half of the book.

Getting ready

Make sure you have a good broadband connection to the Internet as you may have to download up to 200 MB of software.

How to do it...

Installing R is easy; use the following steps:

  1. Go to the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/ and download the latest release of R for your particular operating system.

    As of February 2014, the latest release of R is Version 3.0.2 from September 2013.

  2. Once the download finishes, follow the instructions provided by CRAN to install the software on your platform. On both Windows and Mac, just double-click on the downloaded installer package.

  3. With R installed, go ahead and launch it. You should see a window similar to what is shown in the following screenshot:

  4. You can stop at just downloading R, but you will miss out on the excellent Integrated Development Environment (IDE) built for R, called RStudio. Visit http://www.rstudio.com/ide/download/ to download RStudio, and follow the online installation instructions.

  5. Once installed, go ahead and run RStudio. The following screenshot shows one of our authors' customized RStudio configurations, with the Console panel in the upper-left corner, the editor in the upper-right corner, the current variable list in the lower-left corner, and the current directory in the lower-right corner.
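
Once R or RStudio is running, a quick way to confirm that the installation works is to ask R for its own version at the console. The following is a minimal sketch; the variable name `r_version` is ours and purely illustrative, and the version reported will depend on the release you downloaded.

```r
# Print the full version string of the running R installation.
print(R.version.string)

# R.version is a list; its major and minor elements hold the
# version components as character strings.
r_version <- paste(R.version$major, R.version$minor, sep = ".")
print(r_version)

# A trivial expression to confirm that the interpreter evaluates code.
stopifnot(1 + 1 == 2)
```

If these lines run without an error in the Console panel, R is installed and working.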

How it works...

R is an interpreted language that first appeared in 1993. It is an implementation of the S statistical programming language, which emerged from Bell Labs in the 1970s (S-PLUS is a commercial implementation of S). R, sometimes referred to as GNU S because of its open source license, is a domain-specific language (DSL) focused on statistical analysis and visualization. Although you can do many things with R that are not directly related to statistical analysis (web scraping, for example), it is still a domain-specific language and is not intended for general-purpose use.

R is also supported by CRAN, the Comprehensive R Archive Network (http://cran.r-project.org/). CRAN hosts an accessible archive of previous versions of R, so that analyses that depend on older versions of the software can be reproduced. Further, CRAN hosts hundreds of freely downloadable software packages that greatly extend the capabilities of R. In fact, R has become the default development platform for multiple academic fields, including statistics, with the result that the latest statistical algorithms are frequently implemented first in R.
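
Packages hosted on CRAN can be installed from within R itself using the built-in `install.packages()` function. The sketch below wraps that call in a small helper, `install_if_missing()`, which is a hypothetical name of our own and not part of R. It assumes that the HTTPS form of the CRAN address given above is reachable from your machine.

```r
# Hypothetical helper (not part of R): install a package from CRAN
# only when it is not already available in the local library.
install_if_missing <- function(pkg, repos = "https://cran.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Downloads and installs the package; requires an Internet connection.
    install.packages(pkg, repos = repos)
  }
  # Report (invisibly) whether the package can now be loaded.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# The stats package ships with R, so this call needs no download.
install_if_missing("stats")
```

To fetch a contributed package, pass its name instead, for example `install_if_missing("ggplot2")`; the package chosen here is only an illustration.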

RStudio (http://www.rstudio.com/) is available under the GNU Affero General Public License v3 and is open source and free to use. RStudio, Inc., the company, offers additional tools and services for R as well as commercial support.

See also