Practical Data Science Cookbook, Second Edition - Second Edition

By: Prabhanjan Narayanachar Tattar, Bhushan Purushottam Joshi, Sean Patrick Murphy, Abhijit Dasgupta, Anthony Ojeda

Overview of this book

As increasing amounts of data are generated each year, the need to analyze that data and create value from it is more important than ever. Companies that know what to do with their data, and how to do it well, will have a competitive advantage over companies that don't. Because of this, there will be increasing demand for people who possess both the analytical and the technical abilities to extract valuable insights from data and build solutions that put those insights to use. Starting with the basics, this book covers how to set up your numerical programming environment, introduces you to the data science pipeline, and guides you through several data projects in a step-by-step format. By working through the steps in each chapter in order, you will quickly become familiar with the process and learn how to apply it to a variety of situations, with examples in the two most popular programming languages for data analysis: R and Python.
Table of Contents (17 chapters)
Title Page
Credits
About the Authors
About the Reviewer
www.PacktPub.com
Preface

Installing R on Windows, Mac OS X, and Linux


As the R project itself describes it, R is a language and environment for statistical computing and graphics, and it has emerged as one of the de facto languages for statistical and data analysis. For us, it will be the default tool in the first half of the book.

Getting ready

Make sure you have a good broadband connection to the Internet, as you may have to download up to 200 MB of software.

How to do it...

Installing R is easy; use the following steps:

  1. Go to the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/ and download the latest release of R for your operating system.

     As of June 2017, the latest release of R is version 3.4.0, released in April 2017.

  2. Once the download finishes, follow the instructions provided by CRAN to install the software on your platform. On both Windows and Mac, just double-click the downloaded installer.

  3. With R installed, go ahead and launch it. You should see the R console, which opens with a startup message giving the R version and then waits for input at the > prompt.

  4. Microsoft maintains an enhanced distribution of R, Microsoft R Open, which is available from MRAN (the Microsoft R Application Network) at https://mran.microsoft.com/. The authors are fans of this variant and strongly recommend it: it has repeatedly proven faster than the standard CRAN build, especially for matrix and linear algebra operations, largely because it is linked against multithreaded math libraries such as the Intel Math Kernel Library (MKL), and all of the code in this book runs the same on both. That speed is a bonus reason to use the MRAN versions of R.

  5. You can stop after installing R, but you would miss out on RStudio, the excellent Integrated Development Environment (IDE) built for R. Visit http://www.rstudio.com/ide/download/ to download RStudio, and follow the online installation instructions.

  6. Once it is installed, go ahead and run RStudio. Its panels can be rearranged to suit you; one of the authors, for example, uses a customized layout with the Console panel in the upper-left corner, the editor in the upper-right corner, the current variable list in the lower-left corner, and the current directory in the lower-right corner.
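Once R is installed, a quick way to confirm that it works is to query its version from the console (the R GUI, a terminal session, or RStudio's Console panel). The following is a minimal sketch using only base R; the 3.4.0 threshold simply mirrors the release mentioned above and is an assumption you can change.

```r
# Print the version of the running R installation,
# for example "R version 3.4.0 (2017-04-21)"
cat(R.version.string, "\n")

# getRversion() returns a package_version object, which can be
# compared directly against a version string
min_version <- "3.4.0"  # assumed minimum; adjust to your needs
if (getRversion() >= min_version) {
  message("R is up to date (>= ", min_version, ")")
} else {
  message("R is older than ", min_version, "; consider upgrading from CRAN")
}

# Report the platform, locale, and loaded packages; useful when asking for help
sessionInfo()
```

The same commands work unchanged in the Microsoft R Open build, so they are also a simple way to check which distribution and version RStudio has picked up.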

How it works...

R is an interpreted language that appeared in 1993 and is an implementation of the S statistical programming language that emerged from Bell Labs in the 1970s (S-PLUS is a commercial implementation of S). R, sometimes referred to as GNU S because it is distributed as free software under the GNU General Public License, is a domain-specific language (DSL) focused on statistical analysis and visualization. While you can do many things with R that seem unrelated to statistical analysis (including web scraping), it is still a domain-specific language and is not intended for general-purpose use.
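To make the domain-specific focus concrete, here is a short illustrative sketch (not taken from the book's later chapters) that fits and plots a simple linear regression using only base R and the mtcars data set that ships with it. Model fitting, summarizing, and plotting each take a single function call, with no extra packages.

```r
# mtcars is a built-in data set from the 'datasets' package, loaded by default
fit <- lm(mpg ~ wt, data = mtcars)  # regress fuel economy on car weight

# Coefficient table, R-squared, and residual summary in one call
summary(fit)

# Scatter plot of the data with the fitted regression line overlaid
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(fit, col = "red")
```

The negative coefficient on wt indicates that heavier cars tend to get fewer miles per gallon.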

R is also supported by CRAN, the Comprehensive R Archive Network (http://cran.r-project.org/). CRAN keeps an accessible archive of previous versions of R, so analyses that depend on older versions of the software can be reproduced. CRAN also hosts thousands of freely downloadable packages that greatly extend the capabilities of R. In fact, R has become the default development platform in several academic fields, including statistics, so new statistical algorithms are often implemented in R first. Faster builds of R are available from Microsoft at https://mran.microsoft.com/.
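Packages from CRAN are installed from within R itself. The sketch below uses only base R functions; the mirror URL is CRAN's cloud mirror, and ggplot2 is only an illustrative package choice. Installing requires an Internet connection and a writable package library.

```r
# Point R at a CRAN mirror; the cloud mirror redirects to servers worldwide
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Install a package only if it is not already available
pkg <- "ggplot2"  # illustrative choice; any CRAN package name works
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Print the installed version, if the installation succeeded
if (pkg %in% rownames(installed.packages())) {
  print(packageVersion(pkg))
}

# Older package releases live under CRAN's src/contrib/Archive/ directory;
# a specific source tarball can be installed with (placeholder URL):
# install.packages("<URL of archived .tar.gz>", repos = NULL, type = "source")
```

Guarding install.packages() with requireNamespace() keeps a script from re-downloading packages every time it runs.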

RStudio (http://www.rstudio.com/) is open source and free to use, and is available under the GNU Affero General Public License v3. RStudio, Inc., the company, offers additional tools and services for R as well as commercial support.

See also

You can also refer to the following: