Practical Data Science Cookbook

By: Tony Ojeda, Sean Patrick Murphy, Benjamin Bengfort, Abhijit Dasgupta

Overview of this book

As increasing amounts of data are generated each year, the need to analyze and operationalize that data is more important than ever. Companies that know what to do with their data will have a competitive advantage over companies that don't, and this will drive a higher demand for knowledgeable and competent data professionals.

Starting with the basics, this book will cover how to set up your numerical programming environment, introduce you to the data science pipeline (an iterative process by which data science projects are completed), and guide you through several data projects in a step-by-step format. By sequentially working through the steps in each chapter, you will quickly familiarize yourself with the process and learn how to apply it to a variety of situations, with examples in the two most popular programming languages for data analysis: R and Python.

Installing libraries in R and RStudio


R has an incredible number of libraries that add to its capabilities. In fact, R has become the default language for many college and university statistics departments across the country. Thus, R is often the language that will get the first implementation of newly developed statistical algorithms and techniques. Luckily, installing additional libraries is easy, as you will see in the following sections.

Getting ready

As long as you have R or RStudio installed, you should be ready to go.

How to do it...

R makes installing additional packages simple:

  1. Launch the R interactive environment or, preferably, RStudio.

  2. Let's install ggplot2. Type the following command, and then press the Enter key:

    install.packages("ggplot2")
    

    Tip

    Note that for the remainder of the book, it is assumed that when we specify entering a line of text, it is implicitly followed by hitting the Return or Enter key on the keyboard.

  3. You should now see text similar to the following as you scroll down the screen:

    trying URL 'http://cran.rstudio.com/bin/macosx/contrib/3.0/ggplot2_0.9.3.1.tgz'
    Content type 'application/x-gzip' length 2650041 bytes (2.5 Mb)
    opened URL
    ==================================================
    downloaded 2.5 Mb
    
    The downloaded binary packages are in
    /var/folders/db/z54jmrxn4y9bjtv8zn_1zlb00000gn/T//Rtmpw0N1dA/downloaded_packages
    
  4. You might have noticed that you need to know the exact name, in this case, ggplot2, of the package you wish to install. Visit http://cran.us.r-project.org/web/packages/available_packages_by_name.html to make sure you have the correct name.

  5. RStudio provides a simpler mechanism to install packages. Open up RStudio if you haven't already done so.

  6. Go to Tools in the menu bar and select Install Packages… to open a new window, as shown in the following screenshot:

  7. As soon as you start typing in the Packages field, RStudio will show you a list of possible packages. The autocomplete feature of this field simplifies the installation of libraries. Better yet, if a related library has a similar name, or if an earlier or newer version of the library shares the same first few letters of the name, it will appear in the list as well.

  8. Let's install a few more packages that we highly recommend. At the R prompt, type the following commands:

    install.packages("lubridate")
    install.packages("plyr")
    install.packages("reshape2")
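
The installation steps above can also be scripted. The following is a minimal sketch, not part of the original recipe, that installs only the packages not already present. The helper names missing_packages and install_if_missing are hypothetical, chosen for illustration; the sketch relies only on base R's installed.packages() and install.packages(), the latter of which accepts a character vector of package names.

```r
# Hypothetical helper: return the subset of `pkgs` that is not yet
# installed in any of the current library paths.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Hypothetical helper: install only the missing packages.
# `installer` defaults to install.packages; it is a parameter so the
# function can be tried out without touching the network.
install_if_missing <- function(pkgs, installer = install.packages) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    installer(to_install)
  }
  # Return (invisibly) the names that were passed to the installer.
  invisible(to_install)
}

# Usage at the R prompt (downloads from CRAN, so it is left commented out):
# install_if_missing(c("ggplot2", "lubridate", "plyr", "reshape2"))
```

Running the helper a second time should find nothing left to install, which makes it convenient to keep at the top of an analysis script.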
    

Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

How it works...

Whether you use RStudio's graphical interface or the install.packages command, the result is the same: you tell R to search for the appropriate library built for your particular version of R. When you issue the command, R reports back the URL where it found a match for the library in CRAN and the location of the binary packages after download.
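
To see the pieces involved in this lookup on your own machine, you can inspect them from the R prompt. The following is a small sketch using base R only; the variable names are illustrative, and the values printed will differ from one installation to the next.

```r
# Version of the running R session; binary packages are built per R release.
r_version <- paste(R.version$major, R.version$minor, sep = ".")

# Repositories that install.packages() searches (a CRAN mirror by default).
repos <- getOption("repos")

# Library directories where installed packages live; install.packages()
# writes to the first one unless told otherwise.
lib_dirs <- .libPaths()

cat("R version:   ", r_version, "\n")
cat("Repositories:", paste(repos, collapse = ", "), "\n")
cat("Library path:", lib_dirs[1], "\n")
```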

There's more...

R's community is one of its strengths, and we would be remiss if we didn't briefly mention two resources. R-bloggers is a website that aggregates R-related news and tutorials from over 450 different blogs. If you have questions about R, this is a great place to look for more information. The Stack Overflow site (http://www.stackoverflow.com) is a great place to ask questions and find answers on R; questions there are filed under the r tag, while rstats is the hashtag the community uses elsewhere.

Finally, as your prowess with R grows, you might consider building an R package that others can use. Giving an in-depth tutorial on the library building process is beyond the scope of this book, but keep in mind that community submissions form the heart of the R movement.

See also