Data Manipulation with R - Second Edition

By: Jaynal Abedin, Kishor Kumar Das

Overview of this book

This book starts with the installation of R and how to use R and its libraries. We then discuss the modes and classes of R objects and highlight the different R data types along with their basic operations.

The primary focus is group-wise data manipulation with the split-apply-combine strategy, which is explained with specific examples. The book also covers specific libraries such as lubridate, reshape2, plyr, dplyr, stringr, and sqldf. You will learn not only about group-wise data manipulation, but also how to efficiently handle date, string, and factor variables, and how to work with different dataset layouts using the reshape2 package.

By the end of this book, you will have learned about text manipulation using stringr, how to extract data from Twitter using the twitteR library, how to clean raw data, and how to structure your raw data for data mining.

Installing and using R libraries


R comes with a number of default packages. A package is a collection of previously programmed functions for specific tasks, often bundled with datasets. In other software, such a collection is usually known as a library, but the R community refers to it as a package. There are two types of R packages:

  • Default packages that come with the R executable

  • Add-on packages that are not included in the installation; we need to download and install them separately

When we open the R console, R automatically loads a core set of its default packages with their associated functions, so we do not need to load those packages manually. A list of installed packages can be obtained by typing library() in the R console. Other installed packages, however, must be loaded before their functions can be used. To load a specific package, the corresponding R command is library(package), where package is the name of the package, such as plyr, provided that the package has already been installed.
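As a minimal sketch of the commands described above, the following lists the packages attached to the current session, lists the installed packages, and attaches one package explicitly. The tools package is used here only because it ships with R but is not attached at startup; that choice, and the variable names, are illustrative assumptions rather than something taken from the book.

```r
# Packages currently attached to the session (the search path).
attached <- search()
print(attached)

# Names of all packages installed in the known library locations.
installed <- rownames(installed.packages())
print(head(installed))

# Attach an installed package explicitly. "tools" ships with R
# but is not attached when the console starts.
library(tools)

# Check whether an add-on package is installed without attaching it.
# The result is TRUE or FALSE depending on the local machine.
has_plyr <- requireNamespace("plyr", quietly = TRUE)
print(has_plyr)
```

If has_plyr is FALSE, calling library(plyr) would stop with an error, which is why the package has to be installed first.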

In some situations, we may require a special type of data processing and analysis. If the corresponding packages are not available in the default list, we need to install them. For example, the plyr package is not in the default list, so we need to install it separately.

There are two different ways to install a package:

  • By manually downloading and installing it

  • By installing it from within R

Manually downloading and installing packages

To download a package from CRAN and install it, follow these steps:

  1. Go to http://www.r-project.org/.

  2. Click on CRAN mirror under the Getting Started section.

  3. Select any one of the regional servers from the list; for example, select the server from Austria at http://cran.at.r-project.org/.

  4. Click on Contributed extension packages under the Source Code for all Platforms section.

  5. Select Table of available packages, sorted by date of publication or Table of available packages, sorted by name and then download the desired package from the list.

  6. When downloading, choose the file that matches your platform; for example, a Windows user should download the binary zip file.

  7. Once the download is completed, open R.

  8. Go to the Packages menu and select Install packages from local zip files.
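The last step can also be carried out from the R console instead of the menu. The sketch below assumes the archive was saved under a Downloads folder with the file name plyr.zip; both the folder and the file name are hypothetical placeholders, and the real file name will normally include a version number.

```r
# Hypothetical location of the downloaded archive; adjust it to
# wherever the file was actually saved.
pkg_file <- path.expand(file.path("~", "Downloads", "plyr.zip"))

# repos = NULL tells install.packages() to treat its first argument
# as a local file instead of a package name to look up on CRAN.
if (file.exists(pkg_file)) {
  install.packages(pkg_file, repos = NULL)
} else {
  message("File not found: ", pkg_file)
}
```

Installing from a local file in this way does not fetch any other packages that the archive depends on.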

Tip

One potential problem with manual downloads is that a package sometimes depends on other packages, and the manual installation process does not install those dependencies. To avoid this problem, we can install the desired package(s) from the R shell, which resolves dependencies automatically.

Installing packages within the R shell

To install a package from within the R console, we can use the install.packages() command; this command will prompt us to select an appropriate CRAN server (mirror). Note that to install packages using this approach, the computer must have an active Internet connection.

For example, to install the plyr package, we can use the following command:

install.packages("plyr")

This command prompts us to select a regional server; once we select one from the list, the package is downloaded and installed on the local computer.
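In scripts, it can be convenient to skip the server prompt and to install a package only when it is missing. The following is a minimal sketch under stated assumptions: ensure_package is a hypothetical helper, not a function from R or from the book, and https://cloud.r-project.org is used as one possible CRAN mirror.

```r
# Hypothetical helper: install a package from CRAN only when it is
# missing, then attach it to the session.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Passing repos explicitly avoids the interactive server prompt.
    install.packages(pkg, repos = repos)
  }
  # character.only = TRUE lets library() take the name as a string.
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" ships with R, so this call never needs the network.
ensure_package("stats")
```

Calling ensure_package("plyr") would follow the same path, downloading plyr and its dependencies on the first run if an Internet connection is available.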