Data Manipulation with R - Second Edition

By : Jaynal Abedin, Kishor Kumar Das

Overview of this book

This book starts with installing R and getting started with R and its libraries. We then discuss the mode and class of R objects and highlight the different R data types with their basic operations.

The primary focus is group-wise data manipulation with the split-apply-combine strategy, which is explained with specific examples. The book also covers specific libraries such as lubridate, reshape2, plyr, dplyr, stringr, and sqldf. You will learn not only group-wise data manipulation but also how to handle date, string, and factor variables efficiently, along with different layouts of datasets using the reshape2 package.

By the end of this book, you will have learned about text manipulation using stringr, how to extract data from Twitter using the twitteR library, how to clean raw data, and how to structure your raw data for data mining.
Table of Contents (13 chapters)
Data Manipulation with R Second Edition
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Preface
Index

Introducing the plyr and dplyr libraries


We have seen how to implement the split-apply-combine approach on a data frame using three lines of code. The plyr package lets us implement the same approach in one line. Because R has multiple data structures, we need multiple functions to work with them. R has three main data structures: list, array, and data frame. Any of the three can be the input, and any of the three can be the output, which gives 3 x 3 = 9 possible input-output combinations; for this reason, plyr has nine functions, one per combination. Each function name starts with the first letter of the input type, followed by the first letter of the output type, followed by ply; for example, ddply() takes a data frame and returns a data frame. In addition, three functions, a_ply(), d_ply(), and l_ply(), accept the three input types but discard the output, which is useful when a function is called only for its side effects, such as plotting.
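As a rough illustration of the paragraph above, the following sketch compares a three-step base R version of split-apply-combine with a one-line plyr call. It assumes the plyr package is installed and uses the built-in iris dataset; the variable names (pieces, means, base_result, plyr_result, plyr_list) are chosen here for illustration only.

```r
# Assumption: plyr is installed, e.g. via install.packages("plyr")
library(plyr)

# Base R: split, apply, and combine as three separate steps
pieces <- split(iris$Sepal.Length, iris$Species)        # split by group
means  <- lapply(pieces, mean)                          # apply a function to each piece
base_result <- data.frame(                              # combine into a data frame
  Species = names(means),
  mean_sepal_length = unname(unlist(means))
)

# plyr: data frame in ("d"), data frame out ("d"), hence ddply()
plyr_result <- ddply(iris, .(Species), summarise,
                     mean_sepal_length = mean(Sepal.Length))

# Same input, list output: data frame in ("d"), list out ("l"), hence dlply()
plyr_list <- dlply(iris, .(Species), function(df) mean(df$Sepal.Length))

print(base_result)
print(plyr_result)
```

Both approaches should give the same group means; only the second letter of the plyr function name changes when a different output structure is wanted.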

The plyr package works on every type of data structure, whereas the dplyr package is designed to work only on data frames. The dplyr package offers a complete set of functions to perform every kind of data manipulation we...