Data Manipulation with R - Second Edition

By: Jaynal Abedin, Kishor Kumar Das

Overview of this book

This book starts with installing R and getting started with R and its libraries. It then discusses the modes and classes of R objects and introduces the different R data types along with their basic operations.

The primary focus is group-wise data manipulation with the split-apply-combine strategy, which is explained with specific examples. The book also covers several specific libraries, including lubridate, reshape2, plyr, dplyr, stringr, and sqldf. You will learn not only group-wise data manipulation, but also how to handle date, string, and factor variables efficiently, and how to work with different dataset layouts using the reshape2 package.

By the end of this book, you will have learned about text manipulation using stringr, how to extract data from Twitter using the twitteR library, how to clean raw data, and how to structure your raw data for data mining.

Summary


In this chapter, we discussed the split-apply-combine strategy: what it is and why it matters in data manipulation. The strategy can be implemented in base R, but doing so requires a large amount of code and is not efficient in memory or time. To overcome this limitation, we discussed the plyr package, with which group-wise data manipulation can be implemented efficiently. The functions within plyr are intuitive, and their names indicate the input and output types. A wide variety of data processing can be done with only a few functions that share a common input type and differ in output type. For further reading, see the paper The Split-Apply-Combine Strategy for Data Analysis by Wickham, available at http://www.jstatsoft.org/v40/i01/paper. We also discussed how dplyr can be used as a powerful tool for manipulating data frames.
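As a brief recap, the sketch below computes one group-wise summary three ways: with base R, with plyr, and with dplyr. It is an illustration rather than an example taken from the chapter. The choice of the built-in iris dataset and the names pieces, means, base_res, plyr_res, and dplyr_res are assumptions made here, and the plyr and dplyr parts run only if those packages are installed.

```r
# Goal: mean Sepal.Length per Species in the built-in iris data.

# Base R: the three steps are written out one by one.
pieces <- split(iris$Sepal.Length, iris$Species)   # split by group
means  <- lapply(pieces, mean)                     # apply a function to each piece
base_res <- data.frame(                            # combine into one data frame
  Species = names(means),
  mean_sepal_length = unlist(means),
  row.names = NULL
)

# plyr: a single call. In "ddply", the first "d" is the input type
# (data frame) and the second "d" is the output type (data frame).
if (requireNamespace("plyr", quietly = TRUE)) {
  plyr_res <- plyr::ddply(iris, "Species", plyr::summarise,
                          mean_sepal_length = mean(Sepal.Length))
}

# dplyr: group the data frame, then summarise each group.
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr_res <- dplyr::summarise(dplyr::group_by(iris, Species),
                                mean_sepal_length = mean(Sepal.Length))
}

print(base_res)
```

All three approaches return one row per species. The base R version makes each step of the strategy explicit, while the plyr and dplyr versions express the same operation in one call.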

In the following chapter, you will learn about reshaping...