Data Manipulation with R - Second Edition

By: Jaynal Abedin, Kishor Kumar Das

Overview of this book

This book starts with the installation of R and how to go about using R and its libraries. We then discuss the modes and classes of R objects, and then highlight different R data types with their basic operations.

The book's primary focus, group-wise data manipulation with the split-apply-combine strategy, is explained with specific examples. The book also covers some specific libraries such as lubridate, reshape2, plyr, dplyr, stringr, and sqldf. You will not only learn about group-wise data manipulation, but also learn how to efficiently handle date, string, and factor variables along with different layouts of datasets using the reshape2 package.

By the end of this book, you will have learned about text manipulation using stringr, how to extract data from Twitter using the twitteR library, how to clean raw data, and how to structure your raw data for data mining.
Table of Contents (13 chapters)
Data Manipulation with R Second Edition
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Preface
Index

Powerful data manipulation with dplyr


In real-life situations, we usually start our analysis with a data frame-type structure. What do we do after getting a dataset, and what basic data-manipulation tasks do we usually perform before starting modeling? They are listed here:

  1. We check the validity of a dataset based on conditions.

  2. We sort the dataset based on some variables, in ascending or descending order.

  3. We create new variables based on existing variables.

  4. Finally, we summarize them.
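As a rough illustration, the four tasks in the list can be sketched with one dplyr function each. This is a minimal sketch, not the book's own example: it assumes the dplyr package is installed, uses the built-in `mtcars` dataset, and the validity conditions and the `hp_per_wt` variable are hypothetical choices made only for demonstration.

```r
library(dplyr)

# mtcars ships with base R; the conditions and new variable below are
# illustrative assumptions, not taken from the book.

# 1. Check validity: keep only rows that satisfy the conditions.
valid <- filter(mtcars, mpg > 0, cyl %in% c(4, 6, 8))

# 2. Sort: ascending by cyl, then descending by mpg within each cyl value.
sorted <- arrange(valid, cyl, desc(mpg))

# 3. Create a new variable from existing ones.
with_ratio <- mutate(sorted, hp_per_wt = hp / wt)

# 4. Summarize the whole dataset into a single row.
overall <- summarise(
  with_ratio,
  n_cars         = n(),
  mean_mpg       = mean(mpg),
  mean_hp_per_wt = mean(hp_per_wt)
)

print(overall)
```

Each step stores its result in a new object so that the intermediate data frames can be inspected one at a time.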

These are the tasks we usually perform on full datasets. The dplyr package has functions for all of the tasks listed, plus some additional ones that come in handy in the data-manipulation process. Group-wise operations are also possible with the dplyr package. In dplyr, every task is performed using a function that is called a verb. We may need to use multiple verbs on the same data frame. This could force us to write either a very long line or multiple...