Data Manipulation with R - Second Edition

By: Jaynal Abedin, Kishor Kumar Das
Overview of this book

This book starts with installing R and using R and its libraries. It then discusses the modes and classes of R objects and highlights the different R data types along with their basic operations.

The primary focus is group-wise data manipulation with the split-apply-combine strategy, which is explained with specific examples. The book also covers specific libraries such as lubridate, reshape2, plyr, dplyr, stringr, and sqldf. You will learn not only about group-wise data manipulation, but also how to efficiently handle date, string, and factor variables, along with different dataset layouts using the reshape2 package.

By the end of this book, you will have learned about text manipulation using stringr, how to extract data from Twitter using the twitteR library, how to clean raw data, and how to structure your raw data for data mining.

Date processing using lubridate


R can handle date variables in several ways: through built-in functions and through contributed packages. The built-in function as.Date() handles dates but not times, whereas the chron package, contributed by James and Hornik in 2008, handles both dates and times but does not support time zones. The POSIXct and POSIXlt classes do support time zones. The lubridate package, contributed by Grolemund and Wickham in 2011, offers much more user-friendly functionality for processing dates and times, with time zone support. In this section, we will see how to process dates and times easily using the lubridate package and compare it with the built-in R functions.
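As a minimal sketch of the differences described above, the following compares as.Date(), POSIXct, and lubridate on the same timestamp. The sample timestamp and the "Asia/Dhaka" time zone are illustrative assumptions, not taken from the book, and the lubridate part runs only if that package is installed.

```r
# Illustrative timestamp (an assumption for this sketch)
stamp <- "2016-03-15 14:30:00"

# Base R: as.Date() keeps only the calendar date; the time of day is dropped
d <- as.Date(stamp)
print(class(d))
print(d)

# Base R: POSIXct stores date and time and is time-zone aware
p <- as.POSIXct(stamp, tz = "UTC")
print(format(p, tz = "Asia/Dhaka", usetz = TRUE))  # same instant, another zone

# lubridate: the parser name mirrors the order of the components in the string
if (requireNamespace("lubridate", quietly = TRUE)) {
  l <- lubridate::ymd_hms(stamp, tz = "UTC")
  print(lubridate::with_tz(l, "Asia/Dhaka"))  # convert the display time zone
  print(lubridate::hour(l))                   # extract a single component
}
```

The lubridate calls do the same work as the POSIXct lines, but the parsing function states the expected component order in its name, so no format string is needed.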

Like other statistical software, R has a base date and stores date objects internally relative to it. In R, dates are stored as the number of days elapsed since January...
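The internal storage of dates as a day count can be inspected directly in base R. The following is a small sketch using two illustrative dates (assumptions for this example): it strips the Date class to reveal the stored number, and then recovers the date that R counts from by subtracting that number again.

```r
# Two consecutive illustrative dates (assumptions for this sketch)
d        <- as.Date("2016-03-15")
next_day <- as.Date("2016-03-16")

# unclass() removes the Date class and exposes the stored day count
stored <- unclass(d)
print(is.numeric(stored))

# Consecutive days differ by exactly one in the internal representation
print(unclass(next_day) - stored)

# Subtracting the stored count from the date yields the date whose count is zero
base_date <- d - stored
print(base_date)
```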