Data Manipulation with R - Second Edition

By: Jaynal Abedin, Kishor Kumar Das

Overview of this book

This book starts with the installation of R and an introduction to using R and its libraries. It then discusses the modes and classes of R objects and highlights the different R data types with their basic operations.

The primary focus is group-wise data manipulation with the split-apply-combine strategy, which is explained with specific examples. The book also covers specific libraries such as lubridate, reshape2, plyr, dplyr, stringr, and sqldf. You will learn not only group-wise data manipulation but also how to handle date, string, and factor variables efficiently, and how to work with different dataset layouts using the reshape2 package.

By the end of this book, you will have learned about text manipulation using stringr, how to extract data from Twitter using the twitteR library, how to clean raw data, and how to structure your raw data for data mining.

Text data and its source


Text data is any type of text on any topic. Common sources of text data include the following:

  • Tweets from any individual, or from any company

  • Facebook status updates

  • RSS feeds from any news site

  • Blog articles

  • Journal articles

  • Newspapers

  • Verbatim transcripts of an in-depth interview

These are the most common sources of text data. In text analytics, Twitter data is frequently used to find trending topics through topic modeling. Tweets have also been used to predict certain diseases. HTML web pages are another great source of text data.

Getting text data

Text data can be embedded in any dataset as a string variable. It can also be stored in plain text files or in HTML files. In this section, we will see how to read or import text data into the R environment for further processing.
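
As a minimal sketch of reading a plain text file or an HTML file into R, the following uses base R's readLines(). The file names and contents are made up for illustration and are written to temporary files so that the example is self-contained; they do not come from the book.

```r
# Write a small, hypothetical text file to a temporary location
# (the contents are invented for this sketch).
txt_path <- tempfile(fileext = ".txt")
writeLines(c("R makes text processing easy.",
             "Each line becomes one element."), txt_path)

# readLines() returns a character vector with one element per line.
txt_lines <- readLines(txt_path)

# The same call works for an HTML file; the markup is read as plain text
# and is not parsed in any way.
html_path <- tempfile(fileext = ".html")
writeLines(c("<html>",
             "<body><p>Hello, text data</p></body>",
             "</html>"), html_path)
html_lines <- readLines(html_path)

length(txt_lines)  # number of lines read from the text file
class(html_lines)  # "character"
```

Note that readLines() only brings the raw lines into R. Removing the HTML tags or extracting specific elements would be a separate step.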

The easiest way to get text data is to import it from a .csv file in which some of the variables contain character data. For example, the tweets.csv file...
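
A minimal sketch of importing character data from a .csv file might look like the following. Because the contents of tweets.csv are not shown here, the example builds a small, hypothetical data frame (the column names id and text are assumptions), writes it to a temporary .csv file, and reads it back with read.csv().

```r
# Build a tiny, hypothetical dataset with one character variable.
# The column names and values are invented for this sketch.
sample_tweets <- data.frame(
  id = 1:2,
  text = c("Learning R is fun", "Text mining with R, step by step"),
  stringsAsFactors = FALSE
)

# Write it to a temporary .csv file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
write.csv(sample_tweets, csv_path, row.names = FALSE)

# stringsAsFactors = FALSE keeps text columns as character vectors.
# It is the default from R 4.0.0 onward; earlier versions converted
# strings to factors unless this argument was set explicitly.
tweets <- read.csv(csv_path, stringsAsFactors = FALSE)

str(tweets)  # inspect the column types after import
```

Setting stringsAsFactors explicitly keeps the behavior the same across R versions, which matters when the text column is later passed to string-handling functions.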