Text data is any type of text on any topic. Here are some examples of text data and where it comes from:
Tweets from any individual or company
Facebook status updates
RSS feeds from any news site
Blog articles
Journal articles
Newspapers
Verbatim transcripts of an in-depth interview
These are the most common sources of text data. In text analytics, Twitter data has frequently been used to find topic trends through topic modeling. Text from tweets has also been used to predict certain diseases. HTML web pages are also a great source of text data.
Text data can be embedded in any dataset as a string variable. It can also be stored in plain text files or in HTML files. In this section, we will see how to read or import text data into the R environment for further processing.
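As a minimal sketch of the plain-text case, base R's readLines() reads a file into a character vector with one element per line. The file and its contents below are made up for illustration and written to a temporary location so that the snippet runs on its own. The same call also reads an HTML file, although the result would still contain the markup.

```r
# Create a small plain-text file in a temporary location (hypothetical content).
txt_path <- tempfile(fileext = ".txt")
writeLines(c("Text mining with R.", "Each line becomes one element."), txt_path)

# readLines() returns a character vector with one element per line of the file.
text_lines <- readLines(txt_path)

# Collapse the lines into one string when the whole file is a single document.
doc <- paste(text_lines, collapse = " ")

print(length(text_lines))
print(doc)
```

Keeping the lines separate is convenient when each line is its own record, while the collapsed string suits a file that holds one continuous document.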
The easiest way to get text data is to import from a .csv file where some of the variables contain character data. For example, the tweets.csv file...
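A CSV import of this kind might look like the following sketch. It does not use the actual tweets.csv file. Instead it builds a small stand-in data frame whose column names (id, text) and values are assumptions made purely for illustration, writes it to a temporary file, and reads it back with read.csv().

```r
# Hypothetical stand-in for a tweets file; the columns and values are assumed.
csv_path <- tempfile(fileext = ".csv")
sample_tweets <- data.frame(
  id = 1:2,
  text = c("Learning R for text mining", "Topic models, explained"),
  stringsAsFactors = FALSE
)
write.csv(sample_tweets, csv_path, row.names = FALSE)

# stringsAsFactors = FALSE keeps text columns as character vectors.
# It has been the default since R 4.0.0; earlier versions converted strings to factors.
tweets <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect the column types to confirm that text was read as character data.
str(tweets)
```

Setting stringsAsFactors explicitly is a defensive choice: the code then behaves the same way on older and newer R versions, and the text column can go straight into string-processing functions.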