Much of the data available today is text: unstructured, massive in volume, and tremendously varied. In this chapter, we will look at the tools available in R for extracting useful information from text.
This chapter describes different ways of mining text. We will cover the following topics:
Examining the text in various ways
Converting text to lowercase
Removing punctuation
Removing numbers
Removing URLs
Removing stop words
Using the stems of words rather than instances
Building a document-term matrix that records how terms are used
XML processing, both orthogonal and of varying degrees
Examples
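As a preview of the cleaning steps listed above, here is a minimal sketch that uses only base R functions. The sample sentence and the tiny stop word list are invented for illustration, and this is not necessarily the approach or the packages used later in the chapter. Stemming is left out because base R has no built-in stemmer.

```r
# Hypothetical sample text, invented for illustration only.
text <- "Visit https://example.com NOW: 3 cats are running, and the 2 dogs were barking!"

clean <- tolower(text)                               # convert to lowercase
clean <- gsub("https?://[^[:space:]]+", "", clean)   # remove URLs (before punctuation is stripped)
clean <- gsub("[[:punct:]]+", "", clean)             # remove punctuation
clean <- gsub("[[:digit:]]+", "", clean)             # remove numbers

# Split the cleaned string into words on runs of whitespace.
words <- strsplit(trimws(clean), "[[:space:]]+")[[1]]

# A tiny hand-picked stop word list (an assumption; real lists are much longer).
stop_words <- c("the", "and", "are", "were")
words <- words[!words %in% stop_words]

# Count how often each remaining word occurs.
term_counts <- table(words)
print(words)
print(term_counts)
```

The order of the steps matters: URLs are removed before punctuation, because stripping punctuation first would break a URL into fragments that the URL pattern can no longer match.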