Book Image

Mastering Text Mining with R

By : KUMAR ASHISH
Book Image

Mastering Text Mining with R

By: KUMAR ASHISH

Overview of this book

Text Mining (or text data mining or text analytics) is the process of extracting useful and high-quality information from text by devising patterns and trends. R provides an extensive ecosystem to mine text through its many frameworks and packages. Starting with basic information about the statistics concepts used in text mining, this book will teach you how to access, cleanse, and process text using the R language and will equip you with the tools and the associated knowledge about different tagging, chunking, and entailment approaches and their usage in natural language processing. Moving on, this book will teach you different dimensionality reduction techniques and their implementation in R. Next, we will cover pattern recognition in text data utilizing classification mechanisms, perform entity recognition, and develop an ontology learning framework. By the end of the book, you will develop a practical application from the concepts learned, and will understand how text mining can be leveraged to analyze the massively available data on social media.
Table of Contents (15 chapters)

Chapter 1. Statistical Linguistics with R

Statistics plays an important role in the fields that deal with quantitative data. Computational linguistics is no exception. The quantitative investigation of linguistic data helps us understand the latent patterns that have helped phoneticians, psycholinguistics, linguistics, and many others to explore and understand language.

In this chapter, we will explain the basic terms associated with probability, used in computational linguistics. You will soon get to dive into linguistics and learn about language models and practical quantitative methods used in linguistics.

At the end of this chapter, we will extensively discuss some very useful and highly efficient packages in R, which we will use throughout this book, and by the time you finish the book, you should be able to pick appropriate R packages and functions for specific text-mining activities and be able to effectively use them for practical purposes.

In this chapter, we will cover the following topics:

  • Basic statistics and probability

  • Probabilistic linguistics

  • Language models

  • Quantitative methods in linguistics

  • R packages for text mining