Book Image

Python Data Analysis

By : Ivan Idris
Book Image

Python Data Analysis

By: Ivan Idris

Overview of this book

Table of Contents (22 chapters)
Python Data Analysis
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Key Concepts
Online Resources
Index

Chapter 9. Analyzing Textual Data and Social Media

In the previous chapters, we focused on the analysis of structured data, mostly in tabular format. In reality, plain text is the most predominant form of data available today. Text analysis applies analysis of word frequency distributions, pattern recognition, tagging, link and association analysis, sentiment analysis, and visualization. We will analyze text with the Python Natural Language Toolkit (NLTK) library. NLTK comes with a collection of sample texts called corpora. A small example of network analysis will also be covered. The following topics will be discussed in this chapter:

  • Installing NLTK

  • Filtering out stopwords, names, and numbers

  • The bag-of-words model

  • Analyzing word frequencies

  • Naive Bayes classification

  • Sentiment analysis

  • Creating word clouds

  • Social network analysis