Mastering Social Media Mining with Python

By: Marco Bonzanini

Overview of this book

Your social media is filled with a wealth of hidden data – unlock it with the power of Python. Transform your understanding of your clients and customers when you use Python to solve the problems of understanding consumer behavior and turning raw data into actionable customer insights. This book will help you acquire and analyze data from leading social media sites. It will show you how to employ scientific Python tools to mine popular social websites such as Facebook, Twitter, Quora, and more. Explore the Python libraries used for social media mining, and get the tips, tricks, and insider insight you need to make the most of them. Discover how to develop data mining tools that use a social media API, and how to create your own data analysis projects using Python for clear insight from your social data.
Table of Contents (10 chapters)

Analyzing tweets - text analysis

The previous section analyzed the entity field of a tweet. This field provides useful information about the tweet, because the entities are explicitly curated by the tweet's author. This section focuses instead on unstructured data, that is, the raw text of the tweet. We'll discuss aspects of text analytics such as text preprocessing and normalization, and we'll perform some statistical analysis on the tweets. Before digging into the details, we'll introduce some terminology.

Tokenization is one of the important steps in the preprocessing phase. Given a stream of text (such as a tweet status), tokenization is the process of breaking this text down into individual units called tokens. In the simplest form, these units are words, but we could also work on a more complex tokenization that deals with phrases, symbols, and so on.
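To make the idea concrete, here is a minimal sketch that contrasts two tokenization strategies using only the Python standard library. The function names `simple_tokenize` and `tweet_tokenize` and the regular expression are illustrative assumptions, not taken from the book; the pattern is a rough approximation of how tweet-specific symbols such as @-mentions, hashtags, and URLs might be kept intact as single tokens.

```python
import re

# Hypothetical pattern, for illustration only: alternatives are tried
# left to right, so tweet-specific symbols win over plain words.
TOKEN_PATTERN = re.compile(
    r"""
    (?:@\w+)              # @-mentions
    | (?:\#\w+)           # hashtags
    | (?:https?://\S+)    # URLs (everything up to the next whitespace)
    | (?:\w+(?:'\w+)?)    # words, optionally with an internal apostrophe
    | (?:\S)              # any other single non-whitespace symbol
    """,
    re.VERBOSE,
)


def simple_tokenize(text):
    """Simplest form: split on whitespace, so punctuation stays attached."""
    return text.split()


def tweet_tokenize(text):
    """Split text into tokens, keeping mentions, hashtags, and URLs whole."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    tweet = "Hi @user, check #Python at https://example.com :)"
    print(simple_tokenize(tweet))
    print(tweet_tokenize(tweet))
```

Whitespace splitting leaves the comma glued to `@user,`, while the regex-based version separates punctuation and preserves the mention, the hashtag, and the URL as individual tokens. Note that this simple pattern breaks the emoticon `:)` into two symbols, which hints at why real-world tokenization is less trivial than it first appears.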

Tokenization sounds like a trivial task, and it's been widely studied by the natural language processing community. Chapter...