Mastering Social Media Mining with Python

By: Marco Bonzanini

Overview of this book

Your social media is filled with a wealth of hidden data – unlock it with the power of Python. Transform your understanding of your clients and customers when you use Python to solve the problems of understanding consumer behavior and turning raw data into actionable customer insights. This book will help you acquire and analyze data from leading social media sites. It will show you how to employ scientific Python tools to mine popular social websites such as Facebook, Twitter, Quora, and more. Explore the Python libraries used for social media mining, and get the tips, tricks, and insider insight you need to make the most of them. Discover how to develop data mining tools that use a social media API, and how to create your own data analysis projects using Python for clear insight from your social data.

Text analysis and TF-IDF on notes


After discussing how to download a list of notes and activities for a given page or user, we will shift our focus to the textual analysis of the content.

For each post published by a given user, we want to extract the most interesting keywords, which could be used to summarize the post itself.
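As a rough illustration of the keyword extraction described above, the following sketch scores each word of a post with one common TF-IDF variant: relative term frequency multiplied by log(N / document frequency). The function names, the tokenizer, and the sample posts are assumptions made for this example, and the exact weighting used later in the chapter may differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tf_idf(docs):
    """Return one {term: score} dict per document.

    Assumed weighting: (count / document length) * log(N / document frequency).
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty posts
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_keywords(docs, k=3):
    """Return the k highest-scoring terms per document (ties broken alphabetically)."""
    return [
        sorted(score, key=lambda term: (-score[term], term))[:k]
        for score in tf_idf(docs)
    ]


# Hypothetical posts, used only to exercise the functions.
posts = [
    "python makes data mining fun",
    "python is great for text analysis",
    "data analysis with python",
]
for keywords in top_keywords(posts):
    print(keywords)
```

A term that appears in every post, such as "python" in this sample, gets a score of zero, which is why it never surfaces as a keyword even though it is frequent.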

While this is intuitively a simple exercise, there are a few subtleties to consider. On the practical side, the content of each post is not always a clean piece of text: it can include HTML tags. Before we can carry out our computation, we need to extract the clean text. While the JSON object returned by the Google+ API has a clear structure, the content itself is not necessarily a well-formed document. Fortunately, there's a Python package that comes to the rescue: Beautiful Soup can parse HTML and XML documents, including malformed markup. It is compatible with Python 3 and can be...
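As a minimal sketch of the cleaning step, the snippet below uses Beautiful Soup (installable as the `beautifulsoup4` package) to strip the markup from an HTML fragment. The `clean_text` helper and the sample `post` dictionary are assumptions made for illustration; the layout of the dictionary only mimics a post whose content field holds HTML, so check the actual API response for the real field names.

```python
from bs4 import BeautifulSoup


def clean_text(html):
    """Return the visible text of an HTML fragment, with the tags removed."""
    # "html.parser" is the parser bundled with Python, so no extra
    # dependency is needed; it tolerates unclosed tags.
    soup = BeautifulSoup(html, "html.parser")
    # Join the text pieces with a space so that words separated only by
    # a tag (for example <br />) do not run together.
    return soup.get_text(separator=" ", strip=True)


# Hypothetical post: the nested "object"/"content" layout is an assumption.
post = {
    "object": {
        "content": 'Learning <b>Python</b><br />'
                   'Read <a href="https://example.com">this post</a>'
    }
}
print(clean_text(post["object"]["content"]))
```

Using a space as the separator is a deliberate trade-off: it keeps words apart across line breaks and block elements, at the cost of occasionally splitting a word that is only partly wrapped in an inline tag.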