Book Image

Mastering Social Media Mining with Python

By : Marco Bonzanini
Book Image

Mastering Social Media Mining with Python

By: Marco Bonzanini

Overview of this book

Your social media is filled with a wealth of hidden data – unlock it with the power of Python. Transform your understanding of your clients and customers when you use Python to solve the problems of understanding consumer behavior and turning raw data into actionable customer insights. This book will help you acquire and analyze data from leading social media sites. It will show you how to employ scientific Python tools to mine popular social websites such as Facebook, Twitter, Quora, and more. Explore the Python libraries used for social media mining, and get the tips, tricks, and insider insight you need to make the most of them. Discover how to develop data mining tools that use a social media API, and how to create your own data analysis projects using Python for clear insight from your social data.
Table of Contents (15 chapters)
Mastering Social Media Mining with Python
Credits
About the Author
About the Reviewer
www.PacktPub.com
Preface

Text classification for question tags


This section is about supervised learning. We define the problem of assigning tags to a question as a text classification problem and we apply it to a dataset of questions from Stack Exchange.

Before introducing the details of text classification, let's consider the following question from the Movies & TV Stack Exchange website (title and body of the question have been merged):

"What's the House MD episode where he hired a woman to fake dead to fool the team? I remember a (supposedly dead) woman waking up and giving a high-five to House. Which episode was this from?"

The preceding question asks for details about a particular episode of the popular TV series House, M.D. As described earlier, questions on Stack Exchange are labeled with tags with the purpose of quickly identifying the topic of the question. The tags assigned by the user to this question are house and identify-this-episode, the first one being a reference to the TV series itself, while...