Mastering Social Media Mining with Python

By: Marco Bonzanini

Overview of this book

Your social media is filled with a wealth of hidden data – unlock it with the power of Python. Transform your understanding of your clients and customers by using Python to analyze consumer behavior and turn raw data into actionable customer insights. This book will help you acquire and analyze data from leading social media sites. It will show you how to employ scientific Python tools to mine popular social websites such as Facebook, Twitter, Quora, and more. Explore the Python libraries used for social media mining, and get the tips, tricks, and insider insight you need to make the most of them. Discover how to develop data mining tools that use a social media API, and how to build your own Python data analysis projects to gain clear insight from your social data.
Table of Contents (15 chapters)
Mastering Social Media Mining with Python
Credits
About the Author
About the Reviewer
www.PacktPub.com
Preface

Working with Stack Exchange data dumps


The Stack Exchange network also provides complete dumps of its data, available for download through the Internet Archive (https://archive.org/details/stackexchange). The data is distributed in the 7z format, a compressed archive format with a high compression ratio (http://www.7-zip.org). To read and extract these archives, you need to download the 7-Zip utility for Windows, or one of its ports for Linux/Unix and macOS.
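If you prefer to script the extraction rather than run it by hand, one option is to call the 7-Zip command-line tool from Python. The following is a minimal sketch, assuming a 7-Zip executable named `7z`, `7za` or `7zz` is available on the PATH; the helper names `build_extract_command()` and `extract_7z()` are hypothetical and not part of any library.

```python
import shutil
import subprocess
from pathlib import Path


def build_extract_command(archive, out_dir, executable="7z"):
    """Build a 7-Zip command line that extracts `archive` into `out_dir`.

    `x` extracts with full paths, `-o` sets the output directory (with no
    space between the switch and the path) and `-y` answers yes to prompts.
    """
    return [executable, "x", str(archive), f"-o{out_dir}", "-y"]


def extract_7z(archive, out_dir):
    """Extract a .7z archive with the 7-Zip command-line tool, if installed."""
    # Assumption: ports usually install the binary as 7z, 7za or 7zz.
    executable = shutil.which("7z") or shutil.which("7za") or shutil.which("7zz")
    if executable is None:
        raise FileNotFoundError("7-Zip command-line tool not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if 7-Zip reports a failure.
    subprocess.run(build_extract_command(archive, out_dir, executable), check=True)
```

For example, `extract_7z("stackoverflow.com-Posts.7z", "stackoverflow_dump")` would unpack the Posts dump into a `stackoverflow_dump` directory (a hypothetical path). Keeping the command construction in its own function makes it easy to inspect or log the exact command before running a long extraction.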

At the time of writing, the Stack Overflow data dump is provided as a set of separate compressed files, each representing an entity (or table) in the dataset. For example, the stackoverflow.com-Posts.7z file contains the dump of the Posts table (that is, questions and answers). The first version of this file, published in 2016, is about 7.9 GB, and uncompressing it yields a file of 39 GB (approximately five times larger than the compressed version). All the other Stack Exchange websites have a much smaller data dump, and...
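A 39 GB file is far too large to load into memory in one go, so it is better processed as a stream. The sketch below assumes that the archive unpacks to a single XML file (for example, `Posts.xml`) in which each post is a `<row>` element whose `PostTypeId` attribute is `1` for questions and `2` for answers; `count_post_types()` and `POST_TYPES` are hypothetical names. It relies on `xml.etree.ElementTree.iterparse()` from the standard library and clears rows once they have been counted, so memory use stays roughly constant regardless of file size.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed meaning of the PostTypeId attribute in the Posts dump.
POST_TYPES = {"1": "question", "2": "answer"}


def count_post_types(source):
    """Count questions, answers and other posts in a Posts XML dump.

    `source` can be a filename or a binary file object. iterparse() reads
    the document incrementally instead of building the whole tree at once.
    """
    counts = Counter()
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event: the opening tag of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "row":
            counts[POST_TYPES.get(elem.get("PostTypeId"), "other")] += 1
            root.clear()  # discard rows already processed to keep memory flat
    return counts


if __name__ == "__main__":
    # A tiny hand-written sample in the assumed Posts.xml layout.
    sample = io.BytesIO(
        b'<?xml version="1.0" encoding="utf-8"?>\n'
        b"<posts>\n"
        b'  <row Id="1" PostTypeId="1" />\n'
        b'  <row Id="2" PostTypeId="2" ParentId="1" />\n'
        b'  <row Id="3" PostTypeId="2" ParentId="1" />\n'
        b"</posts>\n"
    )
    print(count_post_types(sample))  # Counter({'answer': 2, 'question': 1})
```

On the real dump you would call, for example, `count_post_types("stackoverflow_dump/Posts.xml")`; the path is hypothetical and depends on where the archive was extracted. Clearing the root element after each row is what prevents the parsed tree from growing as the file is read.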