
Python Social Media Analytics

By: Baihaqi Siregar, Siddhartha Chatterjee, Michal Krystyanczuk

Overview of this book

Social media platforms such as Facebook, Twitter, forums, Pinterest, and YouTube have become a major part of everyday life. However, these complex and noisy data streams pose a serious challenge to anyone who wants to harness them properly and benefit from them. This book introduces you to the concept of social media analytics and shows how you can leverage its capabilities to empower your business. Starting with acquiring data from social networking sources such as Twitter, Facebook, YouTube, Pinterest, and social forums, you will see how to clean the data and make it ready for analytical operations using various Python APIs. The book then explains how to structure the cleaned data and store it in MongoDB using PyMongo. You will also perform web scraping with Scrapy and BeautifulSoup, and visualize the data. Finally, you will be introduced to different techniques for performing analytics on your social data at scale in the cloud, using Python and Spark. By the end of this book, you will be able to use Python to gain valuable insights from social media data and use them to enhance your business processes.
Table of Contents (17 chapters)
Title Page
Credits
About the Authors
Acknowledgments
About the Reviewer
www.PacktPub.com
Customer Feedback
Preface

Summary


Harnessing social data is of vital importance for any worthwhile application. Public data from social media APIs is messy, noisy, and voluminous, and requires a precise and smart strategy to separate the useful signal from the noise. The first step in harnessing social data is to collect it by connecting to the various RESTful APIs and following their authentication procedures. Each social network's API has its own variations, but the basic rules of app creation and authentication are common to all of them. Once we have successfully connected to an API, we need to parse the JSON data that it returns. The data arriving at the programmer's end through the APIs needs to be cleaned with basic text mining techniques such as tokenization, duplicate removal, and normalization. Social media data is often unstructured and comes in various formats, so traditional relational databases are not suitable for these use cases. Finally, we need a flexible and scalable system to store thousands of social...
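The parse-then-clean steps summarized above can be sketched with the Python standard library alone. This is a minimal illustration, not the book's code: it assumes a hypothetical JSON payload with a `statuses` list whose items carry `id` and `text` fields (real API responses differ per network), skips authentication entirely, and uses simple regex tokenization, lowercasing with accent stripping for normalization, and exact-match duplicate removal.

```python
import json
import re
import unicodedata

# Hypothetical sample payload loosely shaped like a social media API response.
# The field names "statuses", "id" and "text" are assumptions for illustration.
RAW_RESPONSE = """
{"statuses": [
  {"id": 1, "text": "Loving the new #Python release!!! https://example.com"},
  {"id": 2, "text": "Loving the new #Python release!!! https://example.com"},
  {"id": 3, "text": "Café opening today @Downtown"}
]}
"""

URL_RE = re.compile(r"https?://\S+")   # crude URL matcher
TOKEN_RE = re.compile(r"[#@]?\w+")     # words, hashtags and mentions


def normalize(text):
    """Remove URLs, lowercase, and strip accents (NFKD + drop combining marks)."""
    text = URL_RE.sub("", text).lower()
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text):
    """Split text into word tokens, keeping the # and @ prefixes attached."""
    return TOKEN_RE.findall(text)


def clean_posts(raw_json):
    """Parse the JSON payload, normalize and tokenize each post, drop duplicates."""
    posts = json.loads(raw_json)["statuses"]
    seen = set()
    cleaned = []
    for post in posts:
        tokens = tokenize(normalize(post["text"]))
        key = tuple(tokens)
        if key in seen:  # exact duplicate after normalization: skip it
            continue
        seen.add(key)
        cleaned.append({"id": post["id"], "tokens": tokens})
    return cleaned


if __name__ == "__main__":
    for item in clean_posts(RAW_RESPONSE):
        print(item)
```

Deduplicating on the normalized token sequence catches retweet-style exact copies cheaply; near-duplicate detection (for example, with shingling or similarity thresholds) would need a more elaborate approach.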