Python Social Media Analytics

By: Baihaqi Siregar, Siddhartha Chatterjee, Michal Krystyanczuk

Overview of this book

Social media platforms such as Facebook, Twitter, forums, Pinterest, and YouTube have become a major part of everyday life. However, these complex and noisy data streams are challenging to harness and benefit from. This book introduces you to the concept of social media analytics and shows how you can leverage it to empower your business. Starting with acquiring data from various social networking sources such as Twitter, Facebook, YouTube, Pinterest, and social forums, you will see how to clean data and make it ready for analytical operations using various Python APIs. This book explains how to structure the clean data and store it in MongoDB using PyMongo. You will also perform web scraping and visualize data using Scrapy and BeautifulSoup. Finally, you will be introduced to different techniques for performing analytics at scale on your social data in the cloud, using Python and Spark. By the end of this book, you will be able to use the power of Python to gain valuable insights from social media data and use them to enhance your business processes.

Delving into social data

The data acquired from social media is called social data. Social data exists in many forms.

Social media data includes information about the users of social networks, such as name, city, interests, and so on. Data of this kind, which is numeric or otherwise quantifiable, is known as structured data.

However, since social media are platforms for expression, a lot of the data is in the form of text, images, videos, and the like. These sources are rich in information, but not as straightforward to analyze as the structured data described earlier. These types of data are known as unstructured data.
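The difference between the two kinds of data can be sketched in a few lines of Python. This is only an illustration: the profile, the post, and the helper names `lives_in` and `hashtags` are all invented for this example and do not come from any social network API.

```python
import re

# Hypothetical records, invented for illustration only.
profile = {
    "name": "Alice",
    "city": "Paris",
    "interests": ["music", "travel"],
    "followers": 1280,
}
post = "Just landed in Paris!! Loving the music scene here :) #travel"


def lives_in(user_profile, city):
    """Structured data: the field can be queried directly."""
    return user_profile["city"] == city


def hashtags(text):
    """Unstructured data: information has to be extracted from free text."""
    return re.findall(r"#(\w+)", text)
```

With the structured profile, a question such as "does this user live in Paris?" is a single field lookup. With the unstructured post, even something as simple as listing the hashtags requires a pattern-matching step first.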

The process of applying rigorous methods to make sense of social data is called social data analytics. In this book, we will go into great depth on social data analytics to demonstrate how we can extract valuable meaning and information from these interesting sources of social data. Since there are almost no restrictions on social media, there are a lot of meaningless accounts, content, and interactions. As a result, the data coming out of these streams is quite noisy and polluted, and a lot of effort is required to separate the information from the noise. Once the data is cleaned and we are focused on the most important and interesting aspects, we need various statistical and algorithmic methods to make sense of the filtered data and draw meaningful conclusions.
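As a rough idea of what separating information from noise can look like, here is a minimal sketch. It assumes a very simple notion of noise (duplicates, and posts with almost no text left once links and @mentions are removed). The function name `clean_posts`, the `min_words` threshold, and the sample posts are hypothetical choices for this example, not the method used later in the book.

```python
import re

# Matches web links and @mentions, which carry little meaning on their own.
URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")


def clean_posts(posts, min_words=3):
    """Return normalized, de-duplicated posts that still contain at least
    min_words words once URLs and @mentions are stripped."""
    seen = set()
    kept = []
    for post in posts:
        text = URL_OR_MENTION.sub("", post)
        text = " ".join(text.split()).lower()  # collapse whitespace, lowercase
        if len(text.split()) < min_words or text in seen:
            continue
        seen.add(text)
        kept.append(text)
    return kept


# Invented sample data for illustration only.
raw_posts = [
    "Great service from the support team today!",
    "great service from the support team today!",
    "http://spam.example/win @someone",
    "Click here http://spam.example/offer",
    "The new update keeps crashing on my phone",
]
cleaned = clean_posts(raw_posts)
```

Only two of the five sample posts survive: one copy of the duplicated compliment and the complaint about the update. Real cleaning pipelines are considerably more involved, but the principle of filtering first and analyzing second is the same.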

Understanding semantics

An important concept to understand when handling unstructured data is semantics. Dictionaries define the term as the branch of linguistics and logic concerned with meaning.

It is a concept that comes from linguistics and philosophy, and it deals with the study of meaning. Meanings are uncovered by understanding the relationships between words, phrases, and all types of symbols. From a social media point of view, a symbol could be one of the popular emoticons, which are not exactly formal language but do signify emotions. The notion of a symbol can be extended to images and videos, where patterns in the content can be used to extract meaning. In the later chapters, we will show a few techniques that can help you get meaning out of textual data. Extracting meaning from images and videos is out of scope for this book. Semantic technology is central to analyzing unstructured social data effectively.
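To make the idea of emoticons as symbols concrete, the sketch below maps a handful of emoticons to the emotions they commonly signify. The mapping and the function name `emoticon_emotions` are assumptions made for this illustration; a real analysis would need a much larger lexicon and a proper tokenizer.

```python
# Hypothetical, deliberately tiny lexicon of emoticons and their meanings.
EMOTICON_MEANINGS = {
    ":)": "happy",
    ":(": "sad",
    ":D": "joyful",
    ";)": "playful",
}


def emoticon_emotions(text):
    """Return the emotions signified by the emoticons found in text.

    Only emoticons separated from other words by whitespace are detected.
    """
    return [
        EMOTICON_MEANINGS[token]
        for token in text.split()
        if token in EMOTICON_MEANINGS
    ]


emotions = emoticon_emotions("Loved the concert :) but missed the train :(")
```

Here the two emoticons add an emotional reading to a sentence that the words alone only partly convey.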

To extract meaning from social data effectively, semantic technologies rely on underlying artificial intelligence or machine learning algorithms. These algorithms allow you to find patterns in the data, which are then interpreted by humans. That's why social data analytics is so exciting: it brings together knowledge from the fields of semantics and machine learning, and then binds it with sociology for business or other objectives.

Defining the semantic web

The growth of the internet has given rise to platforms like websites, portals, search engines, social media, and so on. All of these have created a massive collection of content and documents. Google and other search engines have helped to organize these documents and make them accessible to everyday users. So, today we are able to search for our questions and select websites or pages that are linked to the answer. Even social media content is more and more accessible via search engines. You may find a tweet that you wrote two years ago suddenly showing up in a Google result. The organization of web content is almost a solved problem. However, wouldn't it be more exciting if you asked a question on Google, Bing, or another search engine and it directly gave you the answer, just like a friend with the required knowledge? This is exactly what the future web would look like and would do. Already, for example, if you ask Google what's the temperature in Paris? or who is the wife of Barack Obama?, it gives you the right answer. Google's ability to do this rests on semantic technology, with natural language processing and machine learning. The algorithms behind Google's search engine create links between queries and content by understanding the relations between words, phrases, and actual answers.

However, today only a limited number of questions can be answered this way, as there is a big risk of inferring wrong answers for many questions. The future of the internet will be an extension of the World Wide Web: the semantic web. The term was coined by the creator of the World Wide Web, Tim Berners-Lee. The semantic web is a complex concept built on the simple idea of connecting entities (URLs, pages, and content) on the web through relations, but the underlying implementation is difficult at scale, due to the sheer volume of entities present on the internet. Languages such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL) are designed to create these links between pages and content, based on relations. These languages allow creators of content to add meaning to their documents, which machines can process for reasoning or inference purposes, making it possible to automate many tasks on the web. Our book is not about explaining the underlying concepts of the semantic web. We mention it so that you know where the web is heading and can appreciate the need to mine the web more intelligently, as you'll learn in the later chapters.
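The idea of connecting entities through relations can be illustrated with plain Python tuples of the form (subject, relation, object), which is the shape of statement that RDF describes. This is only a sketch: real RDF identifies entities with URIs and is normally handled with a dedicated library, and the relation names and the helpers `objects` and `spouse_birthplace` below are invented for this example.

```python
# Each statement links two entities through a named relation.
# Plain strings are used here for readability; real RDF would use URIs.
triples = [
    ("Barack Obama", "spouse", "Michelle Obama"),
    ("Michelle Obama", "born_in", "Chicago"),
    ("Paris", "capital_of", "France"),
]


def objects(subject, relation):
    """Return every object linked to subject by relation."""
    return [o for s, r, o in triples if s == subject and r == relation]


def spouse_birthplace(person):
    """Follow two relations in sequence: a very small form of inference."""
    return [
        place
        for spouse in objects(person, "spouse")
        for place in objects(spouse, "born_in")
    ]


answer = objects("Barack Obama", "spouse")
birthplace = spouse_birthplace("Barack Obama")
```

Answering "who is the wife of Barack Obama?" becomes a lookup over relations rather than a search for pages that might contain the answer. Chaining lookups, as `spouse_birthplace` does, hints at how machines could reason over linked content.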

Exploring social data applications

Now that you know where the web is heading, let's return to the purpose of analyzing social web data. We have discussed the nature of social media and of social data, both structured and unstructured, but you must be curious about how this is used in the real world. In our view, restricting the application of social data analytics to certain fields or sectors is not entirely fair. Social data analytics leads to the discovery of facts and knowledge. If acquiring knowledge can't be restricted to a few fields, neither can social media analytics. However, some fields are benefiting more from this science than others, such as marketing, advertising, and research.

Social media data is increasingly being integrated into existing digital services to provide a much more personalized experience through recommendations. You have probably seen that most online services allow you to register using your social profiles, along with added information. When you do so, the service is able to mine your social data and recommend products or catalogs aligned with your interests. Entertainment services like Spotify and Netflix, or e-commerce ones like Amazon, eBay, and others, are able to propose personalized recommendations based on social data analytics and other data sources.

More traditional companies selling consumer products derive value from social data in marketing their products and brands. People use social networks both to connect with companies and to express opinions about their products and services. Hence, there is a huge amount of data on the social web that contains customer preferences and complaints about companies. This is an example of unstructured social data, since it is mostly text or images. Companies are analyzing this data to understand how consumers feel about and use their services or campaigns, and are then feeding this intelligence into their marketing and communications.

A similar approach has been applied in political campaigns to understand people's opinions on various political issues. Analysts and data scientists have gone as far as trying to predict election results from people's sentiments about the politicians concerned. Many data scientists used social media data to try to predict the result of the election between Clinton and Trump. There have also been attempts to predict the stock market using social data, but these have not been very successful: financial data is highly sensitive and volatile, and so is social data, so combining the two is still a big challenge.

In the later chapters, you'll see how we can analyze the Facebook page of a brand to understand its relationship with its consumers. In Chapter 7, Scraping and Extracting Conversational Topics on Internet Forums, you'll see how analyzing forums lets us understand deeper conversations on certain subjects. Building recommendation engines is beyond the scope of the book, but you'll know enough about social data to integrate it into your recommender system projects.

Now that you know enough about social media, social data, and their applications, we will dive into the methods for making sense of social data. Among the many techniques used to analyze social data, machine learning is one of the most effective.