This section is about analyzing entities in tweets. We're going to perform some frequency analysis using the data collected in the previous section. Slicing and dicing this data lets us produce statistics that give some insight into the data and help answer questions about it.
Analyzing entities such as hashtags is interesting because these annotations are an explicit way for the author to label the topic of the tweet.
We start with the analysis of the tweets by Packt Publishing. As Packt Publishing supports and promotes open source software, we are interested in finding out which technologies Packt Publishing mentions most often.
The following script extracts the hashtags from a user timeline, producing a list of the most common ones:
# Chap02-03/twitter_hashtag_frequency.py
import sys
from collections import Counter
import json

def get_hashtags(tweet):
    entities = tweet.get('entities', {})
    hashtags = entities...
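To make the counting step concrete, here is a minimal, self-contained sketch of the same idea. It is not the book's listing. It assumes that each tweet is a JSON object with hashtags stored under `entities.hashtags[].text`, and that the timeline is available as one JSON document per line. The helper name `count_hashtags`, the lowercasing of hashtags, and the sample tweets are all assumptions made for illustration.

```python
import json
from collections import Counter


def get_hashtags(tweet):
    """Return the hashtag texts of a tweet dict, lowercased.

    Lowercasing is an assumption of this sketch, so that '#Python'
    and '#python' are counted together.
    """
    entities = tweet.get('entities', {})
    hashtags = entities.get('hashtags', [])
    return [tag['text'].lower() for tag in hashtags]


def count_hashtags(lines):
    """Count hashtags over an iterable of JSON-encoded tweets (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        counts.update(get_hashtags(tweet))
    return counts


if __name__ == '__main__':
    # Made-up sample data standing in for a stored user timeline.
    sample = [
        json.dumps({'entities': {'hashtags': [{'text': 'Python'},
                                              {'text': 'DataScience'}]}}),
        json.dumps({'entities': {'hashtags': [{'text': 'python'}]}}),
        json.dumps({'text': 'a tweet without entities'}),
    ]
    for tag, count in count_hashtags(sample).most_common(20):
        print('{}: {}'.format(tag, count))
```

With a real timeline file, the same function could be fed an open file object instead of the sample list, since iterating over a file yields one line at a time. `Counter.most_common(n)` returns the `n` most frequent items in descending order of count, which is what produces the list of the most common hashtags.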