Some of the most commonly used fields of interest in data extraction are:
text: This is the content of the tweet provided by the user.
user: These are some of the main attributes of the user, such as username, location, and photos.
place: This is where the tweet was posted, along with its geo coordinates.
entities: Effectively, these are the hashtags and other items (such as URLs and user mentions) that a user attaches to their tweets.
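To make the field list concrete, here is a minimal sketch that pulls these fields out of a single tweet. The `sample_tweet` dictionary is a made-up, heavily trimmed stand-in for a real tweet; its key names (`text`, `user.screen_name`, `user.location`, `place.full_name`, `coordinates`, `entities.hashtags`) follow Twitter's v1.1 tweet JSON, and `flatten_tweet` is a hypothetical helper, not part of any library.

```python
# Hypothetical, heavily trimmed tweet; real tweets carry many more fields.
sample_tweet = {
    "text": "Trying out #nltk for some #nlp on tweets",
    "user": {"screen_name": "example_user", "location": "Pune, India"},
    "place": {"full_name": "Pune, India", "country_code": "IN"},
    # GeoJSON point: [longitude, latitude]; null when the tweet is not geotagged
    "coordinates": {"type": "Point", "coordinates": [73.85, 18.52]},
    "entities": {"hashtags": [{"text": "nltk"}, {"text": "nlp"}]},
}


def flatten_tweet(tweet):
    """Collect the commonly used fields of one tweet into a flat dict.

    place and coordinates are often null in real data, so `or {}` swaps a
    None for an empty dict before the nested .get() calls.
    """
    user = tweet.get("user") or {}
    place = tweet.get("place") or {}
    coords = tweet.get("coordinates") or {}
    entities = tweet.get("entities") or {}
    return {
        "text": tweet.get("text", ""),
        "screen_name": user.get("screen_name"),
        "location": user.get("location"),
        "place": place.get("full_name"),
        "coordinates": coords.get("coordinates"),
        "hashtags": [tag["text"] for tag in entities.get("hashtags", [])],
    }


print(flatten_tweet(sample_tweet)["hashtags"])  # → ['nltk', 'nlp']
```

Using `.get()` rather than direct indexing is a design choice: a tweet without a place or coordinates then yields `None` for those keys instead of raising a `KeyError` or `TypeError`.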
Each attribute in the previous figure can serve as the basis for a social-media mining exercise in practice. Let's now look at how to get at these attributes, convert them to a more readable form, and process some of them:
Source: tweetinfo.py
import json
import sys

# Load the JSON array of tweets from the file named on the command line
with open(sys.argv[1]) as f:
    tweets = json.load(f)

# Pull individual fields out of every tweet
tweet_texts = [tweet['text'] for tweet in tweets]
tweet_source = [tweet['source'] for tweet in tweets]
tweet_geo = [tweet...
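As a hedged illustration of the processing step, the following standalone sketch (not the original tweetinfo.py) uses a few made-up tweets in place of the loaded file. It relies on the fact that Twitter delivers the `source` field as an HTML anchor, strips those tags to leave a readable client name, and counts hashtags with `collections.Counter`. The helper name `readable_source` and the sample data are assumptions for illustration only.

```python
import re
from collections import Counter

# Hypothetical, heavily trimmed tweets standing in for the list that
# json.load() returns; real tweets carry many more fields.
tweets = [
    {"text": "Loving #python and #nltk",
     "source": '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
     "entities": {"hashtags": [{"text": "python"}, {"text": "nltk"}]}},
    {"text": "More #Python please",
     "source": '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>',
     "entities": {"hashtags": [{"text": "Python"}]}},
]

TAG_RE = re.compile(r"<[^>]+>")  # matches any HTML tag such as <a ...> or </a>


def readable_source(source):
    """Strip the HTML anchor around the 'source' field, leaving the client name."""
    return TAG_RE.sub("", source or "").strip()


tweet_texts = [tweet["text"] for tweet in tweets]
tweet_source = [readable_source(tweet.get("source")) for tweet in tweets]

# Flatten the nested hashtag lists and count them case-insensitively
hashtags = [tag["text"].lower()
            for tweet in tweets
            for tag in tweet.get("entities", {}).get("hashtags", [])]
hashtag_counts = Counter(hashtags)

print(tweet_source)                    # → ['Twitter for Android', 'Twitter Web Client']
print(hashtag_counts.most_common(2))   # → [('python', 2), ('nltk', 1)]
```

Lowercasing before counting is a judgment call: it merges `#Python` and `#python`, which is usually what a frequency analysis wants.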