A wide variety of social sites produce datasets. In this example, we will gather one of those datasets and produce histograms from the data. The specific dataset is the voting behavior on Wikipedia, available from https://snap.stanford.edu/data/wiki-Vote.html. Each data item records that user number N voted for user number X. So, we produce some statistics in histograms to analyze voting behavior by:
- Gathering all of the votes that took place
- For each vote:
  - Incrementing a counter that says who voted
  - Incrementing a counter that says who was voted for
- Massaging the data so we can display it in two histograms
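Before working with the real file, the steps above can be sketched on a handful of made-up votes. This is only an illustration using the standard library: the sample pairs, the names `SAMPLE_VOTES`, `count_votes`, and `histogram` are hypothetical, and skipping lines that begin with `#` is an assumption about how comment lines appear in the input.

```python
from collections import Counter

# Hypothetical sample in "voter candidate" form; these pairs are made up
# for illustration and are not taken from wiki-Vote.txt.
SAMPLE_VOTES = """\
30 1412
30 3352
3 28
3 30
25 3
"""


def count_votes(text):
    """Return (votes_cast, votes_received) counters keyed by user number."""
    votes_cast = Counter()
    votes_received = Counter()
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' comment lines carry no votes.
        if not line or line.startswith("#"):
            continue
        voter, candidate = map(int, line.split())
        votes_cast[voter] += 1          # who voted
        votes_received[candidate] += 1  # who was voted for
    return votes_cast, votes_received


def histogram(counter):
    """Map each per-user total to the number of users with that total."""
    return dict(sorted(Counter(counter.values()).items()))


votes_cast, votes_received = count_votes(SAMPLE_VOTES)
cast_hist = histogram(votes_cast)
received_hist = histogram(votes_received)

print("users by number of votes cast:", cast_hist)
print("users by number of votes received:", received_hist)
```

The two resulting dictionaries are the "massaged" form of the data: each maps a vote total to the number of users with that total, which is the shape a bar chart or histogram plot needs.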
The code for the wiki-Vote dataset is as follows:
    %matplotlib inline
    # import all packages being used
    import matplotlib.pyplot as plt
    import pandas as pd
    import numpy as np
    import matplotlib
    # load voting data drawn from https://snap.stanford.edu/data/wiki-Vote.html
    df = pd.read_table('wiki-Vote.txt', sep=r"\s+", index_col=0)
    # produce standard summary info to validate
    print(df.head())
    print(df.describe...