The stories collected here represent roughly the 500 most-shared pieces of content over the past year. We'll try to deconstruct these articles to find the common traits that make them so shareable, beginning with the image data.
Let's first look at the number of images included with each story. We'll run a value count and then plot the numbers:
dfc['img_count'].value_counts().to_frame('count')
The preceding code generates the following output:
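To make the shape of that result concrete, here is a minimal sketch on a small, invented DataFrame. The name `toy` and its values are assumptions for illustration only; they are not the collected story data, and the numbers say nothing about the real distribution.

```python
import pandas as pd

# Hypothetical stand-in for dfc: these image counts are invented for illustration.
toy = pd.DataFrame({'img_count': [1, 1, 0, 2, 1, 3, 1, 0]})

# Frequency of each image count, most common value first.
counts = toy['img_count'].value_counts().to_frame('count')

# The same frequencies ordered by image count, which is the order a bar chart needs.
by_images = toy['img_count'].value_counts().sort_index()

# Share of stories at each image count, rather than raw frequencies.
shares = toy['img_count'].value_counts(normalize=True).sort_index()

print(counts)
print(by_images)
print(shares)
```

Note that `value_counts()` orders by frequency by default, so `sort_index()` is what puts the image counts in ascending order before plotting.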
Now, let's plot that same information:
fig, ax = plt.subplots(figsize=(8,6))
y = dfc['img_count'].value_counts().sort_index()
x = y.index
plt.bar(x, y, color='k', align='center')
plt.title('Image Count Frequency', fontsize=16, y=1.01)
ax.set_xlim(-.5,5.5)
ax.set_ylabel('Count')
ax.set_xlabel('Number of Images')
The preceding code generates the following output:
Already, the numbers are surprising. The vast majority of stories...