The data explosion problem
To understand the data explosion problem, let's walk through a realistic use case.
We all know Instagram. It is, in the simplest of terms, a photo-sharing website. One statistic (https://blog.hootsuite.com/instagram-statistics/) puts the number of monthly active users on Instagram at 800 million. Now, let's assume that, on average, each active user posts a new photo every day. Some users won't post a new picture every day, while others will post more than one, so we will even it out to 1 picture per day per user.
Next, if we assume the average size of a picture to be 1 MB, then, on average, Instagram generates 1 MB x 800 million = 800 terabytes of images alone every day.
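The back-of-the-envelope estimate above can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in the text (800 million active users, one photo per user per day, 1 MB per photo); the constant and function names are made up for this sketch, and it uses decimal units, where 1 TB = 1,000,000 MB.

```python
# Assumptions taken from the text, not measured values:
# 800 million active users, 1 photo per user per day, 1 MB per photo.
ACTIVE_USERS = 800_000_000
PHOTOS_PER_USER_PER_DAY = 1
AVERAGE_PHOTO_SIZE_MB = 1

# Decimal units: 1 TB = 1,000,000 MB.
MB_PER_TB = 10**6


def daily_image_volume_tb(users, photos_per_user, photo_size_mb):
    """Return the estimated volume of new images per day, in terabytes."""
    total_mb = users * photos_per_user * photo_size_mb
    return total_mb / MB_PER_TB


daily_tb = daily_image_volume_tb(
    ACTIVE_USERS, PHOTOS_PER_USER_PER_DAY, AVERAGE_PHOTO_SIZE_MB
)
print(f"{daily_tb:,.0f} TB of images per day")  # prints "800 TB of images per day"
```

Changing any of the three inputs shows how sensitive the estimate is: doubling the average photo size, for instance, doubles the daily volume.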
This figure excludes all the metadata related to the images, such as the number of likes, who liked each image and when, and the filters applied to it. It also excludes any replication of these images for high...