
Social Data Visualization with HTML5 and JavaScript

By: Simon Timms

Overview of this book

The increasing adoption of HTML5 opens up a new world of JavaScript-powered visualizations. By harnessing the power of Scalable Vector Graphics (SVG), you can present even complex data to your users in an easy-to-understand format and improve the user experience by freeing users from the burden of tabular data.

Social Data Visualization with HTML5 and JavaScript teaches you how to leverage HTML5 techniques through JavaScript to build visualizations. It also helps to clear up how the often complicated OAuth protocol works, to help you unlock a universe of social media data from sites like Twitter, Facebook, and Google+.

Social Data Visualization with HTML5 and JavaScript provides you with an introduction to creating an accessible view into the massive amounts of data available in social networks. Developers with some JavaScript experience and a desire to move past creating boring charts and tables will find this book a perfect fit. You will learn how to make use of powerful JavaScript libraries to become not just a programmer, but a data artist.

By using OAuth, which is helpfully demystified in one of the book’s chapters, you will be able to unlock the universe of social media data. Through visualizations, you will also tease out trends and relationships which would normally be lost in the noise.

There's a lot of data out there


It shouldn't come as a surprise to anybody that the amount of data humans are recording is growing at an amazing rate. Every few years, the data storage company EMC produces a report on just how much data is being preserved (http://www.emc.com/collateral/analyst-reports/idc-the-digital-universe-in-2020.pdf). In 2012, it estimated that between 2005 and 2020 the amount of data stored globally will grow from 130 exabytes to 40,000 exabytes. That works out to 5.2 terabytes for each person on the planet. It is such a staggering amount of information that it is difficult to grasp how much of it exists. By 2020, it will work out to 11 spindles of 100 DVDs per person. If we switch to Blu-ray discs, which have a capacity of 50 GB, the stack of discs required to store all 40,000 exabytes would still reach far beyond the orbit of the moon.
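These comparisons are easy to check for yourself. The following is a minimal back-of-the-envelope sketch in JavaScript; the population, disc capacity, and disc thickness constants are rough assumptions of mine rather than figures taken from the EMC report:

```js
// Back-of-the-envelope check of the storage figures above.
// All constants are rough assumptions for illustration only.
var totalBytes = 40000 * 1e18;  // 40,000 exabytes projected by 2020
var population = 7.6e9;         // approximate world population in 2020

var perPersonTB = totalBytes / population / 1e12;
console.log(perPersonTB.toFixed(1) + " TB per person");      // ~5.3

var bluRayBytes = 50 * 1e9;     // 50 GB per Blu-ray disc
var discThicknessMm = 1.2;      // standard disc thickness
var discs = totalBytes / bluRayBytes;
var stackKm = (discs * discThicknessMm) / 1e6;                // mm -> km
console.log(Math.round(stackKm) + " km of stacked discs");    // ~960,000
console.log("The moon orbits at roughly 384,400 km");
```

A stack of roughly 960,000 km against a lunar orbit of about 384,400 km is what puts the pile of discs "far beyond the orbit of the moon."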

The growth in data is inevitable as people put more of their lives online. The adoption of smartphones has turned everybody into a photographer. Instagram, a popular image sharing site, gathers some 40 million photos a day. One wonders how many photos of people's meals the world really needs. In the past few months there has been an explosion of video clip sharing on sites such as Vine and Instagram, which generates massive amounts of data. A myriad of devices are being created to extend the reach of smartphones beyond gathering photographic data. The latest generation of smartphones includes temperature, humidity, and pressure sensors in addition to the commonplace GPS, gyroscopic, geomagnetic, and acceleration sensors. These sensors make it possible to record an accurate representation of the world around the user.

An increase in the number of sensors is not a trend limited to smartphones. The price of sensors and radios has reached a tipping point where it is economical to create standalone devices that record and transmit data about the world. There was a time when building an array of temperature sensors that report back to a central device was the realm of large SCADA systems. One of my first jobs was testing a collection of IP-enabled monitoring devices at a refinery. At the time, the network hardware alone was worth millions; that same system can be built for a few hundred dollars now. A trip to a crowdfunding site such as Kickstarter or Indiegogo will turn up countless Bluetooth- or Wi-Fi-enabled sensor devices. These devices may find your lost keys or tell you when to water your tomatoes. That so many of them exist suggests we're entering an age of autonomous devices reporting on the world: a sort of Internet of Things is emerging.

At the same time, the cost per gigabyte of storing data is decreasing. Cheaper storage makes it economical to track data that would previously have been thrown away. In the 1970s, the BBC had a policy of destroying recordings of TV programs once they reached a certain age. This resulted in the loss of more than a hundred episodes of the cult classic Doctor Who. The low data density of the storage media available in the 1960s meant that retaining complete archives was cost-prohibitive. Such deletion would be unimaginable now, as the cost of storing video has dropped substantially. The cost of storing a gigabyte of information on Amazon's servers is on the order of a penny a month, and it can be even cheaper if the right expertise is available in-house. Parkinson's law states the following:

Work expands so as to fill the time available for its completion.

Restating this law for our case, it would read: "the amount of data will grow to fill the space available to it."

The growth in data has made our lives more difficult. While the amount of data has been growing, our ability to understand it has remained more or less stagnant. The tools available to refine and process large quantities of data have not kept pace. Running simple queries against gigabytes of data is a time-consuming process. A query such as "list all the tweets that contain the word 'Pepsi'" cannot realistically be completed on anything but a cluster of machines working in parallel. Even when the result is returned, the number of matching records is too large for a single person, or even a team of people, to process.
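To see why such queries are hard at scale, notice how trivially the query itself can be expressed. The following is a hypothetical sketch; the tweets array and its shape are invented for illustration. The difficulty is never in writing the query, it is in running a linear scan like this against billions of records:

```js
// A naive scan for tweets mentioning "Pepsi". Trivial for a small
// in-memory array, but impractical once the data no longer fits
// on a single machine.
var tweets = [
  { user: "alice", text: "Is Pepsi better than coke?" },
  { user: "bob",   text: "Lovely weather today" }
];

var matches = tweets.filter(function (tweet) {
  return /pepsi/i.test(tweet.text);
});

console.log(matches.length + " matching tweet(s)");  // 1 matching tweet(s)
```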

The term "Big Data" is commonly used to describe the sorts of very large datasets that are becoming more common. Like most terms that have become marketing terms, Big Data is defined differently by different people and companies. In this book we'll think of it as any quantity where running simple queries using traditional database tools on consumer grade hardware is difficult due to computational, storage, or retrieval limits.

Understanding the world of Big Data is a complex proposition. Visualizing data in a meaningful way is going to be one of the great problems of the coming decade. What's more, it is going to be a problem that will need to be addressed in domains that have not traditionally been data-rich.

Consider a coffee shop; this is not a company that one would expect to produce a great deal of data. However, data-hungry consumers are starting to demand to know where the beans for their favorite coffee came from, how long they were roasted, and how they were brewed. A similar program called ThisFish already exists that allows consumers to trace the origin of their seafood (http://thisfish.info) all the way back to when it was caught. Providing data about its coffee in an easily accessible form becomes a selling feature for the coffee shop. The following screenshot shows a typical label from a coffee shop, listing the source of the beans, the roasting time, and the organic certification:

People are very interested in data, especially data about their habits. But as interested as people are in data, nobody wants to trawl through an Excel file. They would like to see data presented to them in an accessible and fun way.