Data Visualization with D3.js Cookbook

By: Nick Zhu

Overview of this book

D3.js is a JavaScript library designed to display digital data in dynamic graphical form. It helps you bring data to life using HTML, SVG, and CSS. D3 allows great control over the final visual result, and it is the hottest and most powerful web-based data visualization technology on the market today. "Data Visualization with D3.js Cookbook" is packed with practical recipes to help you learn every aspect of data visualization with D3, and it is designed to provide you with all the guidance you need to get to grips with it. With this book, you will create breathtaking data visualizations with professional efficiency and precision, with the help of practical recipes, illustrations, and code samples. The book starts off by touching upon data visualization and D3 basics before gradually taking you through a number of practical recipes covering the wide range of topics you need to know about D3. You will learn the fundamental concepts of data visualization, functional JavaScript, and D3 fundamentals, including element selection, data binding, animation, and SVG generation. You will also learn how to leverage more advanced techniques such as custom interpolators, custom tweening, timers, the layout manager, force manipulation, and so on. The book also provides a number of pre-built chart recipes with ready-to-go sample code to help you bootstrap quickly.

The crossfilter.js library


Crossfilter is also a library created by D3's author Mike Bostock, initially used to power analytics for Square Register.

Crossfilter is a JavaScript library for exploring large multivariate datasets in browser. Crossfilter supports extremely fast (<30ms) interaction with coordinated views, even with datasets containing a million or more records.

-Crossfilter Wiki (August 2013)

In other words, Crossfilter is a library that you can use to generate data dimensions on large and typically flat multivariate datasets. So what is a data dimension? A data dimension can be considered a type of data grouping or categorization, while each dimensional data element is a categorical variable. Since this is still a pretty abstract concept, let's take a look at the following JSON dataset and see how it can be transformed into a dimensional dataset using Crossfilter. Assume that we have the following flat dataset in JSON describing payment transactions in a bar:

[
  {"date": "2011-11-14T01:17:54Z", "quantity": 2, "total": 190, "tip": 100, "type": "tab"},
  {"date": "2011-11-14T02:20:19Z", "quantity": 2, "total": 190, "tip": 100, "type": "tab"},
  {"date": "2011-11-14T02:28:54Z", "quantity": 1, "total": 300, "tip": 200, "type": "visa"},
..
]

Note

Sample dataset borrowed from the Crossfilter Wiki: https://github.com/square/crossfilter/wiki/API-Reference.

How many dimensions do we see in this sample dataset? The answer is: it has as many dimensions as there are different ways to categorize the data. For example, since this data is about customer payments, which are observations in a time series, the "date" is obviously a dimension. Secondly, the payment type is a natural way to categorize the data; therefore, "type" is also a dimension. The next dimension is a bit tricky, since technically we can model any of the fields in the dataset, or values derived from them, as a dimension; however, we don't want to make anything a dimension that does not help us slice the data more efficiently or provide more insight into what the data is trying to say. The total and tip fields have very high cardinality, which is usually an indicator of a poor dimension (though tip/total, that is, the tip as a percentage, could be an interesting dimension). The "quantity" field, however, is likely to have a relatively small cardinality, assuming people don't buy thousands of drinks in this bar; therefore, we choose quantity as our third dimension. Here is what the dimensional logical model looks like:

Figure: Dimensional Dataset
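To make the cardinality argument concrete, the following sketch counts how many distinct values each candidate field produces over the three sample records shown above. It is plain JavaScript with no D3 or Crossfilter, and the helper `cardinality`, the `candidates` map, and the rounding of the tip percentage are illustrative assumptions, not part of the recipe. With only three rows the counts are tiny; the point is the technique, including how deriving a value (rounding the timestamp down to the hour, as this recipe does later) collapses a high-cardinality field into fewer buckets.

```javascript
// The three sample records from the payment dataset above (illustration only).
const payments = [
  { date: "2011-11-14T01:17:54Z", quantity: 2, total: 190, tip: 100, type: "tab" },
  { date: "2011-11-14T02:20:19Z", quantity: 2, total: 190, tip: 100, type: "tab" },
  { date: "2011-11-14T02:28:54Z", quantity: 1, total: 300, tip: 200, type: "visa" },
];

// Hypothetical helper: number of distinct keys that keyFn produces.
function cardinality(records, keyFn) {
  return new Set(records.map(keyFn)).size;
}

// Candidate dimensions, including two derived ones.
const candidates = {
  date: (d) => d.date,
  // Derived: timestamp rounded down to the start of its hour (UTC), as a number
  hour: (d) => {
    const t = new Date(d.date);
    t.setUTCMinutes(0, 0, 0);
    return t.getTime();
  },
  type: (d) => d.type,
  quantity: (d) => d.quantity,
  total: (d) => d.total,
  // Derived: tip as a percentage of total, rounded to a whole percent (an assumption)
  tipPercent: (d) => Math.round((d.tip / d.total) * 100),
};

const report = {};
for (const [name, keyFn] of Object.entries(candidates)) {
  report[name] = cardinality(payments, keyFn);
}
console.log(report);
// report: date 3, hour 2, type 2, quantity 2, total 2, tipPercent 2
```

On the full dataset, the same loop would be expected to show total and tip spreading into many more distinct values than type or quantity.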

These dimensions allow us to look at the data from different angles and, when combined, allow us to ask some pretty interesting questions, for example:

  • Are customers who pay by tab more likely to buy in larger quantities?

  • Are customers more likely to buy in larger quantities on Friday night?

  • Are customers more likely to tip when using tab versus cash?

Now you can see why a dimensional dataset is such a powerful idea. Essentially, each dimension gives you a different lens through which to view your data, and when combined, they can quickly turn raw data into knowledge. A good analyst can use this kind of tool to quickly formulate hypotheses and thereby gain knowledge from the data.
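Combining two dimensions is what lets us approach a question such as the first one in the list. As a rough, library-free sketch (the helper `averageBy` is hypothetical, and this is not how Crossfilter computes anything), the code below groups the sample records by payment type and averages the quantity within each group. Three rows are far too few to support a conclusion; the sketch only shows the shape of the computation.

```javascript
// The three sample records from the payment dataset above (illustration only).
const payments = [
  { date: "2011-11-14T01:17:54Z", quantity: 2, total: 190, tip: 100, type: "tab" },
  { date: "2011-11-14T02:20:19Z", quantity: 2, total: 190, tip: 100, type: "tab" },
  { date: "2011-11-14T02:28:54Z", quantity: 1, total: 300, tip: 200, type: "visa" },
];

// Hypothetical helper: group records by keyFn and average valueFn within each group.
function averageBy(records, keyFn, valueFn) {
  const acc = new Map(); // group key -> { sum, count }
  for (const d of records) {
    const key = keyFn(d);
    const entry = acc.get(key) || { sum: 0, count: 0 };
    entry.sum += valueFn(d);
    entry.count += 1;
    acc.set(key, entry);
  }
  const result = {};
  for (const [key, { sum, count }] of acc) {
    result[key] = sum / count;
  }
  return result;
}

// Question: do customers who pay by tab buy in larger quantities?
const avgQuantityByType = averageBy(payments, (d) => d.type, (d) => d.quantity);
console.log(avgQuantityByType); // tab: 2, visa: 1
```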

How to do it...

Now that we understand why we would want to establish dimensions for our dataset, let's see how this can be done using Crossfilter:

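// Assumes json holds the full payment dataset shown above, already loaded with D3
// (d3.time.format.iso and d3.time.hour are D3 v3 APIs)
var timeFormat = d3.time.format.iso;
var data = crossfilter(json); // <-A

// Dimension: transaction time, rounded down to the hour
var hours = data.dimension(function(d){
  return d3.time.hour(timeFormat.parse(d.date)); // <-B
});
// Group: total sale amount per hour
var totalByHour = hours.group().reduceSum(function(d){
  return d.total;
});

// Dimension: payment type; group: number of transactions per type
var types = data.dimension(function(d){return d.type;});
var transactionByType = types.group().reduceCount();

// Dimension: quantity; group: number of transactions per quantity
var quantities = data.dimension(function(d){return d.quantity;});
var salesByQuantity = quantities.group().reduceCount();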

How it works...

As shown in the preceding section, creating dimensions and groups is quite straightforward in Crossfilter. The first step, before we can create anything, is to feed our JSON dataset, loaded using D3, through Crossfilter by calling the crossfilter function (line A). Once that's done, you can create a dimension by calling the dimension function and passing in an accessor function that retrieves the data element used to define the dimension. In the case of type, we simply pass in function(d){return d.type;}. You can also perform data formatting or other tasks in the dimension function (for example, the date parsing and rounding down to the hour on line B).

After creating the dimensions, we can use them to perform the categorization or grouping: totalByHour is a grouping that sums up the total sale amount for each hour, while salesByQuantity is a grouping that counts the number of transactions for each quantity. To better understand how grouping works, let's take a look at what the group object looks like. If you invoke the all function on the transactionByType group, you will get the following objects back:

Figure: Crossfilter Group Objects

We can clearly see that the transactionByType group essentially groups the data elements by their type and counts the number of data elements within each group, since we called the reduceCount function when creating the group.
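The group objects themselves are plain key/value pairs. The following library-free sketch reproduces the shape of what transactionByType.all() returns for the three sample records: one object per payment type, with key holding the type, value holding the count, and the entries sorted in ascending key order. The helper `groupAllCount` is hypothetical and only mimics the observable result; it is not Crossfilter's implementation.

```javascript
// The three sample records from the payment dataset above (illustration only).
const payments = [
  { date: "2011-11-14T01:17:54Z", quantity: 2, total: 190, tip: 100, type: "tab" },
  { date: "2011-11-14T02:20:19Z", quantity: 2, total: 190, tip: 100, type: "tab" },
  { date: "2011-11-14T02:28:54Z", quantity: 1, total: 300, tip: 200, type: "visa" },
];

// Hypothetical helper mimicking dimension.group().reduceCount().all():
// counts records per dimension value and returns [{ key, value }] sorted by key.
function groupAllCount(records, dimensionFn) {
  const counts = new Map();
  for (const d of records) {
    const key = dimensionFn(d);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)) // ascending natural order
    .map(([key, value]) => ({ key, value }));
}

const transactionByTypeAll = groupAllCount(payments, (d) => d.type);
console.log(transactionByTypeAll);
// transactionByTypeAll: [{ key: "tab", value: 2 }, { key: "visa", value: 1 }]
```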

The following are descriptions of the functions used in this example:

  • crossfilter: Creates a new crossfilter with the given records, if specified. Records can be any array of objects or primitives.

  • dimension: Creates a new dimension using the given value accessor function. The function must return naturally-ordered values, that is, values that behave correctly with respect to JavaScript's <, <=, >=, and > operators. This typically means primitives: Booleans, numbers, or strings.

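  • dimension.group: Creates a new grouping for the given dimension, based on the given groupValue function, which takes a dimension value as input and returns the corresponding rounded value. If groupValue is not specified, it defaults to the identity function, so each distinct dimension value gets its own group.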

  • group.all: Returns all groups, in ascending natural order by key.

  • group.reduceCount: A shortcut function to count the records; returns this group.

  • group.reduceSum: A shortcut function to sum records using the specified value accessor function; returns this group.
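The totalByHour group combines two of the ideas above: a dimension accessor that derives a value (the hour) and a reduceSum accessor that adds up a field. A plain-JavaScript approximation for the three sample records is sketched below; `sumByHour` and `floorToUTCHour` are hypothetical names, not Crossfilter or D3 functions. One deliberate difference: d3.time.hour rounds down in the browser's local time zone, whereas this sketch uses UTC so its output does not depend on where it runs.

```javascript
// The three sample records from the payment dataset above (illustration only).
const payments = [
  { date: "2011-11-14T01:17:54Z", quantity: 2, total: 190, tip: 100, type: "tab" },
  { date: "2011-11-14T02:20:19Z", quantity: 2, total: 190, tip: 100, type: "tab" },
  { date: "2011-11-14T02:28:54Z", quantity: 1, total: 300, tip: 200, type: "visa" },
];

// Hypothetical helper: round an ISO 8601 timestamp down to the start of its hour (UTC).
function floorToUTCHour(isoString) {
  const t = new Date(isoString);
  t.setUTCMinutes(0, 0, 0);
  return t;
}

// Hypothetical helper mimicking hours.group().reduceSum(valueFn).all():
// sums valueFn per hour and returns [{ key: Date, value }] in ascending time order.
function sumByHour(records, valueFn) {
  const sums = new Map(); // numeric timestamp -> running sum
  for (const d of records) {
    const key = floorToUTCHour(d.date).getTime();
    sums.set(key, (sums.get(key) || 0) + valueFn(d));
  }
  return [...sums.entries()]
    .sort(([a], [b]) => a - b)
    .map(([key, value]) => ({ key: new Date(key), value }));
}

const totalByHourAll = sumByHour(payments, (d) => d.total);
for (const g of totalByHourAll) {
  console.log(g.key.toISOString(), g.value);
}
// 2011-11-14T01:00:00.000Z 190
// 2011-11-14T02:00:00.000Z 490
```

Date objects satisfy the natural-ordering requirement of dimension because < and > compare them by their numeric value, but two distinct Date objects for the same hour are not ===, which is why this sketch keys its Map by the numeric timestamp rather than by the Date itself.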

At this point, we have everything we need for our analysis. Now, let's see how this can be done in minutes instead of hours or days.

There's more...

We have only touched on a very limited number of Crossfilter functions. Crossfilter provides a lot more capability when it comes to how dimensions and groups can be created; for more information, please check out its API reference: https://github.com/square/crossfilter/wiki/API-Reference.