Expert Data Visualization

By: Jos Dirksen

Overview of this book

Do you want to make sense of your data? Do you want to create interactive charts, data trees, info-graphics, geospatial charts, and maps efficiently? This book is your ideal choice to master interactive data visualization with D3.js V4. The book includes a number of extensive examples that help you hone your skills with data visualization. Throughout nine chapters, these examples will help you acquire a clear practical understanding of the various techniques, tools, and functionality provided by D3.js. You will first set up your D3.js development environment and learn the basic patterns needed to visualize your data. After that you will learn techniques to optimize different processes such as working with selections; animating data transitions; creating graphs and charts; integrating external resources (static as well as streaming); visualizing information on maps; working with colors and scales; utilizing the different D3.js APIs; and much more. The book will also guide you through creating custom graphs and visualizations, and show you how to go from the raw data to beautiful visualizations. The extensive examples will include working with complex and real-time data streams, such as seismic data, geospatial data, scientific data, and more. Towards the end of the book, you will learn to add more functionality on top of D3.js by using it with other external libraries and integrating it with ECMAScript 6 and TypeScript.

Visualizing our first data

So far we've seen the basics of how D3 works. In this last section of this first chapter, we'll create a simple visualization of some real data. We're going to visualize the popularity of baby names in the USA. The final result will look like this:

As you can see in this figure, we create pink bars for the girl names, blue bars for the boy names, and add an axis at the top and the bottom, which shows the number of times that name was chosen. The first thing to do, though, is to take a look at the data.

Sanitizing and getting the data

For this example, we'll download data from https://www.ssa.gov/oact/babynames/limits.html. This site provides data for all the baby names in the US since 1880. On this page, you can find national data and state-specific data. For this example, download the national data dataset. Once you've downloaded it, you can extract it, and you'll see data for a lot of different years:

$ ls -1 
NationalReadMe.pdf
yob1880.txt
yob1881.txt
yob1882.txt
yob1883.txt
yob1884.txt
yob1885.txt
...
yob2013.txt
yob2014.txt
yob2015.txt

As you can see, we have data from 1880 until 2015. For this example, I've used the data from 2015, but you can use pretty much anything you want. Now let's look a bit closer at the data:

$ cat yob2015.txt 
Emma,F,20355
Olivia,F,19553
Sophia,F,17327
Ava,F,16286
Isabella,F,15504
Mia,F,14820
Abigail,F,12311
Emily,F,11727
Charlotte,F,11332
Harper,F,10241
...
Zynique,F,5
Zyrielle,F,5
Noah,M,19511
Liam,M,18281
Mason,M,16535
Jacob,M,15816
William,M,15809
Ethan,M,14991
James,M,14705
Alexander,M,14460
Michael,M,14321
Benjamin,M,13608
Elijah,M,13511
Daniel,M,13408

In this data, we've got a large number of rows, where each row shows the name, the sex (M or F), and the number of times the name was given. First, all the girls' names are shown, and after that all the boys' names. The data already looks pretty usable, so we don't need to do much processing before we can use it. The only thing we do is add a header to this file, so that it looks like this:

name,sex,amount 
Emma,F,20355
Olivia,F,19553
Sophia,F,17327
Ava,F,16286

This will make parsing this data into D3 a little bit easier, since the default way of parsing CSV data with D3 assumes the first line is a header. The sanitized data we use in this example can be found here: <DVD3>/src/chapter-01/data/yob2015.txt.
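
Adding the header can be done by hand in an editor, but it can also be scripted. The following is a minimal sketch of one way to do it on the file's text content; the addHeader function name is hypothetical and not part of D3 or the book's sources:

```javascript
// Prepend the header row that D3's default CSV parsing expects.
// addHeader is a hypothetical helper name used only for this illustration.
function addHeader(csvText) {
  var header = 'name,sex,amount';
  // Leave the text untouched if the header is already present
  if (csvText.indexOf(header) === 0) {
    return csvText;
  }
  return header + '\n' + csvText;
}

var raw = 'Emma,F,20355\nOlivia,F,19553\n';
var sanitized = addHeader(raw);
```

Calling addHeader a second time on the sanitized text returns it unchanged, so the step is safe to repeat.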

Creating the visualization

Now that we've got the data we want to work with, we can start creating the example. The files used in this example are the following:

  • <DVD3>/src/chapter-01/D01-02.html: The HTML template that loads the correct CSS and JavaScript files for this example
  • <DVD3>/src/chapter-01/js/D01-02.js: The JavaScript which uses the D3 APIs to draw the chart
  • <DVD3>/src/chapter-01/css/D01-02.css: Custom CSS to color the bars and format the text elements
  • <DVD3>/src/chapter-01/data/yob2015.txt: The data that is visualized

Let's start with the complete JavaScript file first. It might seem complex, and it introduces a couple of new concepts, but the general idea should be clear from the code (if you open the source file in your editor, you can also see inline comments for additional explanation):

 
function show() {
  'use strict';

  // Size of the drawing area, excluding the margins around it
  var margin = { top: 30, bottom: 20, right: 40, left: 40 },
      width = 800 - margin.left - margin.right,
      height = 600 - margin.top - margin.bottom;

  var chart = d3.select('.chart')
    .attr('width', width + margin.left + margin.right)
    .attr('height', height + margin.top + margin.bottom)
    .append('g')
    .attr('transform', 'translate(' + margin.left + ','
      + margin.top + ')');

  var namesToShow = 10;
  var barWidth = 20;
  var barMargin = 5;

  d3.csv('data/yob2015.txt',
    // Convert each row; the unary + turns the amount string into a number
    function (d) { return { name: d.name, sex: d.sex, amount: +d.amount }; },
    function (data) {
      // Keep only the top names for each sex
      var grouped = _.groupBy(data, 'sex');
      var top10F = grouped['F'].slice(0, namesToShow);
      var top10M = grouped['M'].slice(0, namesToShow);

      var both = top10F.concat(top10M.reverse());

      // One g element per name, stacked vertically
      var bars = chart.selectAll("g").data(both)
        .enter()
        .append('g')
        .attr('transform', function (d, i) {
          var yPos = ((barWidth + barMargin) * i);
          return 'translate( 0 ' + yPos + ')';
        });

      // Map the amounts to the available width
      var yScale = d3.scaleLinear()
        .domain([0, d3.max(both, function (d) { return d.amount; })])
        .range([0, width]);

      bars.append('rect')
        .attr("height", barWidth)
        .attr("width", function (d) { return yScale(d.amount); })
        .attr("class", function (d) { return d.sex === 'F' ? 'female' : 'male'; });

      bars.append("text")
        .attr("class", "label")
        .attr("x", function (d) { return yScale(d.amount) - 5 ; })
        .attr("y", barWidth / 2)
        .attr("dy", ".35em")
        .text(function(d) { return d.name; });

      var bottomAxis = d3.axisBottom().scale(yScale).ticks(20, "s");
      var topAxis = d3.axisTop().scale(yScale).ticks(20, "s");

      chart.append("g")
        .attr('transform', 'translate( 0 ' + both.length * (barWidth + barMargin) + ')')
        .call(bottomAxis);

      chart.append("g")
        .attr('transform', 'translate( 0 ' + -barMargin + ' )')
        .call(topAxis);
    });
}

In this JavaScript file, we perform the following steps:

  1. Set up the main chart element, like we did in the previous example.
  2. Load the data from the CSV file using d3.csv.
  3. Group the loaded data so we only have the top 10 names for both sexes. Note that we use the groupBy function from the lodash library (https://lodash.com/) for this. This library provides a lot of additional functions to deal with common array operations. Throughout this book, we'll use this library in places where the standard JavaScript APIs don't provide enough functionality.
  4. Add g elements that will hold the rect and text elements for each name.
  5. Create the rect elements with the correct width corresponding to the number of times the name was used.
  6. Create the text elements to show the name at the end of the rect elements.
  7. Add some CSS styles for the rect and text elements.
  8. Add an axis to the top and the bottom for easy referencing.

We'll skip the first step since we've already explained that before, and move on to the usage of the d3.csv API call. Before we do that, there are a couple of variables in the JavaScript that determine how the bars look, and how many we show:

var namesToShow = 10; 
var barWidth = 20;
var barMargin = 5;

These variables will be used throughout the explanation in the following sections. What this means is that we're going to show 10 (namesToShow) names for each sex, each bar is 20 (barWidth) pixels thick (the bars run horizontally, so this becomes the height of each rect), and between each bar we put a five-pixel margin (barMargin).

Loading CSV data with D3

To load data asynchronously, D3 provides a number of helper functions. In this case, we've used the d3.csv function:

d3.csv('data/yob2015.txt',
  function (d) { return { name: d.name, sex: d.sex, amount: +d.amount }; },
  function (data) {
    ...
  });

The d3.csv function we use takes three parameters. The first one, data/yob2015.txt, is a URL which points to the data we want to load. The second argument is a function that is applied to each row read by D3. The object that's passed into this function is based on the header row of the CSV file. In our case, this data looks like this:

{ 
name: 'Sophie',
sex: 'F',
amount: '1234'
}

This (optional) function allows you to modify the data in the row, before it is passed on as an array (data) to the last argument of the d3.csv function. In this example, we use this second argument to convert the string value d.amount to a numeric value. Once the data is loaded and in this case converted, the function provided as the third argument is called with an array of all the read and converted values, ready for us to visualize the data.
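
To make the role of the row function more concrete, here is a simplified sketch of what happens to the file contents once they are downloaded. The parseCsv function is a hypothetical stand-in written for this illustration, not D3's actual parser: it ignores quoting and escaping, which the real implementation handles.

```javascript
// Simplified stand-in for the parsing step of d3.csv (assumption: no quoted fields).
function parseCsv(text, row) {
  var lines = text.trim().split('\n');
  // The header row supplies the property names
  var columns = lines[0].trim().split(',');
  return lines.slice(1).map(function (line) {
    var values = line.trim().split(',');
    var d = {};
    columns.forEach(function (column, i) { d[column] = values[i]; });
    // The optional row function converts the raw, all-string object
    return row ? row(d) : d;
  });
}

var data = parseCsv('name,sex,amount\nEmma,F,20355\nNoah,M,19511', function (d) {
  return { name: d.name, sex: d.sex, amount: +d.amount };
});
// data[0].amount is now the number 20355 instead of the string '20355'
```

Without the row function, every property stays a string, which would make numeric operations such as d3.max compare text instead of numbers.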

D3 provides a number of functions like d3.csv to load data and resources. These are listed in the following table:

Function | Description
d3.csv(url, [row], callback) | Retrieve a CSV file, optionally passing each row through the row function. When done, the callback function is called with all the read data.
d3.tsv(url, [row], callback) | Retrieve a TSV file (the same as a CSV file, but separated by tabs), optionally passing each row through the row function. When done, the callback function is called with all the read data.
d3.html(url, callback) | Get an HTML file, and pass it into the callback function when loaded.
d3.json(url, callback) | Get a JSON file, and pass it into the callback function when loaded.
d3.text(url, callback) | Get a basic text file, and pass it into the callback function when loaded.
d3.xml(url, callback) | Get an XML file, and pass it into the callback function when loaded.


You can also manually process CSV files if they happen to use a different format. You should load those using the d3.text function, and use any of the functions from the d3-dsv module to parse the data. You can find more information on the d3-dsv module here: https://github.com/d3/d3-dsv.

Grouping the loaded data so we only have the top 10 names for both sexes

At this point, we've only loaded the data. If you look back at the figure, you can see that we create a chart using the top 10 female and male names. With the following lines of code, we convert the big incoming data array to an array that contains just the top 10 female and male names:

var grouped = _.groupBy(data, 'sex'); 
var top10F = grouped['F'].slice(0, namesToShow);
var top10M = grouped['M'].slice(0, namesToShow);

var both = top10F.concat(top10M.reverse());

Here we use lodash's groupBy function to group our data based on the sex property of each row. The rows within each group keep their original order, which in this file already runs from the most to the least popular name. Next we take the first 10 (namesToShow) elements from each group, and create a single array from them using the concat function. We also reverse the top10M array to make the most popular boy's name appear at the bottom of the chart (as you can see when you look at the example).
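
The same steps can be tried outside the browser with a handful of rows. The sketch below uses a small plain-JavaScript groupBy as a stand-in for lodash's _.groupBy, and a namesToShow of 2 instead of 10 to keep the sample short; both are assumptions made only for this illustration.

```javascript
// Plain-JavaScript stand-in for lodash's _.groupBy(rows, key)
function groupBy(rows, key) {
  return rows.reduce(function (groups, row) {
    (groups[row[key]] = groups[row[key]] || []).push(row);
    return groups;
  }, {});
}

var sample = [
  { name: 'Emma', sex: 'F', amount: 20355 },
  { name: 'Olivia', sex: 'F', amount: 19553 },
  { name: 'Sophia', sex: 'F', amount: 17327 },
  { name: 'Noah', sex: 'M', amount: 19511 },
  { name: 'Liam', sex: 'M', amount: 18281 },
  { name: 'Mason', sex: 'M', amount: 16535 }
];

var namesToShow = 2; // 10 in the chapter; 2 keeps the sample short
var grouped = groupBy(sample, 'sex');
var topF = grouped['F'].slice(0, namesToShow);
var topM = grouped['M'].slice(0, namesToShow);

// reverse() flips topM in place, so the most popular boy's name ends up last
var both = topF.concat(topM.reverse());
var order = both.map(function (d) { return d.name; });
// order: Emma, Olivia, Liam, Noah
```

Because slice returns a copy, reversing topM does not disturb the grouped data itself.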

Adding group elements

At this point, we've got the data into a form that we can use. The next step is to create a number of containers, to which we can add the rect that represents the number of times the name was used, and we'll also add a text element there that displays the name:

var bars = chart.selectAll("g").data(both)
  .enter()
  .append('g')
  .attr('transform', function (d, i) {
    var yPos = ((barWidth + barMargin) * i);
    return 'translate( 0 ' + yPos + ')';
  });

Here, we bind the both array to a number of g elements. We only need to use the enter function here, since we know that there aren't any g elements that can be reused. We position each g element using the translate operation of the transform attribute. We translate the g element along its y-axis based on the barWidth, the barMargin, and the index (i) of the data element (d) in our data (both) array. If you use the Chrome developer tools, you'll see something like this, which nicely shows the calculated translate values:
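
As a rough illustration, the translate values can also be computed outside the browser. The sketch below repeats the calculation from the transform callback for a few indices; the barTransform function name is hypothetical.

```javascript
var barWidth = 20;
var barMargin = 5;

// Same calculation as the transform callback: each bar gets a 25-pixel slot
function barTransform(i) {
  var yPos = (barWidth + barMargin) * i;
  return 'translate( 0 ' + yPos + ')';
}

// Indices of the first three bars and of the last (20th) bar
var transforms = [0, 1, 2, 19].map(function (i) { return barTransform(i); });
// 'translate( 0 0)', 'translate( 0 25)', 'translate( 0 50)', 'translate( 0 475)'
```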

All that is left to do now is draw the rectangles and add the names.

Adding the bar chart and baby name

In the previous section, we added the g elements and assigned those to the bars variable. In this section, we're going to calculate the width of the individual rectangles and add those and some text to the g:

var yScale = d3.scaleLinear()
  .domain([0, d3.max(both, function (d) { return d.amount; })])
  .range([0, width]);

bars.append('rect')
  .attr("height", barWidth)
  .attr("width", function (d) { return yScale(d.amount); })
  .attr("class", function (d) { return d.sex === 'F' ? 'female' : 'male'; });

bars.append("text")
  .attr("class", "label")
  .attr("x", function (d) { return yScale(d.amount) - 5 ; })
  .attr("y", barWidth / 2)
  .attr("dy", ".35em")
  .text(function(d) { return d.name; });

Here we see something new: the d3.scaleLinear function. With a d3.scaleLinear, we can let D3 calculate how the number of times a name was given (the amount property) maps to a specific width. We want to use the full width (width property, which has a value of 720) of the chart for our bars, so that would mean that the highest value in our input data should map to that value:

  • The name Emma, which occurred 20355 times, should map to a value of 720
  • The name Olivia, which occurred 19553 times, should map to a value of 720 * (19553/20355)
  • The name Mia, which occurred 14820 times, should map to a value of 720 * (14820/20355)
  • And so on...

Now, we could calculate this ourselves and set the size of the rect accordingly, but using the d3.scaleLinear is much easier, and provides additional functionality. Let's look at the definition a bit closer:

var yScale = d3.scaleLinear()
  .domain([0, d3.max(both, function (d) { return d.amount; })])
  .range([0, width]);

Here we define a linear scale whose input domain runs from 0 to the maximum amount in our data. This input domain is mapped to an output range starting at 0 and ending at width. The result, yScale, is a function that maps a value from the input domain to the output range: for example, yScale(1234) returns 43.64922623434046.
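
The mapping itself is plain linear interpolation. The sketch below is a minimal stand-in for the calculation, not D3's actual implementation, and the linearScale function name is hypothetical. It uses the domain and range from this example: amounts from 0 to 20355, mapped to widths from 0 to 720.

```javascript
// Minimal sketch of a linear scale; d3.scaleLinear offers much more than this
function linearScale(domain, range) {
  return function (value) {
    // Position of the value within the domain, as a fraction
    var t = (value - domain[0]) / (domain[1] - domain[0]);
    // The same fraction of the way through the range
    return range[0] + t * (range[1] - range[0]);
  };
}

var scale = linearScale([0, 20355], [0, 720]);
var emma = scale(20355);   // 720, the full width
var olivia = scale(19553); // roughly 691.6
var mia = scale(14820);    // roughly 524.2
```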

Once you've got a scale, you can use a couple of functions to change its behavior:

Function | Description
invert(val) | This function expects a value from the output range, and returns the corresponding value from the input domain.
rangeRound(range) | You can use this instead of the range option we saw earlier. With this function, the scale only returns rounded values.
clamp(bool) | With the clamp function, you define what happens when a value is passed in that is outside the input domain. When clamp is true, the minimum or maximum output value is returned. When clamp is false, an output value is calculated normally, which will result in a value outside the output range.
ticks([count]) | This function returns approximately count representative values from the input domain (10 is the default count), which can be used to create an axis or reference lines.
nice([count]) | This function extends the input domain so that it starts and ends on round values. You can optionally specify a tick count, and the rounding will take it into account.


This is just a small part of the scales support provided by D3. In the rest of the book, we'll explore more of the scales options that are available.
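
To give a feel for what invert and clamp do, the following sketch adds both behaviors to a hand-written linear mapping. It is an illustration under the assumption of a simple two-value domain and range, not D3's implementation, and the makeScale function name is hypothetical.

```javascript
// Hand-written linear mapping with optional clamping and an invert function
function makeScale(domain, range, clamp) {
  var d0 = domain[0], d1 = domain[1];
  var r0 = range[0], r1 = range[1];

  function scale(value) {
    var t = (value - d0) / (d1 - d0);
    // With clamping, inputs outside the domain are pinned to its edges
    if (clamp) {
      t = Math.max(0, Math.min(1, t));
    }
    return r0 + t * (r1 - r0);
  }

  // Map a value from the output range back to the input domain
  scale.invert = function (output) {
    return d0 + ((output - r0) / (r1 - r0)) * (d1 - d0);
  };

  return scale;
}

var unclamped = makeScale([0, 20355], [0, 720], false);
var clamped = makeScale([0, 20355], [0, 720], true);

var outside = unclamped(40710);         // 1440, beyond the output range
var pinned = clamped(40710);            // 720, the maximum output value
var halfway = unclamped.invert(360);    // 10177.5, half of the maximum amount
```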

With the scale defined, we can use that to create our rect and text elements in the same way we did in our previous example:

bars.append('rect')
  .attr("height", barWidth)
  .attr("width", function (d) { return yScale(d.amount); })
  .attr("class", function (d) { return d.sex === 'F' ? 'female' : 'male'; });

Here we create a rect with a fixed height, and a width which is defined by the yScale and the number of times the name was used. We also add a class to the rect so that we can set its colors (and other styling attributes) through CSS. In the case where sex is F, we set the class female and in the other case we set the class male.

To position the text element, we do pretty much the same:

bars.append("text")
  .attr("class", "label")
  .attr("x", function (d) { return yScale(d.amount) - 5 ; })
  .attr("y", barWidth / 2)
  .attr("dy", ".35em")
  .text(function(d) { return d.name; });

We create a new text element, position it at the end of the bar, set a custom CSS class, and finally set its value to d.name. The dy attribute might seem a bit strange, but this allows us to position the text nicely in the vertical middle of the bar. If we opened the example at this point, we'd see something like this:

We can see that all the information is in there, but it still looks kind of ugly. In the following section, we add some CSS to improve what the chart looks like.

Adding some CSS classes to style the bars and text elements

When we added the rect elements, we added a female class attribute for the girls' names and a male one for the boys' names, and we've also set the class of our text elements to label. In our CSS file, we can now define colors and other styles based on these classes:

.male {
  fill: steelblue;
}

.female {
  fill: hotpink;
}

.label {
  fill: black;
  font: 10px sans-serif;
  text-anchor: end;
}

With these CSS properties, we set the fill color of our rectangles. The elements with the male class will be filled steelblue and the elements with the female class will be filled hotpink. We also change how the elements with the label class are rendered. For these elements, we change the font and the text-anchor. The text-anchor, especially, is important here, since it makes sure that the end of the text (its right side) is positioned at the x value, instead of its start (the left side). The effect is that the text element is nicely aligned at the end of our bars.

Adding the axis on the top and bottom

The final step we need to take to get the figure from the beginning of this section is to add the top and bottom axes. D3 provides you with a d3.axis<orientation> function (d3.axisTop, d3.axisBottom, d3.axisLeft, and d3.axisRight), which allows you to create an axis at the top, bottom, left, or right side. When creating an axis, we pass in a scale (the same one we used for the width of the rectangles), and tell D3 how the axis should be formatted. In this case, we ask for about 20 ticks, and use the s format, which tells D3 to use the International System of Units (SI). This means that D3 will use metric prefixes to format the tick values (more info can be found here: https://en.wikipedia.org/wiki/Metric_prefix):

var bottomAxis = d3.axisBottom().scale(yScale).ticks(20, "s");
var topAxis = d3.axisTop().scale(yScale).ticks(20, "s");

chart.append("g")
  .attr('transform', 'translate( 0 ' + both.length * (barWidth + barMargin) + ')')
  .call(bottomAxis);

chart.append("g")
  .attr('transform', 'translate( 0 ' + -barMargin + ' )')
  .call(topAxis);
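
To illustrate what formatting with metric prefixes means for the tick labels, here is a deliberately simplified sketch. The siFormat function is hypothetical and only covers the k, M, and G prefixes; D3's own s format also deals with precision and with prefixes for small values, so its exact output can differ.

```javascript
// Simplified metric-prefix formatter, for illustration only
function siFormat(value) {
  var prefixes = ['', 'k', 'M', 'G'];
  var index = 0;
  var scaled = value;
  // Divide by 1000 until the number fits the current prefix
  while (Math.abs(scaled) >= 1000 && index < prefixes.length - 1) {
    scaled = scaled / 1000;
    index = index + 1;
  }
  return scaled + prefixes[index];
}

var labels = [500, 1500, 20000, 2000000].map(function (v) { return siFormat(v); });
// '500', '1.5k', '20k', '2M'
```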

And with that, we've recreated the example we saw at the beginning of this section:

If you look back at the code we showed at the beginning of this section, you can see that we only need a small number of lines of code to create a nice visualization.