Book Image

Mastering Node.js - Second Edition

By : Sandro Pasquali, Kevin Faaborg
Book Image

Mastering Node.js - Second Edition

By: Sandro Pasquali, Kevin Faaborg

Overview of this book

Node.js, a modern development environment that enables developers to write server- and client-side code with JavaScript, thus becoming a popular choice among developers. This book covers the features of Node that are especially helpful to developers creating highly concurrent real-time applications. It takes you on a tour of Node's innovative event non-blocking design, showing you how to build professional applications. This edition has been updated to cover the latest features of Node 9 and ES6. All code examples and demo applications have been completely rewritten using the latest techniques, introducing Promises, functional programming, async/await, and other cutting-edge patterns for writing JavaScript code. Learn how to use microservices to simplify the design and composition of distributed systems. From building serverless cloud functions to native C++ plugins, from chatbots to massively scalable SMS-driven applications, you'll be prepared for building the next generation of distributed software. By the end of this book, you'll be building better Node applications more quickly, with less code and more power, and know how to run them at scale in production environments.
Table of Contents (13 chapters)

Extending JavaScript

When he designed Node, JavaScript was not Ryan Dahl's original language choice. Yet, upon exploration, he found a good modern language without opinions on streams, the filesystem, handling binary objects, processes, networking, and other capabilities one would expect to exist in a systems language. JavaScript, strictly limited to the browser, had no use for, and had not implemented, these features.

Guided by the Unix philosophy, Dahl was guided by a few rigid principles:

  • A Node program/process runs on a single thread, ordering execution through an event loop
  • Web applications are I/O intensive, so the focus should be on making I/O fast
  • Program flow is always directed through asynchronous callbacks
  • Expensive CPU operations should be split off into separate parallel processes, emitting events as results arrive
  • Complex programs should be assembled from simpler programs

The general principle is, operations must never block. Node's desire for speed (high concurrency) and efficiency (minimal resource usage) demands the reduction of waste. A waiting process is a wasteful process, especially when waiting for I/O.

JavaScript's asynchronous, event-driven design fits neatly into this model. Applications express interest in some future event, and are notified when that event occurs. This common JavaScript pattern should be familiar to you:

Window.onload = function() {
// When all requested document resources are loaded,
// do something with the resulting environment
}
element.onclick = function() {
// Do something when the user clicks on this element
}

The time it will take for an I/O action to complete is unknown, so the pattern is to ask for notification when an I/O event is emitted, whenever that may be, allowing other operations to be completed in the meantime.

Node adds an enormous amount of new functionality to JavaScript. Primarily, the additions provide evented I/O libraries offering the developer system access not available to browser-based JavaScript, such as writing to the filesystem or opening another system process. Additionally, the environment is designed to be modular, allowing complex programs to be assembled out of smaller and simpler components.

Let's look at how Node imported JavaScript's event model, extended it, and used it in the creation of interfaces to powerful system commands.

Events

Many functions in the Node API emit events. These events are instances of events.EventEmitter. Any object can extend EventEmitter, providing Node developers with a simple and uniform way to build tight, asynchronous interfaces to object methods.

The following code sets Node's EventEmitter object as the prototype of a function constructor we define. Each constructed instance has the EventEmitter object exposed to its prototype chain, providing a natural reference to the event API. The counter instance methods emit events, and code after that listens for them. After making a Counter, we listen for the incremented event, specifying a callback Node will call when the event happens. Then, we call the increment twice. Each time, our Counter increments the internal count it holds, and then emits the incremented event. This calls our callback, giving it the current count, which our callback logs:

// File counter.js
// Load Node's 'events' module, and point directly to EventEmitter there
const EventEmitter = require('events').EventEmitter;
// Define our Counter function
const Counter = function(i) { // Takes a starting number
this.increment = function() { // The counter's increment method
i++; // Increment the count we hold
this.emit('incremented', i); // Emit an event named incremented
}
}
// Base our Counter on Node's EventEmitter
Counter.prototype = new EventEmitter(); // We did this afterwards, not before!
// Now that we've defined our objects, let's see them in action
// Make a new Counter starting at 10
const counter = new Counter(10);
// Define a callback function which logs the number n you give it
const callback = function(n) {
console.log(n);
}
// Counter is an EventEmitter, so it comes with addListener
counter.addListener('incremented', callback);
counter.increment(); // 11
counter.increment(); // 12

The following is the output for counter.js:

$ node counter.js
11
12

To remove the event listeners bound to counter, use this:

counter.removeListener('incremented', callback).

For consistency with browser-based JavaScript, counter.on and counter.addListener are interchangeable.

Node brought EventEmitter to JavaScript and made it an object your objects can extend. This greatly increases the possibilities available to developers. With EventEmitter, Node can handle I/O data streams in an event-oriented manner, performing long-running tasks while keeping true to Node's principles of asynchronous, non-blocking programming:

// File stream.js
// Use Node's stream module, and get Readable inside
let Readable = require('stream').Readable;
// Make our own readable stream, named r
let r = new Readable;
// Start the count at 0
let count = 0;
// Downstream code will call r's _read function when it wants some data from r
r._read = function() {
count++;
if (count > 10) { // After our count has grown beyond 10
return r.push(null); // Push null downstream to signal we've got no more data
}
setTimeout(() => r.push(count + '\n'), 500); // A half second from now, push our count on a line
};
// Have our readable send the data it produces to standard out
r.pipe(process.stdout);

The following is the output for stream.js:

$ node stream.js
1
2
3
4
5
6
7
8
9
10

This example creates a readable stream r, and pipes its output to the standard out. Every 500 milliseconds, code increments a counter and pushes a line of text with the current count downstream. Try running the program yourself, and you'll see the series of numbers appear on your terminal.

On what would be the 11th count, r pushes null downstream, indicating that it has no more data to send. This closes the stream, and with nothing more to do, Node exits the process.

Subsequent chapters will explain streams in more detail. Here, just note how pushing data onto a stream causes an event to fire, how you can assign a custom callback to handle this event, and how the data flows downstream.

Node consistently implements I/O operations as asynchronous, evented data streams. This design choice enables Node's excellent performance. Instead of creating a thread (or spinning up an entire process) for a long-running task like a file upload that a stream may represent, Node only needs to commit the resources to handle callbacks. Additionally, in the long stretches of time in between the short moments when the stream is pushing data, Node's event loop is free to process other instructions.

As an exercise, re-implement stream.js to send the data r produces to a file instead of the terminal. You'll need to make a new writable stream w, using Node's fs.createWriteStream:

// File stream2file.js
// Bring in Node's file system module
const fs = require('fs');
// Make the file counter.txt we can fill by writing data to writeable stream w
const w = fs.createWriteStream('./counter.txt', { flags: 'w', mode: 0666 });
...
// Put w beneath r instead
r.pipe(w);

Modularity

In his book, The Art of Unix Programming, Eric Raymond proposed the Rule of Modularity:

"Developers should build a program out of simple parts connected by well-defined interfaces, so problems are local, and parts of the program can be replaced in the future versions to support new features. This rule aims to save time on debugging complex code that is complex, long, and unreadable."

Large systems are hard to reason about, especially when the boundaries of internal components are fuzzy, and the interactions between them are complex. This principle of building large systems out of small, simple, and loosely-coupled pieces is a good idea for software and beyond. Physical manufacturing, management theory, education, and government, all have benefited from this design philosophy.

When developers began employing JavaScript for larger and more complex software challenges, they encountered this challenge. There was not yet a good way (and later, no common standard way) to assemble a JavaScript program from smaller ones. For example, you've probably seen HTML pages with tags like these at the top:

<head>
<script src="fileA.js"></script>
<script src="fileB.js"></script>
<script src="fileC.js"></script>
<script src="fileD.js"></script>
...
</head>

This works, but leads to a number of problems:

  • The page must declare all potential dependencies before any are needed or used. If, while running, your program encounters a situation where it needs an additional dependency, dynamically loading another module is possible, but a separate hack.
  • The scripts are not encapsulated. Code in every file writes to the same global object. Adding a new dependency may break an earlier one because of a name collision.
  • fileA cannot address fileB as a collection. An addressable context like fileB.function1 isn't available.

The <script> tag would be a nice place for useful module services such as dependency awareness and version control, but it doesn't have these features.

These difficulties and dangers made creating and using JavaScript modules feel more treacherous than effortless. A good module system with features like encapsulation and versioning can reverse this, encouraging code organization and sharing, and leading to a robust ecosystem of high-quality open source software components.

JavaScript needed a standard way to load and share discreet program modules, and found one in 2009 with the CommonJS Modules specification. Node follows this specification, making it easy to define and share bits of reusable code called modules or packages.

Choosing a delightfully simple design, a package is just a directory of JavaScript files. Metadata about the package, such as its name, version, and software license, lives in an additional file named package.json. The JSON contents of this file are easily both human and machine-readable. Let's take a look:

{
"name": "mypackage1",
"version": "0.1.2",
"dependencies": {
"jquery": "^3.1.0",
"bluebird": "^3.4.1",
},
"license": "MIT"
}

This package.json defines a package named mypackage1, which depends on two other packages: jQuery and Bluebird. Alongside the package names is a version number. Version numbers follow the Semantic Versioning (SemVer) rules, with a pattern like Major.Minor.Patch. Looking at the incremented version numbers of a package your code has been using, here's what that means:

  • Major: There's a change in the purpose or outcome of the API. If your code calls an updated function, it may break or produce an unintended result. Figure out what's changed, and determine if it affects your code.
  • Minor: The package has added functionality, but remains compatible. Run all your tests, and you're good to go. Check out the documentation if you're curious, as there might be new, more advanced parts of the API alongside the functions and objects you're familiar with.
  • Patch: The package fixed a bug, improved performance, or refactored a little. Run all your tests, and you're good to go.

Packages enable the construction of large systems from many small, interdependent systems. Perhaps even more importantly, packages encourage sharing. More detailed information about SemVer is available in Appendix A, Organizing Your Work Into Modules, where npm and packages are discussed in more depth.

"What I'm describing here is not a technical problem. It's a matter of people getting together and making a decision to step forward and start building up something bigger and cooler together."
– Kevin Dangoor, creator of CommonJS
Not just about modules, CommonJS is actually a whole collection of standards founded with the goal of removing everything that was holding JavaScript back from world domination, open source developer Kris Kowal explained in a 2009 post evangelizing the initiative. He names the first of these impediments as the absence of a good module system. The second? The absence of a standard library, including such systems-level fundamentals as access to the filesystem, manipulation of I/O streams, and types for bytes and blocks of binary data. Today, CommonJS is known for giving JavaScript a module system, while Node is what gave JavaScript systems-level access:

https://arstechnica.com/information-technology/2009/12/commonjs-effort-sets-javascript-on-path-for-world-domination/

CommonJS gave JavaScript packages. With packages, the next thing JavaScript needed was a package manager. Node provided one with npm.

A registry of packages, npm is accessible in two ways. First, at the website www.npmjs.com, you can link to and search for packages, essentially shopping for the right one. Stats that count how many times a package has been downloaded in the last day, week, and month show popularity and usage. Most packages link to a developer profile page and open source code on GitHub, so you can see the code, visualize recent development activity, and judge the reputations of the authors and contributors.

The second way to access npm is through the command-line tool npm, which is installed with Node. Using npm as a traditional package manager for your workstation, you can install packages globally, creating new command-line tools on your shell's path. npm also knows how to create, read, and edit package.json files, and can start you out with a new, empty Node package, add the dependencies it needs, download all the code, and keep everything up to date.

Along with Git and GitHub, npm is now achieving a dream of software development identified in the 1970s: that code could be reused more often, and software projects would be written entirely from scratch less frequently.
Earlier attempts at reaching this goal through version control systems like CVS and Subversion, and open source code sharing websites like SourceForge.net, focused on bigger units of both code and people, and didn't achieve as much.

GitHub and npm took a different approach in two important ways:
  • Favoring individual developers working alone over communities meeting and discussing, developers could focus more on code and less on conversation
  • Favoring small, atomic software components over complete applications, encapsulated composition started happening not just at a micro-level of subroutines and objects, but at the more important macroscale of application design
Even documentation is better with the new approach: in a monolithic software application, documentation was too often the afterthought that may or may not have happened after the product shipped.
With components, great documentation is necessary to sell your package to the world, getting it a larger public daily download count, and the social media accounts you keep as a developer of more followers.

In no small part, Node's success is due to the number and quality of packages available to you as a Node developer.

More extensive information on creating and managing Node packages can be found in Appendix A, Organizing Your Work into Modules.

The key design philosophy to follow is this: build programs out of packages where possible, and share those packages when possible. The shape of your applications will be clearer and easier to maintain. Importantly, the efforts of thousands of other developers can be linked into applications via npm, directly by inclusion, and indirectly as shared packages are tested, improved, refactored, and repurposed by members of the Node community.

Contrary to popular belief, npm is not an abbreviation for Node Package Manager, and should never be used or explained as an acronym:
https://docs.npmjs.com/policies/trademark

The network

I/O in the browser is mercilessly hobbled, for very good reasons - if the JavaScript on any given website could access your filesystem, for instance, users could only click links to new sites they trusted, rather than ones they simply wanted to try out. Keeping pages in a limited sandbox, the design of the web made navigating from thing1.com to thing2.com not have the consequences of double-clicking thing1.exe and thing2.exe.

Node, of course, recasts JavaScript in the role of a systems language, giving it direct and unfettered access to operating system kernel objects such as files, sockets, and processes. This lets Node create scalable systems with high I/O requirements. It's likely the first thing you coded in Node was a HTTP server.

Node supports standard network protocols in addition to HTTP, such as TLS/SSL, and UDP. With these tools we can easily build scalable network programs, moving far beyond the comparatively limited AJAX solutions JavaScript developers know from the browser.

Let's write a simple program that sends a UDP packet to another node:

const dgram = require('dgram');
let client = dgram.createSocket("udp4");
let server = dgram.createSocket("udp4");
let message = process.argv[2] || "message";
message = Buffer.from(message);
server
.on('message', msg => {
process.stdout.write(`Got message: ${msg}\n`);
process.exit();
})
.bind(41234);
client.send(message, 0, message.length, 41234, "localhost");

Go ahead and open two terminal windows and navigate each to your code bundle for Chapter 8, Scaling Your Application, under the /udp folder. We're now going to run a UDP server in one window, and a UDP client in another.

In the right window, run receive.js with a command like the following:

$ node receive.js

On the left, run send.js with a command, as follows:

$ node send.js

Executing that command will cause the message to appear on the right:

$ node receive.js
Message received!

A UDP server is an instance of EventEmitter, emitting a message event when messages are received on the bound port. With Node, you can use JavaScript to write your application at the I/O level, moving packets and streams of binary data with ease.

Let's continue to explore I/O, the process object, and events. First, let's dig into the machine powering Node's core, V8.