The Node.js event-driven architecture

Node.js's blistering performance is said to come from its asynchronous event-driven architecture and its use of the V8 JavaScript engine. That architecture enables it to handle multiple tasks concurrently, such as juggling requests from multiple web browsers. The original creator of Node.js, Ryan Dahl, based the design on these key points:

  • A single-threaded, event-driven programming model is simpler to code, with less complexity and less overhead than application servers that rely on threads to handle multiple concurrent tasks.
  • By converting blocking function calls into asynchronous code execution, you can configure the system so that it issues an event when the blocking request is satisfied (see the sketch after this list).
  • You can leverage the V8 JavaScript engine from the Chrome browser, and all the work that goes into improving V8; every performance enhancement made to V8 therefore benefits Node.js.
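
As a minimal sketch of the second point, consider Node.js's own fs module, which offers both a blocking and a non-blocking variant of the same file-read operation (the file path here is purely illustrative):

const fs = require('fs');

// Blocking: the execution thread stalls until the file is fully read.
const dataSync = fs.readFileSync('/etc/hosts', 'utf8');
console.log('sync read:', dataSync.length, 'characters');

// Non-blocking: the call returns immediately, and the callback fires
// as an event when the read completes.
fs.readFile('/etc/hosts', 'utf8', (err, data) => {
  if (err) throw err;
  console.log('async read:', data.length, 'characters');
});
console.log('readFile dispatched; the thread is free meanwhile');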

In most application servers, concurrency, or the ability to handle multiple concurrent requests, is implemented with a multithreaded architecture. In such a system, any request for data, or any other blocking function call, causes the current execution thread to suspend and wait for the result. Handling concurrent requests therefore requires multiple execution threads: when one thread is suspended, another can execute. This causes churn as the application server starts and stops threads to handle requests. Each suspended thread (typically waiting on an input/output operation to finish) consumes a full call stack of memory, adding to overhead. Threads add complexity to the application server, as well as server overhead.

To help us wrap our heads around why this is, Dahl offered the following example in his Cinco de NodeJS presentation from May 2010 (https://www.youtube.com/watch?v=M-sc73Y-zQA), asking what happens when we execute a line of code such as this:

result = query('SELECT * from db.table'); 
// operate on the result 

Of course, the program pauses at this point while the database layer sends the query to the database and waits for the result or an error. This is an example of a blocking function call. Depending on the query, this pause can be quite long (well, a few milliseconds, which is ages in computer time). This pause is bad because the execution thread can do nothing while it waits for the result to arrive. If your software is running on a single-threaded platform, the entire server is blocked and unresponsive. If instead it is running on a thread-based server platform, a thread context switch is required to satisfy any other requests that arrive. The greater the number of outstanding connections to the server, the greater the number of thread context switches. Context switching is not free, because more threads means more memory consumed for per-thread state and more CPU time spent on thread management overhead.

The key inspiration guiding the original development of Node.js was the simplicity of a single-threaded system. A single execution thread means that the server doesn't have the complexity of multithreaded systems. This choice meant that Node.js required an event-driven model for handling concurrent tasks. Instead of the code waiting for results from a blocking request, such as retrieving data from a database, an event is instead dispatched to an event handler.

Using threads to implement concurrency often comes with admonitions such as "expensive and error-prone" or "designing concurrent software is complex and error-prone." The complexity comes from access to shared variables and the various strategies needed to avoid deadlock and competition between threads. The synchronization primitives of Java are an example of such a strategy, and many programmers find them difficult to use. There is a tendency to create frameworks such as java.util.concurrent to tame the complexity of threaded concurrency, but some argue that papering over complexity only makes things more complex.

A typical Java programmer might object at this point. Perhaps their application code is written against a framework such as Spring, or maybe they are directly using Java EE. In either case, their application code does not use concurrency features or deal with threads directly, so where is the complexity we just described? Just because the complexity is hidden within Spring and Java EE does not mean there is no complexity and overhead.

Okay, we get it: while multithreaded systems can do amazing things, there is inherent complexity. What does Node.js offer?

The Node.js answer to complexity

Node.js asks us to think differently about concurrency. Callbacks fired asynchronously from an event loop are a much simpler concurrency model—simpler to understand, simpler to implement, simpler to reason about, and simpler to debug and maintain.

Node.js has a single execution thread with no waiting on I/O or context switching. Instead, there is an event loop that dispatches events to handler functions as things happen. A request that would have blocked the execution thread instead executes asynchronously, with the results or errors triggering an event. Any operation that would block or otherwise take time to complete must use the asynchronous model.
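
A tiny sketch using Node.js's built-in EventEmitter shows the shape of this model; the 'result' event name and its payload are purely illustrative:

const EventEmitter = require('events');
const emitter = new EventEmitter();

// Register a handler function; the event loop dispatches to it.
emitter.on('result', (data) => {
  console.log('handled:', data);
});

// Simulate an asynchronous operation finishing later.
setImmediate(() => emitter.emit('result', { rows: 42 }));

console.log('handler registered; execution continues without waiting');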

The original Node.js paradigm delivered the dispatched event to an anonymous function. Now that JavaScript has async functions, the Node.js paradigm is shifting to deliver results and errors via a promise that is handled by the await keyword. When an asynchronous function is called, control quickly passes to the event loop rather than causing Node.js to block. The event loop continues handling the variety of events while recording where to send each result or error.

By using asynchronous, event-driven I/O, Node.js removes most of this overhead while introducing very little of its own.

One of the points Dahl made in the Cinco de NodeJS presentation is a hierarchy of execution time for different kinds of requests. Objects in memory are accessed far more quickly (on the order of nanoseconds) than objects on disk or objects retrieved over the network (milliseconds or seconds). The longer access time for external objects is measured in zillions of clock cycles, which can be an eternity when your customer is sitting at their web browser, ready to move on if the page takes longer than two seconds to load.

Therefore, concurrent request handling means using a strategy to handle the requests that take longer to satisfy. If the goal is to avoid the complexity of a multithreaded system, then the system must use asynchronous operations as Node.js does.

What do these asynchronous function calls look like?

Asynchronous requests in Node.js

In Node.js, the query that we looked at previously will read as follows:

query('SELECT * from db.table', function (err, result) { 
    if (err) throw err; // handle errors 
    // operate on result 
}); 

The programmer supplies a function that is called (hence the name callback function) when the result (or error) is available. The query still takes the same amount of time to execute, but instead of blocking the execution thread, control returns to the event loop, which is then free to handle other requests. Node.js will eventually fire an event that causes this callback function to be called with the result or an error indication.

A similar paradigm is used in client-side JavaScript, where we write event handler functions all the time.
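
For instance, a browser-side click handler has the same shape; this is a sketch, assuming a page containing a button with the (hypothetical) id save-button:

// Browser JavaScript: the callback runs whenever the click event fires,
// just as a Node.js callback runs when an I/O result is ready.
document.getElementById('save-button').addEventListener('click', (event) => {
  console.log('save requested', event.target);
});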

Advances in the JavaScript language have given us new options. When used with ES2015 promises, the equivalent code would look like this:

query('SELECT * from db.table') 
.then(result => { 
    // operate on result 
}) 
.catch(err => { 
    // handle errors 
}); 

This is a little better, especially in instances of deeply nested event handling.
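
To see the difference, compare two dependent queries written both ways; queryUser and queryOrders are hypothetical asynchronous functions, callback-based in the first sketch and promise-based in the second:

// Callback style: each dependent step nests one level deeper.
queryUser(userId, (err, user) => {
  if (err) throw err;
  queryOrders(user.id, (err, orders) => {
    if (err) throw err;
    // operate on orders
  });
});

// Promise style: the steps stay at one indentation level.
queryUser(userId)
.then(user => queryOrders(user.id))
.then(orders => {
  // operate on orders
})
.catch(err => {
  // handle errors
});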

The big advance came with the ES2017 async function:

try {
  const result = await query('SELECT * from db.table');
  // operate on result
} catch (err) {
  // handle errors
}

Other than the async and await keywords, this looks like code we'd write in other languages, and is much easier to read. Because of what await does, it is still asynchronous code execution.
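
One detail to keep in mind is that, in an ordinary CommonJS script, await is only valid inside a function declared with async; here is a sketch of the required wrapper, with doQuery as a hypothetical name:

const doQuery = async () => {
  try {
    const result = await query('SELECT * from db.table');
    // operate on result
  } catch (err) {
    // handle errors
  }
};

// Calling it returns a Promise immediately; the event loop keeps running.
doQuery();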

All three of these code snippets perform the same query that we wrote earlier. Instead of query being a blocking function call, it is asynchronous and does not block the execution thread.

With both callback functions and promise-based asynchronous coding, Node.js had its own complexity issue. Often we call one asynchronous function after another. With callback functions, that meant deeply nested callback functions, and with promises, it meant a long chain of .then handler functions. In addition to the complexity of the coding, errors and results land in unnatural places. Instead of landing on the next line of code, they arrive in whichever asynchronously executed callback function is invoked. The order of execution is not one line after another, as it is in synchronous programming languages; instead, it is determined by the order of callback execution.

The async function approach solves that coding complexity. The coding style is more natural since the results and errors land in the natural place, at the next line of code. The await keyword integrates asynchronous result handling without blocking the execution thread. A lot is buried under the covers of the async/await feature, and we'll be covering this model extensively throughout this book.
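
As a sketch of that natural style, here is a sequence of dependent asynchronous calls, with getUser, getOrders, and getInvoice standing in for hypothetical asynchronous functions; each result lands on the very next line:

const buildInvoice = async (userId) => {
  const user = await getUser(userId);        // result lands here
  const orders = await getOrders(user.id);   // then here
  const invoice = await getInvoice(orders);  // then here
  return invoice;                            // errors propagate to the caller
};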

But does the asynchronous architecture of Node.js actually improve performance?

Performance and utilization

Some of the excitement over Node.js is due to its throughput (the requests per second it can serve). Comparative benchmarks against similar applications, such as Apache, show that Node.js has tremendous performance gains.

One benchmark going around is the following simple HTTP server (borrowed from https://nodejs.org/en/), which returns a Hello World message directly from memory:

const http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(8124, "127.0.0.1");
console.log('Server running at http://127.0.0.1:8124/');

This is one of the simpler web servers that you can build with Node.js. The http object encapsulates the HTTP protocol, and its http.createServer method creates a whole web server, listening on the port specified in the listen method. Every request (whether a GET or POST on any URL) on that web server calls the provided function. It is very simple and lightweight. In this case, regardless of the URL, it returns a simple text/plain Hello World response.
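
Because that one function receives every request, anything beyond a single canned response means inspecting the request yourself; here is a minimal sketch that distinguishes method and URL (the /hello route is purely illustrative):

const http = require('http');
http.createServer(function (req, res) {
  // The same handler sees every method and URL, so we route by hand.
  if (req.method === 'GET' && req.url === '/hello') {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Hello World\n');
  } else {
    res.writeHead(404, {'Content-Type': 'text/plain'});
    res.end('Not found\n');
  }
}).listen(8124, "127.0.0.1");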

Ryan Dahl showed a simple benchmark in a video titled Ryan Dahl: Introduction to Node.js (on the YUI Library channel on YouTube, https://www.youtube.com/watch?v=M-sc73Y-zQA). It used an HTTP server similar to this one, but returning a one-megabyte binary buffer; Node.js gave 822 req/sec, while Nginx gave 708 req/sec, about a 15% improvement. He also noted that Nginx peaked at four megabytes of memory, while Node.js peaked at 64 megabytes.

The key observation was that Node.js, running an interpreted, JIT-compiled, high-level language, was about as fast as Nginx, built from highly optimized C code, while running similar tasks. That presentation was in May 2010, and Node.js has improved hugely since then, as shown in Chris Bailey's talk that we discuss later in this section.

Yahoo! search engineer Fabian Frank published a performance case study of a real-world search query suggestion widget implemented with Apache/PHP and two variants of Node.js stacks (http://www.slideshare.net/FabianFrankDe/nodejs-performance-case-study). The application is a pop-up panel that shows search suggestions as the user types, using JSON-based HTTP queries. The Node.js version could handle eight times the number of requests per second with the same request latency. Fabian Frank said both Node.js stacks scaled linearly until CPU usage hit 100%.

LinkedIn did a massive overhaul of their mobile app, using Node.js for the server side to replace an old Ruby on Rails app. The switch let them move from 30 servers down to 3, and allowed them to merge the frontend and backend teams because everything was written in JavaScript. Before choosing Node.js, they evaluated Rails with Event Machine, Python with Twisted, and Node.js, and chose Node.js for the reasons we just discussed. For a look at what LinkedIn did, see http://arstechnica.com/information-technology/2012/10/a-behind-the-scenes-look-at-linkedins-mobile-engineering/.

Most existing Node.js performance tips were written for older V8 versions that used the CrankShaft optimizer. The V8 team has completely dropped CrankShaft in favor of a new optimizer called TurboFan. For example, under CrankShaft it was slower to use try/catch, let/const, generator functions, and so on. Therefore, common wisdom said not to use those features, which was depressing because we want to use the new features that have so improved the JavaScript language. Peter Marshall, an engineer on the V8 team at Google, gave a talk at Node.js Interactive 2017 claiming that, with TurboFan, you should just write natural JavaScript; the goal of TurboFan is across-the-board performance improvements in V8. To view the presentation, see the video titled High Performance JS in V8 at https://www.youtube.com/watch?v=YqOhBezMx1o.

A truism about JavaScript is that it's no good for heavy computational work, because of the nature of the language. We'll go over some related ideas in the next section. A talk by Mikola Lysenko at Node.js Interactive 2016 covered some issues with numerical computing in JavaScript, along with some possible solutions. Common numerical computing involves large numerical arrays processed by algorithms you might have learned in calculus or linear algebra classes. What JavaScript lacks is multidimensional arrays and access to certain CPU instructions. The solution he presented is a library implementing multidimensional arrays in JavaScript, along with another library full of numerical computing algorithms. To view the presentation, see the video titled Numerical Computing in JavaScript by Mikola Lysenko at https://www.youtube.com/watch?v=1ORaKEzlnys.
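
To make the missing-multidimensional-arrays point concrete, the usual workaround is to simulate a matrix on top of a flat typed array, computing indexes by hand; this sketch shows the general idea and is not the library from the talk:

// A rows x cols matrix stored in a single flat Float64Array.
const rows = 3, cols = 4;
const matrix = new Float64Array(rows * cols);

const get = (r, c) => matrix[r * cols + c];
const set = (r, c, value) => { matrix[r * cols + c] = value; };

set(1, 2, 42);
console.log(get(1, 2)); // 42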

At the Node.js Interactive conference in 2017, IBM's Chris Bailey made a case for Node.js being an excellent choice for highly scalable microservices. The key performance characteristics are I/O performance (measured in transactions per second), startup time (because that limits how quickly a service can scale up to meet demand), and memory footprint (because that determines how many application instances can be deployed per server). Node.js excels on all of those measures, and with every subsequent release it either improves on each measure or remains fairly steady. Bailey presented figures comparing a Node.js benchmark to a similar one written with Spring Boot, showing Node.js performing much better. To view his talk, see the video titled Node.js Performance and Highly Scalable Micro-Services - Chris Bailey, IBM at https://www.youtube.com/watch?v=Fbhhc4jtGW4.

The bottom line is that Node.js excels at event-driven I/O throughput. Whether a Node.js program can excel at computational work depends on your ingenuity in working around some limitations of the JavaScript language.

A big problem with computational programming is that it prevents the event loop from executing. As we will see in the next section, that can make Node.js look like a poor candidate for anything.

Is Node.js a cancerous scalability disaster?

In October 2011, a blog post (since pulled from the blog where it was published) titled Node.js is a cancer called Node.js a scalability disaster. The example shown for proof was a CPU-bound implementation of the Fibonacci sequence algorithm. While the argument was flawed—since nobody implements Fibonacci that way—it made the valid point that Node.js application developers have to consider the following: where do you put the heavy computational tasks?

A key to maintaining high throughput in Node.js applications is ensuring that events are handled quickly. Because Node.js uses a single execution thread, if that thread is bogged down with a big calculation, it cannot handle events, and event throughput will suffer.

The Fibonacci sequence, serving as a stand-in for heavy computational tasks, quickly becomes computationally expensive to calculate for a naïve implementation such as this:

const fibonacci = exports.fibonacci = function(n) {
  if (n === 1 || n === 2) {
    return 1;
  } else {
    return fibonacci(n-1) + fibonacci(n-2);
  }
};

This is a particularly simplistic approach to calculating Fibonacci numbers. Yes, there are many ways to calculate Fibonacci numbers more quickly. We are showing this as a general example of what happens to Node.js when event handlers are slow and not to debate the best ways to calculate mathematical functions. Consider the following server:

const http = require('http');
const url  = require('url');
// Assuming the fibonacci function above was exported from a
// module saved as math.js:
const { fibonacci } = require('./math');

http.createServer(function (req, res) {
  const urlP = url.parse(req.url, true);
  res.writeHead(200, {'Content-Type': 'text/plain'});
  if (urlP.query['n']) {
    // Convert the query value to a number before computing
    const fibo = fibonacci(parseInt(urlP.query['n']));  // Blocking
    res.end('Fibonacci '+ urlP.query['n'] +'='+ fibo);
  } else {
    res.end('USAGE: http://127.0.0.1:8124?n=## where ## is the Fibonacci number desired');
  }
}).listen(8124, '127.0.0.1');
console.log('Server running at http://127.0.0.1:8124');

This is an extension of the simple web server shown earlier. It looks in the request URL for an argument, n, for which to calculate the Fibonacci number. When it's calculated, the result is returned to the caller.

For sufficiently large values of n (for example, 40), the server becomes completely unresponsive because the event loop is not running; the function has blocked event processing, since the event loop cannot dispatch events while the function is grinding through the calculation.

In other words, the Fibonacci function is a stand-in for any blocking operation.
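
You can watch the starvation directly with a small experiment; this sketch assumes the naïve fibonacci function from above is in scope:

// The interval should tick every 100 ms, but while fibonacci(40)
// grinds away, the event loop cannot dispatch the timer events.
const timer = setInterval(() => console.log('tick'), 100);

setTimeout(() => {
  console.log('starting the blocking calculation');
  console.log('result:', fibonacci(40));
  console.log('done; ticks can fire again');
  clearInterval(timer);
}, 350);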

Does this mean that Node.js is a flawed platform? No, it just means that the programmer must take care to identify code with long-running computations and develop solutions. These include rewriting the algorithm to work with the event loop, rewriting the algorithm for efficiency, integrating a native code library, or shifting computationally expensive calculations to a backend server.
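
For instance, rewriting the algorithm for efficiency is straightforward in this case; here is a sketch of an iterative version that runs in linear time, so even fibonacci(40) returns immediately:

// Linear-time Fibonacci: no recursion, so no event-loop starvation
// for any realistic n.
const fibonacciLoop = function(n) {
  let prev = 0, curr = 1;
  for (let i = 1; i < n; i++) {
    const next = prev + curr;
    prev = curr;
    curr = next;
  }
  return curr;
};

The paragraph that follows takes the first option instead, restructuring the naïve algorithm so that it cooperates with the event loop.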

A simple rewrite dispatches the computations through the event loop, letting the server continue to handle requests. Using callbacks and closures (anonymous functions), we are able to keep the promise of asynchronous I/O and concurrency, as shown in the following code:

const fibonacciAsync = function(n, done) {
  // All results are delivered through the done callback,
  // never through a return value.
  if (n === 0) {
    done(0);
  } else if (n === 1 || n === 2) {
    done(1);
  } else if (n === 3) {
    done(2);
  } else {
    // process.nextTick schedules each step separately, giving
    // the event loop an opportunity to run between steps.
    process.nextTick(function() {
      fibonacciAsync(n-1, function(val1) {
        process.nextTick(function() {
          fibonacciAsync(n-2, function(val2) {
            done(val1 + val2);
          });
        });
      });
    });
  }
};

This is an equally silly way to calculate Fibonacci numbers, but by using process.nextTick, the event loop has an opportunity to execute.
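
Calling it follows the callback convention we saw earlier:

fibonacciAsync(10, (result) => {
  console.log('Fibonacci 10 =', result); // 55
});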

Because this is an asynchronous function that takes a callback function, it necessitates a small refactoring of the server:

const http = require('http');
const url  = require('url');
// Assumes the fibonacciAsync function above is defined in this file.

http.createServer(function (req, res) {
  const urlP = url.parse(req.url, true);
  res.writeHead(200, {'Content-Type': 'text/plain'});
  if (urlP.query['n']) {
    fibonacciAsync(parseInt(urlP.query['n']), fibo => {  // Asynchronous
      res.end('Fibonacci '+ urlP.query['n'] +'='+ fibo);
    });
  } else {
    res.end('USAGE: http://127.0.0.1:8124?n=## where ## is the Fibonacci number desired');
  }
}).listen(8124, '127.0.0.1');
console.log('Server running at http://127.0.0.1:8124');

We've added a callback function to receive the result. In this case, the server is able to handle multiple Fibonacci number requests. But there is still a performance issue because of the inefficient algorithm.

Later in this book, we'll explore this example a little more deeply to explore alternative approaches.

In the meantime, we can discuss why it's important to use efficient software stacks.

Server utilization, overhead costs, and environmental impact

Striving for optimal efficiency (handling more requests per second) is not just about the geeky satisfaction that comes from optimization. There are real business and environmental benefits. Handling more requests per second, as Node.js servers can, means the difference between buying lots of servers and buying only a few. Node.js potentially lets your organization do more with less.

Roughly speaking, the more servers you buy, the greater the monetary cost and the greater the environmental cost. There's a whole field of expertise around reducing the costs and environmental impact of running web-server facilities, to which that rough guideline doesn't do justice. The goal is fairly obvious: fewer servers, lower costs, and a lower environmental impact, achieved by using more efficient software.

Intel's paper, Increasing Data Center Efficiency with Server Power Measurements (https://www.intel.com/content/dam/doc/white-paper/intel-it-data-center-efficiency-server-power-paper.pdf), gives an objective framework for understanding efficiency and data center costs. There are many factors, such as buildings, cooling systems, and computer system designs. Efficient building design, efficient cooling systems, and efficient computer systems (data center efficiency, data center density, and storage density) can lower costs and environmental impact. But you can destroy these gains by deploying an inefficient software stack, compelling you to buy more servers than you would if you had an efficient software stack. Alternatively, you can amplify gains from data center efficiency with an efficient software stack that lets you decrease the number of servers required.

This talk about efficient software stacks isn't just for altruistic environmental purposes. This is one of those cases where being green can help your business bottom line.

In this section, we have learned a lot about how the Node.js architecture differs from other programming platforms. The choice to eschew threads for implementing concurrency avoids the complexity and overhead that come with threads, and it seems to have fulfilled the promise of greater efficiency. Efficiency has benefits for many aspects of a business.