Optimizing performance with streaming


Caching content certainly improves upon reading a file from disk for every request. However, with fs.readFile, we are reading the whole file into memory before sending it out in response. For better performance, we can stream a file from disk and pipe it directly to the response object, sending data straight to the network socket one piece at a time.

Getting ready

We are building on our code from the last example, so let's get server.js, index.html, styles.css, and script.js ready.

How to do it...

We will be using fs.createReadStream to initialize a stream, which can be piped to the response object. In this case, implementing fs.createReadStream within our cacheAndDeliver function isn't ideal, because the event listeners of fs.createReadStream will need to interface with the request and response objects, which, for the sake of simplicity, are better dealt with inside the http.createServer callback. For brevity's sake, we will discard our cacheAndDeliver function and implement basic caching within the server callback:

//requires, mime types, createServer, lookup and f vars...
fs.exists(f, function (exists) {
  if (exists) {
    var headers = {'Content-type': mimeTypes[path.extname(f)]};
    if (cache[f]) {
      response.writeHead(200, headers);
      response.end(cache[f].content);
      return;
    } //...rest of server code...

Later on, we will fill cache[f].content while we're interfacing with the readStream object. Here's how we use fs.createReadStream:

var s = fs.createReadStream(f);

This will return a readStream object, which streams the file pointed to by the f variable. readStream emits events that we need to listen to. We can listen with the addListener method or use the shorthand on:

var s = fs.createReadStream(f).on('open', function () {
  //do stuff when the readStream opens
});

Since createReadStream returns the readStream object, we can latch our event listener straight onto it using method chaining with dot notation. Each stream is only going to open once, so we don't need to keep listening for it. Therefore, we can use the once method instead of on to automatically stop listening after the first event occurrence:

var s = fs.createReadStream(f).once('open', function () {
  //do stuff when the readStream opens
});

Before we fill out the open event callback, let's implement error handling as follows:

var s = fs.createReadStream(f).once('open', function () {
  //do stuff when the readStream opens
}).once('error', function (e) {
  console.log(e);
  response.writeHead(500);
  response.end('Server Error!');
});

The key to this entire endeavor is the stream.pipe method. This is what enables us to take our file straight from disk and stream it directly to the network socket via our response object.

var s = fs.createReadStream(f).once('open', function () {
  response.writeHead(200, headers);
  this.pipe(response);
}).once('error', function (e) {
  console.log(e);
  response.writeHead(500);
  response.end('Server Error!');
});

What about ending the response? Conveniently, stream.pipe detects when the stream has ended and calls response.end for us. For caching purposes, there's one other event we need to listen to. Still within our fs.exists callback, underneath the createReadStream code block, we write the following code:

fs.stat(f, function (err, stats) {
  var bufferOffset = 0;
  cache[f] = {content: new Buffer(stats.size)};
  s.on('data', function (chunk) {
    chunk.copy(cache[f].content, bufferOffset);
    bufferOffset += chunk.length;
  });
});

We've used the data event to capture each chunk as it's being streamed, copying it into the buffer we supplied as cache[f].content, and using fs.stat to obtain the file size for the cache buffer.

How it works...

Instead of the client waiting for the server to load the entire file from the disk prior to sending it to the client, we use a stream to load the file in small, ordered pieces and promptly send them to the client. With larger files this is especially useful, as there is minimal delay between the file being requested and the client starting to receive the file.

We did this by using fs.createReadStream to start streaming our file from disk. fs.createReadStream returns a readStream object, which inherits from the EventEmitter class.

The EventEmitter class accomplishes the evented part of Node's tag line: Evented I/O for V8 JavaScript. Due to this, we'll use listeners instead of callbacks to control the flow of stream logic.
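
As a minimal illustration of the listener pattern on its own, here is a hypothetical emitter that isn't part of the recipe:

var EventEmitter = require('events').EventEmitter;
var emitter = new EventEmitter();

//register a listener, just as we do on readStream
emitter.on('greet', function (name) {
  console.log('Hello, ' + name);
});

emitter.emit('greet', 'Node'); //prints 'Hello, Node'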

Then we added an open event listener using the once method since we want to stop listening for open once it has been triggered. We respond to the open event by writing the headers and using the stream.pipe method to shuffle the incoming data straight to the client.

stream.pipe handles the data flow. If the client becomes overwhelmed with processing, it signals to the server that the stream should be paused until it is ready for more. Under the hood, stream.pipe uses stream.pause and stream.resume to manage this interplay.
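
As a rough illustration of the interplay that stream.pipe manages for us, a hand-rolled equivalent might look something like the following sketch (simplified, and not Node's actual pipe implementation):

//simplified sketch of what stream.pipe handles for us
s.on('data', function (chunk) {
  //write returns false when the client can't keep up
  if (response.write(chunk) === false) {
    s.pause(); //stop reading from disk for now
  }
});
response.on('drain', function () {
  s.resume(); //the client is ready for more data
});
s.on('end', function () {
  response.end();
});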

While the response is being piped to the client, the content cache is simultaneously being filled. To achieve this, we had to create an instance of the Buffer class for our cache[f].content property. A Buffer must be supplied with a size (or an array or a string), which in our case is the size of the file. To get the size, we used the asynchronous fs.stat and captured the size property in the callback. The data event provides a Buffer as its only callback parameter.

The default bufferSize for a stream is 64 KB. Any file whose size is less than the bufferSize will only trigger one data event because the entire file will fit into the first chunk of data. However, for files greater than bufferSize, we have to fill our cache[f].content property one piece at a time.

Note

Changing the default readStream buffer size:

We can change the buffer size of readStream by passing an options object with a bufferSize property as the second parameter of fs.createReadStream.

For instance, to double the buffer size, you could use fs.createReadStream(f, {bufferSize: 128 * 1024});
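
If we want to see this chunking in action, we can count the data events for any file on disk (a standalone check; './bigfile.bin' is just a placeholder path to substitute with a real file):

var fs = require('fs');
var chunks = 0;
fs.createReadStream('./bigfile.bin') //placeholder path, substitute a real file
  .on('data', function (chunk) {
    chunks += 1;
  })
  .on('end', function () {
    console.log('File delivered in ' + chunks + ' chunk(s)');
  });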

We cannot simply concatenate each chunk onto cache[f].content, since this would coerce the binary data into string format which, though no longer binary, would later be interpreted as binary. Instead, we have to copy all the little binary buffer chunks into our binary cache[f].content buffer.

We created a bufferOffset variable to assist with this. Each time we add another chunk to our cache[f].content buffer, we update bufferOffset by adding the length of the chunk buffer to it. When we call the Buffer.copy method on the chunk buffer, we pass bufferOffset as the second parameter so that our cache[f].content buffer is filled correctly.
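
To see how Buffer.copy and an offset work together in isolation, here is a standalone example with hypothetical buffers that aren't part of the server code:

var whole = new Buffer(10); //pre-allocated target, like cache[f].content
var chunkA = new Buffer('hello'); //first "chunk"
var chunkB = new Buffer('world'); //second "chunk"
var offset = 0;

chunkA.copy(whole, offset); //copy starts at position 0
offset += chunkA.length; //offset is now 5
chunkB.copy(whole, offset); //copy starts at position 5

console.log(whole.toString()); //'helloworld'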

Moreover, operating with the Buffer class provides performance benefits with larger files, because it bypasses the V8 garbage collector, which tends to fragment large amounts of data and thereby slows down Node's ability to process them.

There's more...

While streaming has solved a problem of waiting for files to load into memory before delivering them, we are nevertheless still loading files into memory via our cache object. With larger files, or large amounts of files, this could have potential ramifications.

Protecting against process memory overruns

There is a limited amount of process memory. By default, V8's memory limit is set to 1400 MB on 64-bit systems and 700 MB on 32-bit systems. This can be altered by running Node with --max-old-space-size=N, where N is the number of megabytes (the actual maximum it can be set to depends upon the OS and, of course, the amount of physical RAM available). If we absolutely needed to be memory intensive, we could run our server on a large cloud platform, divide up the logic, and start new instances of Node using the child_process module.
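
For instance, we could delegate heavy work to a separate Node process with child_process.fork, as in the following minimal sketch (worker.js is a hypothetical script, not part of this recipe):

var cp = require('child_process');
var worker = cp.fork(__dirname + '/worker.js'); //hypothetical worker script

worker.on('message', function (result) {
  console.log('Worker replied with:', result);
});

worker.send({task: 'doSomethingMemoryIntensive'});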

In this case, high memory usage isn't necessarily required, and we can optimize our code to significantly reduce the potential for memory overruns. There is less benefit to caching larger files: the slight speed improvement relative to the total download time is negligible, while the cost of caching them is quite significant relative to our available process memory. We can also improve cache efficiency by implementing an expiration time on cache objects, which can then be used to clean the cache, removing files in low demand and prioritizing high-demand files for faster delivery. Let's rearrange our cache object slightly:

var cache = {
  store: {},
  maxSize: 26214400, //(bytes) 25mb
};

For a clearer mental model, we're making a distinction between the cache as a functioning entity and the cache as a store (which is a part of the broader cache entity). Our first goal is to only cache files under a certain size. We've defined cache.maxSize for this purpose. All we have to do now is insert an if condition within the fs.stat callback:

fs.stat(f, function (err, stats) {
  if (stats.size < cache.maxSize) {
    var bufferOffset = 0;
    cache.store[f] = {
      content: new Buffer(stats.size),
      timestamp: Date.now()
    };
    s.on('data', function (data) {
      data.copy(cache.store[f].content, bufferOffset);
      bufferOffset += data.length;
    });
  }
});

Notice we also slipped a new timestamp property into our cache.store[f]. This is for cleaning the cache, which is our second goal. Let's extend cache:

var cache = {
  store: {},
  maxSize: 26214400, //(bytes) 25mb
  maxAge: 5400 * 1000, //(ms) 1 and a half hours
  clean: function (now) {
    var that = this;
    Object.keys(this.store).forEach(function (file) {
      if (now > that.store[file].timestamp + that.maxAge) {
        delete that.store[file];
      }
    });
  }
};

So in addition to maxSize, we've created a maxAge property and added a clean method. We call cache.clean at the bottom of the server like so:

  //all of our code prior
  cache.clean(Date.now());

}).listen(8080); //end of the http.createServer

cache.clean loops through cache.store and checks whether each file has exceeded its specified lifetime. If it has, we remove it from the store. We'll add one further improvement and then we're done. cache.clean is called on each request, which means cache.store is going to be looped through on every server hit; this is neither necessary nor efficient. It would be better if we cleaned the cache, say, every two hours or so. We'll add two more properties to cache: the first is cleanAfter, to specify the time between cache cleans; the second is cleanedAt, to record when the cache was last cleaned.

var cache = {
  store: {},
  maxSize: 26214400, //(bytes) 25mb
  maxAge: 5400 * 1000, //(ms) 1 and a half hours
  cleanAfter: 7200 * 1000, //(ms) two hours
  cleanedAt: 0, //to be set dynamically
  clean: function (now) {
    if (now - this.cleanAfter > this.cleanedAt) {
      this.cleanedAt = now;
      var that = this;
      Object.keys(this.store).forEach(function (file) {
        if (now > that.store[file].timestamp + that.maxAge) {
          delete that.store[file];
        }
      });
    }
  }
};

We wrap the body of our cache.clean method in an if statement, so cache.store is only looped through if it has been longer than two hours (or whatever cleanAfter is set to) since the last clean.
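
As a quick standalone sanity check (not part of the server code, and assuming the final cache object above is in scope), we can confirm the throttling behaves as intended:

var now = Date.now();
//a stale entry that is older than maxAge
cache.store['/fake.html'] = {content: new Buffer(0), timestamp: now - cache.maxAge - 1};

cache.clean(now); //cleanedAt starts at 0, so the store is looped and the stale entry removed
console.log(Object.keys(cache.store)); //[]

cache.store['/fake.html'] = {content: new Buffer(0), timestamp: now - cache.maxAge - 1};
cache.clean(now + 1000); //within the cleanAfter window, so the store isn't looped
console.log(Object.keys(cache.store)); //[ '/fake.html' ]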

See also

  • Handling file uploads discussed in Chapter 2, Exploring the HTTP Object

  • Securing Against Filesystem Hacking Exploits discussed in this chapter.