
PhantomJS Cookbook

By: Rob Friesel

Running PhantomJS with a disk cache


In this recipe, we will learn how to run PhantomJS with an on-disk cache. We enable the cache with the disk-cache command-line argument and can optionally limit its size with the max-disk-cache-size argument. We can use this to test how browsers cache our static assets.

Getting ready

To run this recipe, we will need a PhantomJS script that accesses a website with cacheable assets. Optionally, we will also need to decide how large to make the on-disk cache (in kilobytes).

The script in this recipe is available in the downloadable code repository as recipe06.js under chapter01. To run the provided example script, we must first change to the root directory of the book's sample code.

Lastly, the script in this recipe runs against the demo site that is included with the cookbook's sample code repository. To run that demo site, we must have Node.js installed. In a separate terminal, change into the phantomjs-sandbox directory (in the sample code's directory) and start the app with the following command:

node app.js

How to do it…

Given the following script:

// Create a webpage object; count tracks page loads, until is how many to do.
var page  = require('webpage').create(),
    count = 0,
    until = 2;

// Log each resource as formatted JSON once it has been fully received.
page.onResourceReceived = function(res) {
  if (res.stage === 'end') {
    console.log(JSON.stringify(res, undefined, 2));
  }
};

// Count every page load as it starts.
page.onLoadStarted = function() {
  count += 1;
  console.log('Run ' + count + ' of ' + until + '.');
};

// Reload until we have completed `until` runs, then exit.
page.onLoadFinished = function(status) {
  if (status === 'success') {
    if (count < until) {
      console.log('Go again.\n');
      page.reload();
    } else {
      console.log('All done.');
      phantom.exit();
    }
  } else {
    console.error('Could not open page! (Is it running?)');
    phantom.exit(1);
  }
};

// Start the first load against the demo app.
page.open('http://localhost:3000/cache-demo');

Enter the following command at the command line:

phantomjs --disk-cache=true --max-disk-cache-size=4000 chapter01/recipe06.js

The script will announce each run and, for every resource that is fully received, print the details of the response as formatted JSON.
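The JSON dump is verbose. As a sketch, the filtering that our onResourceReceived handler performs can be written as a plain ES5 function that reduces each fully received resource to its status code and URL. The function name summarizeResources and the sample objects are hypothetical; the samples only use the id, url, stage, and status properties that PhantomJS response objects carry.

```javascript
// Keep only responses whose stage is 'end' (fully received), mirroring the
// check in our onResourceReceived handler, and format each one as
// "<status> <url>". Written in ES5 so it could also be pasted into a
// PhantomJS script.
function summarizeResources(responses) {
  var lines = [];
  responses.forEach(function (res) {
    if (res.stage === 'end') {
      lines.push(res.status + ' ' + res.url);
    }
  });
  return lines;
}

// Hypothetical sample data shaped like PhantomJS response objects.
var sample = [
  { id: 1, stage: 'start', status: 200, url: 'http://localhost:3000/cache-demo' },
  { id: 1, stage: 'end', status: 200, url: 'http://localhost:3000/cache-demo' },
  { id: 2, stage: 'end', status: 200, url: 'http://localhost:3000/images/583519989_1116956980_b.jpg' }
];

console.log(summarizeResources(sample).join('\n'));
```

Inside the recipe itself, the handler could log res.status + ' ' + res.url instead of the full JSON.stringify output to get the same compact view.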

How it works…

Our preceding example script performs the following actions:

  1. We create a webpage object and declare two variables: count (the number of page loads started so far) and until (the number of loads to perform).

  2. We assign an event handler function to the webpage object's onResourceReceived callback. When a resource has been fully received (that is, when its stage is 'end'), this callback prints every property of the response as JSON.

  3. We assign an event handler function to the webpage object's onLoadStarted callback. This callback will increment count when the page load starts and print a message indicating which run it is.

  4. We assign an event handler function to the webpage object's onLoadFinished callback. This callback checks the status of the page load and acts accordingly:

    • If status is not 'success', we print an error message and exit PhantomJS with an exit code of 1

    • If status is 'success', we check whether count is less than until; if it is, we call reload on the webpage object; otherwise, we exit PhantomJS

  5. Finally, we open the target URL (http://localhost:3000/cache-demo) using webpage.open.
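The reload loop in steps 3 and 4 can be traced without PhantomJS at all. The following is a minimal simulation that runs under plain Node.js; createStubPage and simulateRuns are hypothetical helpers. The stub only imitates the order in which PhantomJS fires onLoadStarted and onLoadFinished, treats every load as successful, and calls the handlers synchronously, which real PhantomJS does not guarantee.

```javascript
// Hypothetical stand-in for a PhantomJS webpage object: open() and reload()
// fire onLoadStarted, then onLoadFinished('success'). No network is used.
function createStubPage() {
  var page = {
    onLoadStarted: function () {},
    onLoadFinished: function () {},
    open: function () { page._load(); },
    reload: function () { page._load(); },
    _load: function () {
      page.onLoadStarted();
      page.onLoadFinished('success');
    }
  };
  return page;
}

// Same counting logic as the recipe, but messages are collected in an array
// and "exiting" just sets a flag instead of calling phantom.exit().
function simulateRuns(until) {
  var page = createStubPage(),
      count = 0,
      log = [],
      exited = false;

  page.onLoadStarted = function () {
    count += 1;
    log.push('Run ' + count + ' of ' + until + '.');
  };

  page.onLoadFinished = function (status) {
    if (status !== 'success') {
      log.push('Could not open page!');
      exited = true;
    } else if (count < until) {
      log.push('Go again.');
      page.reload();
    } else {
      log.push('All done.');
      exited = true;
    }
  };

  page.open('http://localhost:3000/cache-demo');
  return { log: log, exited: exited };
}

console.log(simulateRuns(2).log.join('\n'));
```

With until set to 2, the trace shows exactly one reload: the first load starts and finishes, reload triggers the second, and the second onLoadFinished takes the exit branch.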

There's more…

Even though the disk cache is off by default, PhantomJS still performs some in-memory caching. This detail becomes important in later explorations, because it produces results that would otherwise be difficult to explain. For example, in our preceding sample script, we used webpage.reload for our second request of the URL, and in that second request, we saw all of the images re-requested. However, if we had used a second call to webpage.open (instead of webpage.reload), the onResourceReceived callback would have shown a second request to the URL, but none of the images would have been re-requested. (As an interesting aside, we would see the same behavior even with the disk-cache argument set to false; the in-memory cache cannot be disabled.)

Another interesting observation is that PhantomJS always reports an HTTP response status of 200 OK for every successfully retrieved asset. If we watch the Node.js console output for the demo app while our sample script runs, we can see the discrepancy: PhantomJS reports a status code of 200 for each of the images during both the first and the second request/response cycles, but the output from the Node.js app looks something like this:

GET /cache-demo 200 1ms - 573b
GET /images/583519989_1116956980_b.jpg 200 4ms - 264.64kb
GET /images/152824439_ffcc1b2aa4_b.jpg 200 8ms - 615.21kb
GET /images/357292530_f225d7e306_b.jpg 200 6ms - 497.98kb
GET /images/391560246_f2ac936f6d_b.jpg 200 5ms - 446.68kb
GET /images/872027465_2519a358b9_b.jpg 200 5ms - 766.94kb
GET /cache-demo 200 1ms - 573b
GET /images/152824439_ffcc1b2aa4_b.jpg 304 3ms
GET /images/357292530_f225d7e306_b.jpg 304 3ms
GET /images/391560246_f2ac936f6d_b.jpg 304 2ms
GET /images/583519989_1116956980_b.jpg 304 3ms
GET /images/872027465_2519a358b9_b.jpg 304 3ms

We can see that, on the second run, the server responds with 304 Not Modified for each of the image assets. This is exactly what we would expect for a second request to the same URL when the assets are served with Cache-Control headers that specify a max-age and are also cached to disk: the cached copies are revalidated, and the server confirms that they have not changed.
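A 304 is the answer to a conditional request: the client sends back a validator it received with the cached copy (for example, an ETag in If-None-Match or a date in If-Modified-Since), and the server replies 304 Not Modified when the validator still matches. The function below is a simplified, hypothetical model of that server-side decision; it is not the demo app's actual code, the asset metadata is made up, and it ignores weak ETags and comma-separated ETag lists.

```javascript
// Simplified model of a server choosing between 200 and 304 for a static
// asset. If-None-Match takes precedence over If-Modified-Since, as in
// RFC 7232. Weak ETags and ETag lists are deliberately not handled.
function conditionalStatus(requestHeaders, asset) {
  var inm = requestHeaders['if-none-match'];
  if (inm !== undefined) {
    return inm === asset.etag ? 304 : 200;
  }
  var ims = requestHeaders['if-modified-since'];
  if (ims !== undefined) {
    var since = Date.parse(ims);
    // HTTP dates have one-second resolution, so compare whole seconds.
    var modified = Math.floor(asset.lastModified.getTime() / 1000) * 1000;
    if (!isNaN(since) && modified <= since) {
      return 304;
    }
  }
  return 200;
}

// Hypothetical metadata for one of the demo images.
var asset = {
  etag: '"264640-1388534400000"',
  lastModified: new Date('2014-01-01T00:00:00Z')
};

console.log(conditionalStatus({}, asset));                              // → 200 (first request)
console.log(conditionalStatus({ 'if-none-match': asset.etag }, asset)); // → 304 (revalidation)
console.log(conditionalStatus(
  { 'if-modified-since': asset.lastModified.toUTCString() }, asset));   // → 304
```

The first call corresponds to the first run in the log above (nothing cached, so 200 with a full body); the later calls correspond to the second run, where only a 304 status crosses the wire.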

disk-cache

We can enable the disk cache by setting the disk-cache argument to true or yes. The disk cache is disabled by default, but we can also disable it explicitly by passing false or no. When the disk cache is enabled, PhantomJS writes cached assets to the desktop services cache storage location. Caching these assets can speed up later script runs against URLs that share those assets.

max-disk-cache-size

Optionally, we may also wish to limit the size of the disk cache (for example, to simulate the small caches on some mobile devices). To do so, we pass the max-disk-cache-size command-line argument an integer that sets the size of the cache in kilobytes. By default (if we do not use the max-disk-cache-size argument), the cache size is unbounded. Most of the time, we will not need this argument.
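If we launch PhantomJS from a Node.js script (for example, from a test runner), a small helper can check the values described above before building the command line. This is a sketch: buildPhantomArgs and runWithDiskCache are hypothetical names, the non-negative-integer check is our own assumption about sensible input, and runWithDiskCache assumes that the phantomjs binary is on the PATH.

```javascript
var spawn = require('child_process').spawn;

// Values accepted for --disk-cache, as described above.
var DISK_CACHE_VALUES = ['true', 'yes', 'false', 'no'];

// Build the argument list for the command shown in "How to do it…".
// maxDiskCacheSizeKb is optional; when omitted, no size flag is passed.
function buildPhantomArgs(script, diskCache, maxDiskCacheSizeKb) {
  var value = String(diskCache);
  if (DISK_CACHE_VALUES.indexOf(value) === -1) {
    throw new Error('disk-cache must be one of: ' + DISK_CACHE_VALUES.join(', '));
  }
  var args = ['--disk-cache=' + value];
  if (maxDiskCacheSizeKb !== undefined) {
    // Assumption: only whole, non-negative kilobyte counts make sense here.
    if (!Number.isInteger(maxDiskCacheSizeKb) || maxDiskCacheSizeKb < 0) {
      throw new Error('max-disk-cache-size must be a non-negative integer (KB)');
    }
    args.push('--max-disk-cache-size=' + maxDiskCacheSizeKb);
  }
  args.push(script);
  return args;
}

// Launch PhantomJS (assumed to be on the PATH), sharing our stdio.
function runWithDiskCache(script, diskCache, maxDiskCacheSizeKb) {
  return spawn('phantomjs', buildPhantomArgs(script, diskCache, maxDiskCacheSizeKb), {
    stdio: 'inherit'
  });
}

console.log(buildPhantomArgs('chapter01/recipe06.js', true, 4000).join(' '));
// → --disk-cache=true --max-disk-cache-size=4000 chapter01/recipe06.js
```

Calling runWithDiskCache('chapter01/recipe06.js', 'yes', 4000) from the root of the sample code should behave like the command in "How to do it…", since yes and true both enable the cache.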

Cache locations

If we need to inspect the cached data that is persisted to disk, we can find it in the desktop services cache storage location for the platform that PhantomJS is running on. These locations are as follows:

Platform     Location
Windows      %AppData%/Local/Ofi Labs/PhantomJS/cache/http
Mac OS X     ~/Library/Caches/Ofi Labs/PhantomJS/data7
Linux        ~/.qws/cache/Ofi Labs/PhantomJS

Note

These locations may not exist until after we have run PhantomJS with the disk-cache argument enabled.
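To inspect the cache after a run, a short Node.js script can walk the directory for the current platform and print each file with its size. The Mac OS X and Linux paths come from the table above with ~ expanded to the home directory. The Windows path is an assumption: we read "%AppData%/Local" as the AppData\Local folder in the user profile. cacheDirFor and listCacheFiles are hypothetical helpers, and a missing directory simply yields an empty list, in line with the note above.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Cache directories from the table above. On Windows we ASSUME that
// "%AppData%/Local" means the AppData\Local folder in the user profile.
function cacheDirFor(platform, home) {
  switch (platform) {
    case 'win32':
      return path.join(home, 'AppData', 'Local', 'Ofi Labs', 'PhantomJS', 'cache', 'http');
    case 'darwin':
      return path.join(home, 'Library', 'Caches', 'Ofi Labs', 'PhantomJS', 'data7');
    default: // treat every other platform like Linux
      return path.join(home, '.qws', 'cache', 'Ofi Labs', 'PhantomJS');
  }
}

// Recursively collect every file under dir as { file, bytes }, with paths
// relative to dir. Returns an empty list if the directory does not exist yet.
function listCacheFiles(dir) {
  var results = [];
  if (!fs.existsSync(dir)) {
    return results;
  }
  (function walk(current) {
    fs.readdirSync(current).forEach(function (name) {
      var full = path.join(current, name);
      var stats = fs.statSync(full);
      if (stats.isDirectory()) {
        walk(full);
      } else {
        results.push({ file: path.relative(dir, full), bytes: stats.size });
      }
    });
  })(dir);
  return results;
}

var cacheDir = cacheDirFor(process.platform, os.homedir());
var files = listCacheFiles(cacheDir);
console.log(files.length + ' cached file(s) in ' + cacheDir);
files.forEach(function (f) {
  console.log(f.bytes + '\t' + f.file);
});
```

Running this before and after a PhantomJS run with disk-cache enabled is a quick way to confirm that assets are actually being written to disk.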

See also

  • The Opening a URL within PhantomJS recipe in Chapter 3, Working with webpage Objects