A while ago I promised an article on how we do cache busting, and never got around to writing it... until today. Here it is.
Normally, for optimal cache performance, you want browsers to check for freshness as rarely as possible. To do this, you configure your server to send a so-called 'far-future expiration date'. This assures the browser that the assets it has just downloaded will stay fresh for a very long time (say, a year). However, at the same time, you want to 'bust' the cache when the assets are updated.
Since, with a far-future expiration date on assets, the browser will not even bother to check whether there are fresh copies on the server, there is no way of notifying it about new versions. After all, if the browser doesn't talk to you, how can you let it know about anything at all? To get around this, developers have invented many techniques, some effective, some less so. Without bashing other techniques, I'll just explain what we do at Herd Hound and why it works.
First things first: let's go over the headers that are sent from nginx.
For all JS, CSS, and image assets, the following headers are sent:
expires modified +400d;
add_header Cache-Control public;
The first line tells nginx to set the expiration date for the asset to the file's modification date plus 400 days. The second line tells nginx to set the 'Cache-Control' header to 'public'. You can read more about the 'Cache-Control' header in RFC 2616, section 14.9, but in this case it boils down to letting any caching proxy server between the user agent and the server know that caching the content is allowed.
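To make the arithmetic behind `expires modified +400d` concrete, here is a minimal sketch of the date it works out to. This is not how nginx computes the header internally; `expiresFromMtime` is a hypothetical helper, and the modification date is made up for illustration.

```javascript
// Hypothetical helper: mirrors "modification time + N days" for illustration.
var MS_PER_DAY = 24 * 60 * 60 * 1000;

function expiresFromMtime(mtime, days) {
  // Date arithmetic is done in milliseconds since the epoch.
  return new Date(mtime.getTime() + days * MS_PER_DAY);
}

// Assumed example: a file last modified on 1 January 2012, 00:00 UTC.
var mtime = new Date(Date.UTC(2012, 0, 1));
var expires = expiresFromMtime(mtime, 400);

// HTTP dates are expressed in GMT, which is what toUTCString() produces.
console.log(expires.toUTCString()); // Mon, 04 Feb 2013 00:00:00 GMT
```

One consequence of anchoring the date to the modification time: a file that has not been touched for more than 400 days would end up with an expiration date in the past.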
Before I go on to busting the cache, a few words about our particular setup. The entire site, which is a single-page AJAX app, is composed of a multitude of JavaScript and CSS modules. At build time, these modules are compiled down to just two files: main.css and boot.js. This is taken care of by RequireJS's optimizer tool.
One approach I've considered is renaming all asset files and relinking them. I wasn't happy about the relinking part, so I decided to do it a bit differently.
The cache-busting script first goes over all assets in each of the asset directories and computes a SHA1 hexdigest of each file's contents (truncated to six characters in the script below). It then concatenates the hashes into a single string, and creates a truncated SHA1 hexdigest of that string. Finally, it appends the truncated hexdigest to the master file names (boot.js and main.css). The end result is modified file names that look like boot_a4150c.js. The browser will consider this a completely new asset and download it, and thus the cache is busted.
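The hash-of-hashes step can be tried out in isolation. The following is a minimal sketch that works on in-memory strings instead of files; `combinedChecksum`, `bustedName`, and the sample module sources are hypothetical names and data used only for illustration, while `sha1trunc` follows the function of the same name in the script in this article.

```javascript
var crypto = require('crypto');

// Truncated SHA1 hexdigest (first 6 characters).
function sha1trunc(content) {
  return crypto.createHash('sha1').update(content).digest('hex').slice(0, 6);
}

// Hash every item's contents, join the hashes, then hash the joined string.
function combinedChecksum(contents) {
  return sha1trunc(contents.map(function(c) {
    return sha1trunc(c);
  }).join(''));
}

// Insert the checksum between the base name and the extension.
function bustedName(filename, checksum) {
  var dot = filename.lastIndexOf('.');
  return filename.slice(0, dot) + '_' + checksum + filename.slice(dot);
}

// Assumed stand-ins for module sources; real input would be read from disk.
var before = ['define(function () { return 1; });', 'define(function () { return 2; });'];
var after = ['define(function () { return 1; });', 'define(function () { return 3; });'];

var nameBefore = bustedName('boot.js', combinedChecksum(before));
var nameAfter = bustedName('boot.js', combinedChecksum(after));

// Changing any single module yields a different master file name.
console.log(nameBefore, nameAfter, nameBefore !== nameAfter);
```

Because the combined checksum depends on every individual hash, a change to any one module is enough to produce a new name for the master file.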
Strictly speaking, this is not really cache busting, because the older versions of the assets are still cached: they are treated as separate assets, not as old versions of the same asset. If you are a purist, or you are concerned about users' storage capacity, this might not be your cup of tea (coffee?).
What I like about this method of cache busting is that it doesn't use URL parameters, which cause some proxies to not cache the files at all, since URL parameters are usually used for dynamic content, which is almost never useful when stale. Secondly, I like that it doesn't use the modification timestamps of the files, since those can change even if the contents of the files have not changed at all (e.g., when you save a file without actually modifying it).
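The point about timestamps is easy to check. Below is a minimal sketch, assuming a throwaway file in the system's temporary directory (the file name and contents are made up): the file is written, its modification time is pinned to a known value, and then it is "saved" again with identical contents.

```javascript
var crypto = require('crypto');
var fs = require('fs');
var os = require('os');
var path = require('path');

function sha1trunc(content) {
  return crypto.createHash('sha1').update(content).digest('hex').slice(0, 6);
}

// Work in a throwaway directory so nothing in a real project is touched.
var dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cachebust-'));
var file = path.join(dir, 'main.css');

fs.writeFileSync(file, 'body { margin: 0; }');
// Pin access and modification times to a known value (seconds since the epoch).
fs.utimesSync(file, 1000000000, 1000000000);
var mtimeBefore = fs.statSync(file).mtime.getTime();
var hashBefore = sha1trunc(fs.readFileSync(file));

// "Save" the file again with identical contents, as an editor might.
fs.writeFileSync(file, 'body { margin: 0; }');
var mtimeAfter = fs.statSync(file).mtime.getTime();
var hashAfter = sha1trunc(fs.readFileSync(file));

console.log('mtime changed:', mtimeBefore !== mtimeAfter); // mtime changed: true
console.log('hash changed:', hashBefore !== hashAfter);    // hash changed: false

// Clean up the temporary file and directory.
fs.unlinkSync(file);
fs.rmdirSync(dir);
```

A timestamp-based scheme would treat the second save as a new version, while the content hash stays the same.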
For image assets, individual hexdigests are appended to file names, since they do not compile into a single image or anything like that.
If you want to see how it all works together, here's the script we use:
/**
 * Cache-buster
 *
 * @author Monwara LLC / Branko Vukelic
 * @version 0.0.1
 */

var fs = require('fs');
var crypto = require('crypto');

// Recursively collect files under `dir` whose names match `pattern`.
function walk(pattern, dir) {
  pattern = pattern || /.*/;
  dir = dir || '.';
  var returnFiles = [];
  var files = fs.readdirSync(dir);
  files.forEach(function(f) {
    var path = dir + '/' + f;
    var stat = fs.statSync(path);
    if (pattern.test(f) && stat.isFile()) {
      returnFiles.push(path);
    } else if (stat.isDirectory()) {
      walk(pattern, path).forEach(function(wf) {
        returnFiles.push(wf);
      });
    }
  });
  return returnFiles;
}

// SHA1 hexdigest of `c`, truncated to the first 6 characters.
function sha1trunc(c) {
  var shasum = crypto.createHash('sha1');
  return shasum.update(c).digest('hex').slice(0, 6);
}

// Path without its extension, e.g. 'img/logo.png' => 'img/logo'.
function basename(path) {
  return (/(.+)(?=\.\w+$)/).exec(path)[0];
}

// Extension including the dot (3 or 4 characters), e.g. '.png'.
function extension(path) {
  return (/\.[a-zA-Z0-9]{3,4}$/).exec(path)[0];
}

// Checksum for a directory: hash each matching file, then hash the
// concatenation of those hashes.
function sha1all(pattern, dir) {
  return sha1trunc(walk(pattern, dir).map(function(f) {
    return sha1trunc(fs.readFileSync(f));
  }).join(''));
}

// Rename every image to include its own checksum and update references to it.
var images = walk(/\.png$/, 'img');
images.forEach(function(ifile) {
  var file = fs.readFileSync(ifile);
  var checksum = sha1trunc(file);
  var newFilename = basename(ifile) + '_' + checksum + extension(ifile);
  console.log(ifile + ' => ' + newFilename);
  fs.renameSync(ifile, newFilename);

  function renameUrl(file) {
    var fileContents = fs.readFileSync(file).toString();
    // split/join replaces every occurrence of the old path, not only the first.
    var updatedContents = fileContents.split(ifile).join(newFilename);
    if (fileContents !== updatedContents) {
      console.log('Updating ' + ifile + ' to ' + newFilename + ' in ' + file);
      fs.writeFileSync(file, updatedContents);
    }
  }

  // Update references synchronously, so that the CSS checksum computed
  // further down reflects the renamed images.
  walk(/\.css$/, 'css').forEach(renameUrl);
  walk(/\.tpl$/, 'js/templates').forEach(renameUrl);
});

// One checksum per master file, derived from all files in its directory.
var cssChecksum = sha1all(/\.css$/, 'css');
var jsChecksum = sha1all(/\.js$/, 'js');

console.log('main.css => main_' + cssChecksum + '.css');
fs.renameSync('css/main.css', 'css/main_' + cssChecksum + '.css');
console.log('boot.js => boot_' + jsChecksum + '.js');
fs.renameSync('js/boot.js', 'js/boot_' + jsChecksum + '.js');

// Point index.html at the renamed master files.
var index = fs.readFileSync('index.html').toString();
index = index.
  replace('main.css', 'main_' + cssChecksum + '.css').
  replace('boot', 'boot_' + jsChecksum);
fs.writeFileSync('index.html', index);

// Point the RequireJS build configuration at the renamed boot module.
var build = fs.readFileSync('app.build').toString();
build = build.replace(/boot/g, 'boot_' + jsChecksum);
fs.writeFileSync('app.build', build);
Feel free to use this script for your projects.
Incidentally, we run this script before the code is run through the RequireJS optimizer.