Year round learning for product, design and engineering professionals

Appcache, not so much a douchebag as a complete pain in the #$%^

A little while back, Jake Archibald wrote, infamously (and anthropomorphically), that the HTML5 ApplicationCache is a “douchebag”[1]. I’m mindful that this word is freighted with troubling significance, but it is the term he used, so I’ll go with it.

The Urban Dictionary says the word “douchebag”

generally refers to a male with a certain combination of obnoxious characteristics related to attitude, social ineptitude, public behavior, or outward presentation.

Though the common douchebag thinks he is accepted by the people around him, most of his peers dislike him. He has an inflated sense of self-worth, compounded by a lack of social grace and self-awareness. He behaves inappropriately in public, yet is completely ignorant to how pathetic he appears to others.

I think this is a bit harsh on poor AppCache. To stay with the metaphor, AppCache is doing his best; it’s just that he does exactly what you ask him to, even when no one could possibly mean what you said! In this way, AppCache reminds me of one of my favourite comics ever, Mr Logic, from Viz Magazine (probably NSFW due to its extreme puerility).

Mr Logic’s defining characteristic is that he takes everything literally, and as a consequence he is “a complete pain in the #$%^”. He doesn’t mean to annoy you; he just doesn’t understand nuance, and has no “common sense” (which, as my nanna always loved to say, is sadly far from common).

Mr Logic

This is AppCache to a tee. Ask him to cache your site’s manifest file itself, so that your site is preserved in digital amber, never to be updated again? No problem. Why would anyone ever want to do this? Who knows, but he’ll do it for you.

Removed the manifest attribute from a cached HTML document? Well, AppCache doesn’t check for changed documents until you change the manifest file, so ’til then, the old cached version of the HTML file, with its link to the manifest still in place, will be used. All very logical, but in many ways counter-intuitive.

I guess what I’m saying is that the fault, dear reader, lies not in AppCache, but in ourselves. Actually, the fault really lies in the rules that have been taught to AppCache. Some of these are just downright infuriating, and perhaps the most infuriating of all is the one I’ll describe here.

It starts with the entirely logical, but deeply unintuitive, way in which caching works:

  • A user visits, say, webdirections.org, and the browser builds an applicationCache using the cache manifest, caching the index.html file, images, CSS files, and JavaScript files
  • Subsequently, we change some of the HTML and CSS at webdirections.org, and update the manifest file so that it too has changed
  • The user returns to webdirections.org, and their browser immediately uses the cached resources from the previous visit to display the page.
  • The browser only then checks the manifest file to see if it has changed, and as it has, the browser then downloads the changed resources.
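To make the ordering easier to see, here is a toy JavaScript model of that sequence. ToyAppCache and the site object are hypothetical stand-ins, not a browser API: the model ignores events, failures and real HTTP, and simply treats index.html as a master entry that gets cached even though the manifest doesn’t list it.

```javascript
// Toy model of the AppCache revisit sequence (hypothetical; real browsers
// do this internally, and also fire events, handle failures, etc.).
class ToyAppCache {
  constructor() {
    this.manifest = null; // manifest text from the last successful update
    this.files = null;    // cached copy of every resource, keyed by name
  }

  // One page load. Returns the HTML the user actually sees.
  visit(site) {
    if (this.files === null) {
      // First visit: nothing cached yet, so fetch and cache everything
      // (including index.html, the master entry).
      this.update(site);
      return this.files["index.html"];
    }
    // Return visit: the cached page is displayed straight away...
    const shown = this.files["index.html"];
    // ...and only then is the manifest compared with the server's copy.
    if (site.manifest !== this.manifest) {
      this.update(site); // fresh copies are only seen on the next load
    }
    return shown;
  }

  update(site) {
    this.manifest = site.manifest;
    this.files = { ...site.files };
  }
}

const site = {
  manifest: "CACHE MANIFEST\n# v1\nstyle.css\n",
  files: { "index.html": "<p>v1</p>", "style.css": "p { color: red; }" },
};
const browserCache = new ToyAppCache();

console.log(browserCache.visit(site)); // <p>v1</p>  (first visit)

// Change the HTML and bump the version comment in the manifest.
site.files["index.html"] = "<p>v2</p>";
site.manifest = "CACHE MANIFEST\n# v2\nstyle.css\n";
console.log(browserCache.visit(site)); // <p>v1</p>  (stale; the update happens afterwards)
console.log(browserCache.visit(site)); // <p>v2</p>

// Change the HTML again, but leave the manifest untouched.
site.files["index.html"] = "<p>v3</p>";
console.log(browserCache.visit(site)); // <p>v2</p>  (and it stays that way)
```

The last two lines also show the earlier point: if the HTML changes but the manifest doesn’t, the old copy keeps being served.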

It makes perfect sense! We get the cached version immediately, with no network traffic. Super fast page load FTW. Stale page load, not so good.

But, you say, why don’t we leave the HTML file out of the cache, and cache all the rest?

Well, AppCache has a concept of “master entries”. A master entry is an HTML file whose html element includes a manifest attribute pointing to a manifest file (which, BTW, is the only way to create an HTML5 appcache). Any such HTML file is automatically added to the cache. This makes sense a lot of the time, but not always. In particular, when an HTML document changes frequently, we don’t want it cached, because, as we just saw, a stale version of the page will most likely be served to the user.

Is there no way to override this? Well, AppCache has the idea of a NETWORK whitelist, which instructs the appcache to always use the online version of a file. What if we add the HTML files we don’t want cached to this? Sorry, no dice. HTML files that are master entries stay cached, even when included in the NETWORK whitelist. See what I mean? Poor AppCache didn’t make these rules. He’s just following them literally. He’s not a douchebag, he’s a pain in the #$%^, a total “jobsworth”.

So we seem to be stuck. We can

  • either add a manifest attribute to the html element of the document, and have the page cached too
  • or have no appcaching at all for that page.

For a site whose front page changes frequently, that leaves a choice. Either the page will likely be out of date for returning users (because the most recently cached version is always displayed first), but we can cache images, CSS, JavaScript and other resources which don’t change frequently; or we can’t cache those resources at all.

Updated

The original post below needs updating, because it is, sadly, wrong. While no-store does have an influence on caching, in the case of master entries, rather than the master entry simply being left out while all the other resources in the manifest are cached, we get an error and no resources are cached at all. So I’m going to turn my technique into a proposal for how AppCache could be made a little less painful. What I describe below is, I think, what should happen when AppCache encounters a master entry served with Cache-Control: no-store.

Original

But there is a (little known) solution to this. It’s in the HTML5 specification, but currently it’s only supported in Internet Explorer (10+, the first version to support AppCache) and Firefox. Hopefully other browsers and devices will start supporting it, because I think it’s a game changer when it comes to AppCache.

You probably know (though perhaps not the details of how) that browsers have long used HTTP response headers to decide how to cache content. Among other things, the server can tell the browser whether a resource is cacheable at all.

In a nutshell, when a browser requests a resource, the server sends both the content of the resource (for example, an HTML document) and a set of response headers. One of these headers is Cache-Control, which can contain a number of directives, including no-cache and no-store.

  • no-cache doesn’t in fact instruct the browser not to cache the resource, it instructs the browser to always check with the server before using a cached version of the resource (see, it’s not just AppCache who can be a pain)
  • no-store means don’t cache the resource, and always use the online version.
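To make the distinction concrete, here is a small JavaScript sketch that parses a Cache-Control value and describes what no-cache and no-store mean for an ordinary HTTP cache. The function names are hypothetical (this is not a browser API), and the parser is deliberately simplified.

```javascript
// Minimal Cache-Control parser (sketch: it doesn't handle quoted
// values that themselves contain commas).
function parseCacheControl(headerValue) {
  const directives = {};
  for (const part of headerValue.split(",")) {
    const token = part.trim();
    if (token === "") continue;
    const eq = token.indexOf("=");
    // Directive names are case-insensitive.
    const name = (eq === -1 ? token : token.slice(0, eq)).trim().toLowerCase();
    // Bare directives like no-store become `true`; others keep their argument.
    const arg = eq === -1 ? true : token.slice(eq + 1).trim().replace(/^"(.*)"$/, "$1");
    directives[name] = arg;
  }
  return directives;
}

// Plain-English reading of the two directives discussed above.
function describeCacheControl(headerValue) {
  const d = parseCacheControl(headerValue);
  if (d["no-store"] === true) {
    return "don't store it anywhere; fetch from the network every time";
  }
  if (d["no-cache"] === true) {
    return "may be stored, but must be revalidated with the server before each use";
  }
  return "cacheable, subject to the other directives";
}

console.log(JSON.stringify(parseCacheControl("No-Cache, max-age=0")));
// {"no-cache":true,"max-age":"0"}
console.log(describeCacheControl("no-cache"));
console.log(describeCacheControl("private, no-store"));
```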

If you’re familiar with AppCache, no-store is the equivalent of the NETWORK section of a cache manifest. Now, how do HTTP headers and AppCache work together? What the HTML5 AppCache spec says about HTTP headers is that they should be ignored for the purposes of AppCache, except no-store. Which means that (in theory) we can send an HTML file with the directive Cache-Control: no-store, and it won’t be cached in the AppCache! Could it be we have a solution to what has been one of AppCache’s most infuriating “features”?

With bated breath I created a test case. On Safari and Chrome, no luck: the HTML file served with no-store is still added to the AppCache. But with Firefox and IE10, like Daft Punk, we got lucky. These browsers honour no-store. If I change the HTML document, the next time the page loads, all the other resources come from the cache, but the new HTML page displays.
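For anyone wanting to reproduce the test case, here is a minimal Node.js sketch of a server for it: the HTML page (a master entry, thanks to its manifest attribute) is sent with Cache-Control: no-store, while the stylesheet and the manifest are not. The file names, contents and the APPCACHE_DEMO_PORT environment variable are all hypothetical. Bear in mind the Update above: served this way, the master entry makes Firefox and IE10 fail to build the cache at all.

```javascript
// Minimal Node.js server for the no-store test case (sketch).
// File names, contents and APPCACHE_DEMO_PORT are hypothetical.
const http = require("http");

const FILES = {
  "/index.html": {
    type: "text/html",
    // The manifest attribute makes this page a master entry.
    body:
      '<!DOCTYPE html><html manifest="site.appcache"><head>' +
      '<link rel="stylesheet" href="style.css"></head>' +
      "<body><p>Hello</p></body></html>",
  },
  "/style.css": { type: "text/css", body: "p { color: teal; }" },
  "/site.appcache": {
    // Manifests must be served with the text/cache-manifest MIME type.
    type: "text/cache-manifest",
    // Editing the version comment is what makes the manifest "change".
    body: "CACHE MANIFEST\n# v1\nstyle.css\n\nNETWORK:\n*\n",
  },
};

// Response headers for a path, or null if we don't serve it.
// Only the HTML page gets Cache-Control: no-store.
function headersFor(path) {
  const file = FILES[path];
  if (!file) return null;
  const headers = { "Content-Type": file.type };
  if (file.type === "text/html") {
    headers["Cache-Control"] = "no-store";
  }
  return headers;
}

const demoServer = http.createServer((req, res) => {
  const path = req.url === "/" ? "/index.html" : req.url.split("?")[0];
  const headers = headersFor(path);
  if (headers === null) {
    res.writeHead(404, { "Content-Type": "text/plain" });
    res.end("Not found");
    return;
  }
  res.writeHead(200, headers);
  res.end(FILES[path].body);
});

// Only start listening when asked, e.g. APPCACHE_DEMO_PORT=8080 node server.js
if (process.env.APPCACHE_DEMO_PORT) {
  demoServer.listen(Number(process.env.APPCACHE_DEMO_PORT));
}
```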

So close, and yet so far, I hear you thinking. If it’s not supported across all browsers yet, what good does it do me? Here we’ll have to dive a little deeper into AppCache. In browsers that support AppCache, there’s a new property of the window object, applicationCache. This receives various events, including updateready, which fires when the cache has been changed and is now ready to be used. So, we can update a cached master entry as follows.

  1. add an event listener for updateready
  2. this calls applicationCache.swapCache(), which swaps the now stale cache for the fresh one
  3. our event handler then calls window.location.reload() to reload the page from the fresh cache
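The three steps above might look something like this in JavaScript. It’s a sketch, assuming a browser that still implements the ApplicationCache API (it has since been removed from current browsers). The function name reloadOnAppCacheUpdate is hypothetical, and the window object is passed in so the logic can be exercised without a real browser.

```javascript
// Reload the page once a fresh AppCache is ready (sketch).
// Returns false when the browser has no applicationCache at all.
function reloadOnAppCacheUpdate(win) {
  const appCache = win.applicationCache;
  if (!appCache) {
    return false; // AppCache not supported (true of current browsers)
  }
  // 1. Listen for the (all lower-case) updateready event.
  appCache.addEventListener("updateready", () => {
    if (appCache.status === appCache.UPDATEREADY) {
      // 2. Swap the stale cache for the freshly downloaded one.
      appCache.swapCache();
      // 3. Reload so the page itself comes from the fresh cache.
      win.location.reload();
    }
  });
  return true;
}

// In a page: wire it up as soon as the script runs.
if (typeof window !== "undefined") {
  reloadOnAppCacheUpdate(window);
}
```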

What’s great about our no-store trick is that the cache doesn’t need updating, so in browsers which support it, updateready doesn’t fire! So we have a bulletproof way of making sure frequently changing HTML pages aren’t added as master entries to the appcache in browsers which honour no-store, as well as auto-refreshing these pages to ensure the browser uses the most up-to-date version in browsers which don’t (yet) honour the no-store directive.

Which hopefully makes the AppCache just a little bit less of a pain in the #$%^ to deal with!

Again, note: sadly, this is not what actually happens.

Here’s what does happen when a master entry is served with no-store:

  • Chrome and Safari ignore no-store, and build the cache including the master entry
  • Firefox, from what I can tell, silently fails, and doesn’t build a cache at all
  • Internet Explorer 10 fires an error, and doesn’t build a cache

Moral of the story: don’t serve HTML with no-store if you want appcache to work!

My upcoming book on HTML5 Offline

I discovered all this, and much more, while researching my upcoming book on HTML5 offline capabilities, covering not just AppCache, but localStorage, the File API, offline events and even HTTP caching. It’s coming soon, so why not sign up to our newsletter to be the first to hear about it, or follow me on Twitter (or better still, both!).

Want to learn more about AppCache in the meantime? Here’s an article I wrote a couple of years ago, and this presentation I did at Web Directions Code last year.


References
