- Amir Boroumand | Software engineer based in Pittsburgh, PA/
The Developer's Guide to Browser Caching
Note
This article was written over 5 years ago. Some information may be outdated or irrelevant.
Overview #
Caching is a useful yet surprisingly complex feature of web browsers.
In this article, we’ll explain how the browser uses its cache to load pages faster, which factors determine cache duration, and how we can bypass the cache when necessary.
Why is Caching Important? #
All browsers attempt to keep local copies of static assets in an effort to reduce page load times and minimize network traffic.
Fetching a resource over a network will always be slower than retrieving it from local cache. This is true whether the server is on the same network or it’s located on the far side of the world.
How Browser Caching Works #
Case 1: User has not visited the site before #
The browser won’t have any files cached for the site so it will fetch everything from the server.
Below is a snapshot of the resources downloaded when visiting the Wikipedia home page for the first time. The status bar at the bottom shows that 265KB of data was transferred to the browser.
Case 2: User has visited the site before #
The browser will retrieve the HTML page from the web server but consult its cache for the static assets (JavaScript, CSS, images).
We can see the difference cache makes when we refresh the Wikipedia page:
The data transferred went down to 928 bytes, which is about 0.3% of the size of the initial page load. The Size column shows us that most of the content is pulled from cache.
Note
Chrome will pull files from either memory cache or disk cache. Since we didn’t close our browser between Cases 1 & 2, the data was still in memory cache.
Viewing the browser cache #
In Chrome, we can go to chrome://cache to view the contents of the cache. This will display a page of links to a detailed view for each cached file.
How does the browser know what to cache? #
The browser inspects the headers of the HTTP response generated by the web server. There are four headers commonly used for caching:
ETag
Cache-Control
Expires
Last-Modified
ETag #
The ETag (or Entity Tag) is a string that serves as a cache validation token. This is usually a hash of the file contents.
The server can include an ETag in its response, and the browser can send it back in a future request (in the If-None-Match header, after the file has expired) to determine whether the cache contains a stale copy.
If the hash is the same, then the resource hasn’t changed and the server responds with a 304 Not Modified response with an empty body. This lets the browser know it’s still safe to use the cached copy.
Note that the ETag is only sent in requests once the cached file has expired.
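To make the validation flow concrete, here is a minimal sketch of what a server might do. The function names `make_etag` and `respond` are hypothetical, and truncating a SHA-256 hash to 16 characters is just one possible way to build the tag, not what any particular server does.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Build a quoted ETag from a hash of the file contents (illustrative choice)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a request that may carry If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match == etag:
        # The browser's copy matches the current file: send no body.
        return 304, headers, b""
    # First visit, or the file changed since the browser cached it.
    return 200, headers, body


# First request: full response with an ETag.
status, headers, payload = respond(b"body { color: red; }")
# Later revalidation: the browser echoes the ETag it stored.
revalidated_status, _, revalidated_payload = respond(b"body { color: red; }", headers["ETag"])
# If the file has changed on the server, the hashes differ and we get a fresh copy.
changed_status, _, _ = respond(b"body { color: blue; }", headers["ETag"])
```

In this sketch the second call returns a 304 with an empty body, while the third returns a 200 with the new contents.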
Cache-Control #
The Cache-Control header has a number of directives we can set to control cache behavior, expiration, and validation.
Cache behavior #
public - the resource can be cached by any cache (browser, CDN, etc.)
private - the resource can only be cached by the browser
no-store - the resource must not be stored in any cache, so it is always requested from the server
no-cache
This one is actually a bit misleading. It doesn’t mean “do not cache”.
This tells the browser to cache the file but not to use it until it checks with the server to validate that we have the latest version. This validation is typically done with the ETag header.
This is commonly used with HTML files since it makes sense for the browser to always check for the latest markup.
Expiration #
max-age=<integer>
This specifies the length of time in seconds the resource should be cached. So max-age=60 means that it should be cached for 1 minute. RFC 2616 recommends that the maximum value be no longer than 1 year (max-age=31536000).
s-maxage=<integer>
This is only used by shared intermediate caches like a CDN, where it takes precedence over max-age.
Validation #
must-revalidate
This tells the cache that once a resource becomes stale, it must verify it with the server before using it. An expired copy must not be used without that check.
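As a rough illustration of how these directives are laid out in the header, the sketch below splits a Cache-Control value into its directives. The function name `parse_cache_control` is hypothetical, and the parsing is deliberately naive: it assumes no directive argument contains a comma inside quotes.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        # "max-age=60" has an argument; "no-cache" does not.
        name, separator, argument = part.partition("=")
        directives[name.strip().lower()] = argument.strip().strip('"') if separator else None
    return directives


parsed = parse_cache_control("private, max-age=604800")
no_caching = parse_cache_control("no-cache, no-store, must-revalidate")
```

Here `parsed` maps `private` to `None` and `max-age` to the string `"604800"`, which a caller could then convert to an integer number of seconds.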
Expires #
The Expires header is from the older HTTP/1.0 days but is still used on many sites.
This header field provides an expiration date after which the asset is considered invalid.
Expires: Wed, 25 Jul 2018 21:00:00 GMT
The browser will ignore this field if there’s a max-age directive in Cache-Control.
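The precedence between the two headers can be sketched as follows. The helper name `freshness_lifetime` is hypothetical, and the sketch simplifies what real browsers do: it measures the Expires date against a `now` value passed in by the caller, and it expects header names spelled exactly as shown.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def freshness_lifetime(headers: Mapping[str, str], now: datetime) -> Optional[float]:
    """Seconds the response may be reused, or None if no expiry information is given."""
    # A max-age directive wins over Expires whenever it is present.
    for part in headers.get("Cache-Control", "").split(","):
        name, _, argument = part.strip().partition("=")
        if name.lower() == "max-age" and argument.isdigit():
            return float(argument)
    expires = headers.get("Expires")
    if expires:
        # Fall back to the absolute date in the Expires header.
        return (parsedate_to_datetime(expires) - now).total_seconds()
    return None


now = datetime(2018, 7, 25, 20, 0, 0, tzinfo=timezone.utc)
expires_only = freshness_lifetime({"Expires": "Wed, 25 Jul 2018 21:00:00 GMT"}, now)
both_headers = freshness_lifetime(
    {"Cache-Control": "max-age=60", "Expires": "Wed, 25 Jul 2018 21:00:00 GMT"}, now
)
```

With only Expires set, the lifetime is the hour remaining until the date. Once max-age=60 is added, the Expires date no longer matters and the lifetime is 60 seconds.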
Last-Modified #
The Last-Modified header is also from the HTTP/1.0 days.
Last-Modified: Mon, 12 Dec 2016 14:45:00 GMT
This field contains the date and time the resource was last modified.
HTML Meta Tags #
Prior to HTML5, using meta tags inside HTML to specify cache-control was a valid approach:
<meta http-equiv="Cache-control" content="no-cache">
Using a meta tag like this is now discouraged and is not valid HTML5. Why? It’s not a good idea because only browsers will be able to parse the meta tag and understand it. Intermediate caches won’t.
So always send caching instructions via HTTP headers.
HTTP Response #
Let’s take a look at a sample HTTP response:
[Sample HTTP response listing not preserved; the notes below refer to its line numbers.]
- Line 2 tells us that the max-age is 1 hour
- Line 5 tells us that this is a PNG image
- Line 7 shows us the ETag value, which will be used for validation after the 1 hour mark to verify that the resource hasn’t changed
- Line 8 is the Expires header, which will be ignored since max-age is set
- Line 10 is the Last-Modified header, which shows when the image was last modified
Caching Pitfalls #
So we’ve established that browser caching is awesome, and we should take advantage of it.
But we also want users to see the latest version of our page when we make updates. We can’t expect them to do a hard refresh (Ctrl-F5) every time they visit our site or clear their cache regularly.
These types of caching issues are often a source of frustration for both the developer and end-user. A user may see a broken page or a button that behaves strangely because they have an outdated stylesheet or JavaScript code.
Stale Files #
Below is a Twitter exchange between Chase Support and a user having issues with a login form on the banking site. The user likely had some old JavaScript cached in their browser which caused the form to reset instead of submit when the Logon button was clicked.
Let’s explore another situation where stale files could bite us.
Suppose we fix a bug in a JavaScript file called app.min.js and push the update to our production site.
This is what our HTML looks like:
<script src="assets/js/app.min.js"></script>
Our web server sets the max-age of JavaScript files to 1 week (604,800 seconds).
Cache-Control: private, max-age=604800
After the update, some users report they are still having issues symptomatic of the bug.
What’s going on here?
- Bob visited the site 2 weeks ago and has a cached copy of buggy app.min.js. Since his copy is older than max-age, the browser will retrieve the file from the server, and he gets the latest bug-free version.
- Mary visited the site 2 days ago and also has a cached copy of buggy app.min.js. Her copy is newer than max-age so her browser is still happily using the cached copy.
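The difference between Bob and Mary comes down to one comparison, sketched below. The function name `served_from_cache` is hypothetical, and the sketch assumes the only rule in play is the 1 week max-age from this example.

```python
MAX_AGE_SECONDS = 604800  # 1 week, as set by our web server in this example


def served_from_cache(days_since_download: float, max_age_seconds: int = MAX_AGE_SECONDS) -> bool:
    """True if the cached copy is still fresh and will be used without asking the server."""
    age_seconds = days_since_download * 24 * 60 * 60
    # A copy stays fresh only while its age is below max-age.
    return age_seconds < max_age_seconds


bob_uses_cache = served_from_cache(14)   # visited 2 weeks ago
mary_uses_cache = served_from_cache(2)   # visited 2 days ago
```

Bob's copy is older than max-age, so his browser goes back to the server and picks up the fix. Mary's copy is still fresh, so she keeps running the buggy file until it expires.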
In the next section, we’ll see how to prevent these issues with a technique called cache busting.
Cache Busting #
Cache busting is where we invalidate a cached file and force the browser to retrieve the file from the server.
We can instruct the browser to bypass the cache by simply changing the filename. To the browser, this is a completely new resource so it will fetch the resource from the server.
Cache busting also allows us to keep long max-age values for resources that may change frequently. Google recommends that max-age be set to 1 year.
Versioning #
We could add a version number to the filename:
assets/js/app-v2.min.js
Fingerprinting #
We could add a fingerprint based on the file contents:
assets/js/app-d41d8cd98f00b204e9800998ecf8427e.min.js
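A build step could generate such a name along these lines. This is only a sketch: the function name `fingerprinted_name` is hypothetical, and using an MD5 digest inserted after the first part of the filename is an assumption chosen to match the shape of the example above.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(path: str, content: bytes) -> str:
    """Insert a hash of the file contents into the filename, before the first dot."""
    original = PurePosixPath(path)
    digest = hashlib.md5(content).hexdigest()
    # "app.min.js" -> stem "app", dot ".", rest "min.js"
    stem, dot, rest = original.name.partition(".")
    return str(original.with_name(f"{stem}-{digest}{dot}{rest}"))


old_name = fingerprinted_name("assets/js/app.min.js", b"console.log('buggy');")
new_name = fingerprinted_name("assets/js/app.min.js", b"console.log('fixed');")
```

Because the digest depends only on the contents, `old_name` and `new_name` differ whenever the file changes, and an unchanged file keeps its name (and its cached copy) across deployments.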
Append a query string #
We could append a query string to the end of the filename:
assets/js/app.min.js?version=2
The query string approach has known issues with proxy servers so this method is generally discouraged.
Best Practices #
Do #
- Use the Cache-Control and ETag headers to control cache behavior for static assets
- Set long max-age values to reap the benefits of browser cache
- Use fingerprinting or versioning for cache busting
Don’t #
- Use HTML meta tags to specify cache behavior
- Use query strings for cache busting
FAQ #
How can I tell if a file was loaded from cache? #
Check out the Developer Tools in your browser. In Chrome, this information is shown in the Network tab under the Size column.
How do I prevent caching for a file? #
Use the following response header:
Cache-Control: no-cache, no-store, must-revalidate