Palisade Magazine

 
July 2008

Cache Control Directives Demystified

by Siddharth Anbalahan |  Discuss this article »» (2)
cache-control-attributes.jpg

Many years ago, HTTP 1.1 introduced specialized Cache Control directives to control the behavior of browser caches and proxy caches. These were a refinement over the HTTP 1.0 headers that programmers were using to control the behavior of caches. Though these directives are several years old, we still see them being used incorrectly. In this article, we explain the meaning and relevance of the most important cache control directives.

Pragma: No-cache

This is a HTTP 1.0 directive that was retained in HTTP 1.1 for backward compatibility. When specified in HTTP requests, this directive instructs proxies in the path not to cache the request. This is useful when you are submitting sensitive details like usernames and passwords in the request.

Notice that “pragma: no-cache” is linked to requests, and not responses. The RFC did not specify the behavior of this directive for responses. Hence, this directive does NOT instruct the browser not to cache a page. We often see this tag being misused when a page is served to the browser. Developers mistakenly set this directive expecting that the page will not be cached on the browser.

For all practical purposes, you can ignore this directive today. HTTP 1.1 introduced better directives, as we’ll see shortly.

Expires header

This is yet another HTTP 1.0 directive that was retained for backward compatibility. This directive tells the browser when a page is set to expire. Once the page expires, the browser does not display the page to the user. Instead it shows a message like “Warning: Page has expired”.

In earlier days, developers played a nifty trick with this directive to ensure that a page expires immediately and is not served from the cache: they would set the expiration date to a day in the distant past. Thus, the browser treats the page has expired and never displays it from cache.

Though the page is not served from cache, browsers still used to store the page in the cache. You could navigate to the cache folder of the browser and open the file directly from there. Thus, this was not really a secure directive.

Cache-Control: public

HTTP 1.1 introduced an array of cache control directives. These give greater flexibility and control to the developer. The “cache-control: public” directive is the most basic directive and tells the browser and proxies in the path that the page may be cached. This is good for non-sensitive pages, as caching improves performance.

Cache-Control: private

The next higher directive is “cache-control: private”. It instructs proxies in the path not to cache the page. But it permits browsers to cache the page. Proxies are shared resources used by multiple users, and this directive tells them not to cache the response. Browsers, as we have already noted, may still cache the page.

Cache Control: No-cache

Though this directive sounds like it is instructing the browser not to cache the page, there’s a subtle difference. The “no-cache” directive, according to the RFC, tells the browser that it should revalidate with the server before serving the page from the cache. Revalidation is a neat technique that lets the application conserve band-width. If the page the browser has cached has not changed, the server just signals that to the browser and the page is displayed from the cache.

Hence, the browser (in theory, at least), stores the page in its cache, but displays it only after revalidating with the server. In practice, IE and Firefox have started treating the no-cache directive as if it instructs the browser not to even cache the page. We started observing this behavior about a year ago. We suspect that this change was prompted by the widespread (and incorrect) use of this directive to prevent caching.

Cache Control: No-store

This is the most secure of the cache-control directives. It tells the browser not only not to cache the page, but also not to even store the page in its cache folder. Whenever you’re serving a sensitive page, this is the cache control directive to use.

Notice that of late, “cache-control: no-cache” has also started behaving like the “no-store” directive. To be on the safer side, we recommend that you use both “no-cache” and “no-store” when serving sensitive pages.

Cache Control: max-age

This is the HTTP 1.1 equivalent of the earlier Expires header available in HTTP 1.0. It implicitly tells the browser it may cache the page, but must re-validate with the server if the max-age is exceeded. Setting max-age to zero ensures that a page is never served from cache, but is always re-validated against the server.

Cache Control: must-revalidate

This directive insists that the browser must revalidate the page against the server before serving it from cache. Note that it implicitly lets the browser cache the page. The “no-store” directive is a safer option if you want to prevent a sensitive page from being stored on the browser.

Cache Control: proxy-revalidate

This is similar to the must-revalidate directive, except that this is targeted at proxy servers. It insists that proxy servers must revalidate when serving this request, whereas the user’s browser need not revalidate. This is useful when an authenticated page is being cached in the browser. You don’t want the proxy to cache and serve the page, whereas it’s fine for your browser to cache and serve the page.

References

Discussion is open for this article — there are 2 reader comments. Add yours.