What is HTTP Cache?
Imagine waiting forever for a website to load. It's annoying, right? Slow pages are not only frustrating for users, they can also be expensive for website owners. That's where caching comes in: it helps websites load faster and saves both time and money.
In this article, we'll break down HTTP caching and explore how it improves the web experience.
What is Cache?
Caching is the process of storing a copy of a web resource, such as a web page or an image, so that it can be quickly retrieved and served to users when they request it again. This helps reduce the load on web servers and makes websites load faster.
How does HTTP Caching work?
- Web Request: When you visit a website, your browser asks the server for a web page or resource.
- Cache Check: The browser checks if it already has a saved copy of the requested resource.
- Cached Copy: If a cached copy exists and it’s still valid, your browser uses it, avoiding a new download.
- No Cached Copy: If there’s no cached copy or it’s expired, the browser fetches the resource from the server.
- Storing in Cache: After fetching, the browser stores a fresh copy in its cache for future use.
- Reusing Cached Copy: The browser continues to use the cached copy until it expires or gets removed.
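The steps above can be sketched in a few lines of Python. This is only an illustration of the flow, not how any real browser implements its cache: the `SimpleCache` class, the `fetch` callback and the fixed `ttl_seconds` are all hypothetical names chosen for this example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CacheEntry:
    body: bytes
    expires_at: float  # absolute time (in seconds) after which the copy is stale


class SimpleCache:
    """Hypothetical in-memory cache that mimics the browser flow described above."""

    def __init__(self, fetch: Callable[[str], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch          # stands in for "ask the server"
        self._ttl = ttl_seconds      # how long a stored copy stays valid
        self._clock = clock          # injectable so the example is easy to test
        self._entries: Dict[str, CacheEntry] = {}

    def get(self, url: str) -> bytes:
        entry = self._entries.get(url)                       # cache check
        if entry is not None and self._clock() < entry.expires_at:
            return entry.body                                # valid cached copy: reuse it
        body = self._fetch(url)                              # missing or expired: fetch
        self._entries[url] = CacheEntry(body, self._clock() + self._ttl)  # store fresh copy
        return body
```

Calling `get()` twice for the same URL within the TTL triggers only one call to `fetch`; once the TTL has passed, the next call fetches again and replaces the stored copy.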
Types of cache
Caches can be classified into two main categories, based on where they are stored and who can access them.
Public Cache: Public cache refers to a type of cache that can be stored in a shared location and is accessible by multiple users or clients. It’s like a resource that’s available for anyone to use.
For example, a cache stored on a content delivery network (CDN) server is considered a public cache. This means that the cached data, such as images or stylesheets, can be shared and used by different users who visit the same website.
Private Cache: Private cache is specific to an individual user or client and is not meant to be shared with others. It’s like a personal resource that’s only accessible to a particular user.
For example, a cache stored in the user's browser is considered a private cache. Here, the cached data is only available to the individual user who uses that specific web browser.
HTTP Headers for Caching
HTTP headers are used to control caching behaviour when a web page or resource is requested. Here are some of the basic caching-related headers:
1. Cache-Control
The Cache-Control header is a crucial response header used to control caching behaviour. It can contain several directives, each with a specific purpose.
max-age: The max-age directive defines how long, in seconds, a cached resource is considered fresh and can be reused without contacting the server.
Cache-Control: max-age=3600
// instructs the browser to cache the resource for one hour.
no-cache: When no-cache is specified, the resource may still be stored in the cache, but it must be revalidated with the server before every use.
Cache-Control: no-cache
// ensures that the browser always checks with the server for the latest version of the resource.
no-store: The no-store directive instructs the browser (and any other cache) not to store the resource at all. It ensures that every request for the resource goes directly to the server.
Cache-Control: no-store
// prevents any caching of the resource
public: When public is set, it means that the resource can be cached by intermediary caches like CDNs, allowing it to be shared among multiple users.
Cache-Control: public
// allows public caching
private: The private directive indicates that the resource should be cached only by the user's browser and should not be stored in intermediary caches.
Cache-Control: private
// ensures that the resource is cached only in the browser and remains private to the user.
must-revalidate: If must-revalidate is specified, it means that once the cached resource expires, it must not be reused until it has been successfully revalidated with the server.
Cache-Control: must-revalidate
// ensures that the browser checks with the server for the latest version when the cached resource expires.
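To make the directives a bit more concrete, here is a rough Python sketch that splits a Cache-Control value into its directives and decides whether a cache may store the response. The helper names `parse_cache_control` and `can_store` are made up for this example, and the parsing is deliberately simplified (it splits on commas, so it would mishandle quoted arguments that contain commas).

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}.

    Simplified: assumes no directive argument contains a comma.
    """
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directive names are case-insensitive, so normalise them to lowercase.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def can_store(directives: Dict[str, Optional[str]], shared_cache: bool) -> bool:
    """Return True if a cache of the given kind may store the response."""
    if "no-store" in directives:
        return False                      # nobody may store it
    if shared_cache and "private" in directives:
        return False                      # only the user's own browser may store it
    return True
```

For example, `parse_cache_control("public, max-age=3600")` yields `{"public": None, "max-age": "3600"}`, and `can_store(...)` returns `False` for a shared cache (such as a CDN) when the response is marked `private`.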
2. Expires
The Expires header is a response header that specifies the date and time after which a cached resource is considered stale and should no longer be used.
Once the expiration time is reached, the cached resource is no longer valid, and the browser will request a fresh copy from the server.
Expires: Tue, 31 Dec 2024 23:59:59 GMT
// cached resource will remain valid until December 31, 2024, at 23:59:59 GMT
If both the Expires header and the max-age directive from the Cache-Control header are present in the response, the max-age directive takes precedence. This means that the browser will follow the max-age directive to determine the resource's cache duration.
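The precedence rule can be sketched as a small Python function. This is a simplified illustration, assuming the response headers arrive as a plain dictionary with exactly these key spellings; `freshness_lifetime` is a hypothetical helper name, and real caches handle more cases (such as `s-maxage` and heuristic freshness) that are left out here.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def freshness_lifetime(headers: Mapping[str, str],
                       response_time: datetime) -> Optional[float]:
    """Return how many seconds the response stays fresh, or None if unspecified.

    `response_time` must be timezone-aware (for example, in UTC).
    """
    cache_control = headers.get("Cache-Control", "")
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    if match:
        return float(match.group(1))          # max-age takes precedence over Expires

    expires = headers.get("Expires")
    if expires:
        try:
            expires_at = parsedate_to_datetime(expires)
            return max(0.0, (expires_at - response_time).total_seconds())
        except (TypeError, ValueError):
            return 0.0                        # unusable date: treat as already expired
    return None                               # neither header gives a lifetime
```

With both headers present, the function returns the max-age value and never looks at Expires; with only Expires, it returns the number of seconds between the response time and the expiry date.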
3. Last-Modified and If-Modified-Since
The Last-Modified and If-Modified-Since headers are essential to validate the cache and to ensure that browsers use updated versions of resources.
These headers work together to determine whether a cached resource is still valid or needs to be refreshed.
Last-Modified: The Last-Modified header is included in an HTTP response and indicates the date and time when the resource was last modified on the server.
Last-Modified: Mon, 15 Jan 2024 08:00:00 GMT
// indicates when the resource was last modified on the server
If-Modified-Since: The If-Modified-Since header is included in an HTTP request and contains the date and time (the Last-Modified value) of the cached version of a resource that the browser has stored.
If-Modified-Since: Mon, 15 Jan 2024 08:00:00 GMT
// indicates the last-modified time of the cached resource
How do they work together?
- When a browser initially requests a resource, the server sends the resource along with the Last-Modified header, indicating its modification timestamp.
- The browser caches the resource and remembers the Last-Modified timestamp.
- On subsequent requests for the same resource, the browser includes an If-Modified-Since header containing the Last-Modified timestamp of its cached copy.
- If the server finds that the resource has not been modified since the provided timestamp, it responds with a 304 Not Modified status and no body, telling the browser that the cached version is still valid and can be reused.
- But if the server finds that the resource has been modified since the provided timestamp, it sends the updated resource along with a new Last-Modified header.
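The server side of this exchange can be sketched roughly as follows. This is a minimal illustration rather than a real server: `respond` is a hypothetical function that returns a `(status, headers, body)` tuple, and it assumes `last_modified` is a timezone-aware UTC datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def respond(last_modified: datetime, body: bytes,
            if_modified_since: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Answer a GET request, honouring If-Modified-Since when it is present."""
    # HTTP dates only have one-second resolution, so drop the microseconds.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
            if last_modified <= since:
                return 304, headers, b""      # not modified: no body is sent
        except (TypeError, ValueError):
            pass                              # unusable date: ignore the condition

    return 200, headers, body                 # first request or modified resource
```

The first request (without If-Modified-Since) gets a 200 response with the Last-Modified header; replaying that timestamp in If-Modified-Since gets a 304 with an empty body, while an older timestamp gets the full resource again.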
4. ETag and If-None-Match
The ETag and If-None-Match headers are also used for cache validation and provide an alternative way to determine whether a cached resource is still valid or needs to be updated.
ETag: The ETag header is included in an HTTP response and contains a unique identifier (often a hash or a version number) for a specific version of a resource. The value is written as a quoted string.
ETag: "abc123"
// unique identifier assigned to the resource
If-None-Match: The If-None-Match header is included in an HTTP request and contains the ETag value of the cached version of a resource that the browser has stored.
If-None-Match: "abc123"
// unique identifier (ETag) of the cached resource
How do they work together?
- When a browser initially requests a resource, the server sends the resource along with the ETag header, which represents the current version's unique identifier.
- The browser caches the resource and remembers the ETag value.
- On subsequent requests for the same resource, the browser includes an If-None-Match header, providing the server with the ETag value of the cached version.
- If the server finds that the resource's current ETag matches the one provided in the If-None-Match header, it responds with a 304 Not Modified status and no body, telling the browser that the cached version is still valid and can be reused.
- If the server finds that the resource's ETag has changed from the provided value, it sends the updated resource along with a new ETag header.
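Here is a rough Python sketch of the same exchange from the server's point of view. It assumes the ETag is derived from a hash of the body, which is only one common choice; `make_etag` and `respond_with_etag` are hypothetical helper names, and the comparison ignores the weak-validator prefix `W/`.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Build a quoted ETag from a hash of the body (one possible strategy)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond_with_etag(body: bytes,
                      if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Answer a GET request, honouring If-None-Match when it is present."""
    etag = make_etag(body)
    headers = {"ETag": etag}

    if if_none_match is not None:
        candidates = []
        for tag in if_none_match.split(","):   # the header may list several ETags
            tag = tag.strip()
            if tag.startswith("W/"):           # weak validators match on the quoted part
                tag = tag[2:]
            candidates.append(tag)
        if "*" in candidates or etag in candidates:
            return 304, headers, b""           # cached copy is still current

    return 200, headers, body                  # no match: send the full resource
```

A request carrying the current ETag in If-None-Match gets a 304 with an empty body; as soon as the body changes, the computed ETag changes too, and the same request gets a 200 with the new content.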
Both the Last-Modified and ETag headers serve a similar purpose in cache validation. They provide mechanisms for browsers to determine whether a cached resource is still valid or needs to be refreshed.
5. Vary
The Vary header is an HTTP response header that lists the request headers a cache must take into account when deciding whether a cached response can be used to fulfil a particular request.
To understand this, let's look at an example.
Imagine you have a news website, and the URL for a news article is the same for all users.
Based on the Accept-Language request header, the content is served in different languages.
Without Vary Header
- User A, who prefers reading content in English, visits the article URL with an Accept-Language header set to English, and the server sends the article content in English.
- The article content is cached by a proxy/CDN server or the user's browser.
- Later, User B, who prefers reading content in Spanish, visits the same article URL with an Accept-Language header set to Spanish.
- Without a Vary header, the caching system treats both requests as the same (because the URL is the same), and the cached English content is mistakenly served to the user who prefers Spanish.
With Vary Header
- In this scenario, when the server sends the English content, it includes a Vary header with Accept-Language as its value.
- The caching system, upon seeing the Vary header, understands that the response can vary depending on the Accept-Language header.
- When User B visits the same URL with an Accept-Language header indicating Spanish, the caching system knows it needs to look for a cache entry whose Accept-Language value is Spanish.
- As a result, it avoids serving the cached English content to User B and fetches the Spanish version of the article from the origin server.
Besides Accept-Language, there are several other request headers, such as Accept, Accept-Encoding and Authorization, that can be listed in the Vary header, either alone or in combination.
Vary: Accept-Language, Accept
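One way to picture what Vary does is to treat the listed request headers as part of the cache key. The sketch below is a simplified illustration, not how any particular CDN or browser works: `VaryAwareCache` is a hypothetical class, header values are compared as exact strings, and the special value `Vary: *` is not handled.

```python
from typing import Dict, Mapping, Optional, Tuple

CacheKey = Tuple[str, Tuple[Tuple[str, str], ...]]


class VaryAwareCache:
    """Hypothetical cache whose key includes the headers named in Vary."""

    def __init__(self) -> None:
        self._vary_by_url: Dict[str, Tuple[str, ...]] = {}
        self._entries: Dict[CacheKey, bytes] = {}

    @staticmethod
    def _key(url: str, request_headers: Mapping[str, str],
             vary_names: Tuple[str, ...]) -> CacheKey:
        # Header names are case-insensitive, so compare them in lowercase.
        lowered = {name.lower(): value for name, value in request_headers.items()}
        return url, tuple((name, lowered.get(name, "")) for name in vary_names)

    def store(self, url: str, request_headers: Mapping[str, str],
              response_headers: Mapping[str, str], body: bytes) -> None:
        vary = response_headers.get("Vary", "")
        names = tuple(sorted(n.strip().lower() for n in vary.split(",") if n.strip()))
        self._vary_by_url[url] = names         # remember which headers matter for this URL
        self._entries[self._key(url, request_headers, names)] = body

    def lookup(self, url: str, request_headers: Mapping[str, str]) -> Optional[bytes]:
        names = self._vary_by_url.get(url)
        if names is None:
            return None                        # nothing stored for this URL yet
        return self._entries.get(self._key(url, request_headers, names))
```

With `Vary: Accept-Language` on the stored English response, a lookup for the same URL with a Spanish Accept-Language header misses the cache, so the request goes on to the origin server. If the response carries no Vary header, the key is just the URL, and the English copy is returned to everyone.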
These are the most commonly used HTTP headers for controlling the cache.
That's all about HTTP caching. I have covered only the essentials here, so if you want to dive deeper, you can check out the MDN documentation on HTTP caching.