Is your cache working as it should? Cache hit and miss ratios can help you answer this question. These metrics are often displayed among the statistics of Content Delivery Network (CDN) caches, for example. This article focuses mainly on Amazon CloudFront CDN caches and how to work with them to achieve a better cache hit rate.
Before learning what hit and miss ratios in caches are, it’s good to understand what a cache is.
What is a cache?
A cache is a high-speed storage layer that temporarily saves data or content, for example from a web page, so that the next time the page is visited, that content is displayed much faster. Faster page loads give users a better experience. The same approach works for databases and other storage.
In AWS Cloud, caching is divided into:
Content Delivery Network (CDN) caching - A CDN is a critical component of nearly any modern web application. Traditionally, a CDN improved the delivery of content by replicating commonly requested files (static content) across a globally distributed set of caching servers.
Web caching - Web caching is performed by retaining HTTP responses and web resources in the cache to fulfill future requests from the cache rather than from the origin servers.
Database caching - In-memory data caching can be one of the most effective strategies to improve your overall application performance and reduce your database costs. Caching can be applied to any type of database including relational databases such as Amazon RDS or NoSQL databases.
These caches are usually provided by the following AWS services: Amazon ElastiCache, Amazon DynamoDB Accelerator (DAX), Amazon CloudFront CDN, and AWS IoT Greengrass.
In this blog post, you will read about Amazon CloudFront CDN caching.
What is the cache hit ratio?
The cache hit ratio is an important metric that applies to any cache, not only to a CDN. A cache hit describes the situation where your content is successfully served from the cache and not from original storage (origin server). The cache hit ratio is the share of all requests that result in a cache hit.
It’s an important metric for a CDN, but not the only one to monitor. For dynamic websites where content changes frequently, the cache hit ratio will typically be lower than for static websites. However, modern CDNs, such as Amazon CloudFront, can perform dynamic caching as well. A reputable CDN service provider should include cache hit ratios in its performance reports.
What is the cache miss ratio?
A cache miss occurs when the cache is searched and the data isn’t found. When this happens, the request is forwarded to the origin storage/server, and the content is transferred to the user and, if possible, written into the cache. The cache miss ratio is the share of all requests that result in a cache miss. It’s usually expressed as a percentage, for instance, a 5% cache miss ratio.
Cache hit and miss example
Cache hit example
A user opens the homepage of your website, and copies of pictures (static content), for instance, are loaded from the cache server nearest to the user because previous users already requested the same content. This is why cache hit rates take time to accumulate. At the start, the cache hit percentage will be 0%. It then slowly increases as the cache servers create copies of your data. With a lot of cache servers, that can take a while.
Cache miss example
A user opens a product page on an e-commerce website. If a copy of the product picture is not currently in the CDN cache, the request results in a cache miss and is passed along to the origin server for the original picture. The CDN server caches the picture once the origin server responds, so subsequent requests for it result in a cache hit.
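The two examples above can be sketched as a toy simulation. This is a minimal sketch and not how a CDN is actually implemented: `EdgeCache`, `fake_origin`, and the image path are hypothetical names introduced only for illustration. It shows the hit ratio starting at 0% and rising as repeated requests are served from the cache.

```python
from typing import Callable, Dict


class EdgeCache:
    """Toy model of a single CDN cache server (hypothetical, for illustration)."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch_from_origin = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> bytes:
        # Cache hit: the content is already stored, so the origin is not contacted.
        if path in self._store:
            self.hits += 1
            return self._store[path]
        # Cache miss: forward to the origin, then keep a copy for later requests.
        self.misses += 1
        body = self._fetch_from_origin(path)
        self._store[path] = body
        return body

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def fake_origin(path: str) -> bytes:
    """Stand-in for the origin server."""
    return f"content of {path}".encode()


edge = EdgeCache(fake_origin)
edge.get("/images/product.jpg")      # first request: cache miss
first_ratio = edge.hit_ratio()       # 0.0, nothing has been served from cache yet
edge.get("/images/product.jpg")      # cache hit
edge.get("/images/product.jpg")      # cache hit
later_ratio = edge.hit_ratio()       # 2 hits out of 3 requests
print(f"{first_ratio:.0%} -> {later_ratio:.0%}")
```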
What is a good CDN cache hit ratio for most websites?
Generally speaking, for most sites, a hit ratio of 95-99% and a miss ratio of 1-5% is ideal. Keep in mind that these numbers depend heavily on the use case; for dynamic content or for files that change often, they can be very different. Also remember that a CDN is used for other benefits as well, such as security and cost optimization, so the hit ratio is not the only measure of its value.
How to calculate the cache hit ratio
To calculate a cache hit ratio, divide the total number of cache hits by the sum of the number of cache hits and the number of cache misses.
This value is usually presented as a percentage of the requests to the applicable cache.
The cache hit ratio formula is:
cache hit ratio = cache hits / (cache hits + cache misses)
For example, if you have 43 cache hits and 11 cache misses, you divide 43 (the total number of cache hits) by 54 (the sum of 43 cache hits and 11 cache misses). The result is a cache hit ratio of approximately 0.796. To express this as a percentage, multiply the result by 100.
That gives a cache hit ratio of 79.6%.
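The same calculation can be written as a small helper. This is a minimal sketch; the function name `cache_hit_ratio` is our own and does not come from any library.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Return hits / (hits + misses) as a fraction between 0 and 1."""
    if hits < 0 or misses < 0:
        raise ValueError("hits and misses must be non-negative")
    total = hits + misses
    if total == 0:
        raise ValueError("the ratio is undefined when there are no requests")
    return hits / total


ratio = cache_hit_ratio(43, 11)
print(f"{ratio:.3f}")         # prints 0.796
print(f"{ratio * 100:.1f}%")  # prints 79.6%
```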
How do you measure the CDN cache hit ratio?
You should be able to find cache hit ratios in the statistics of your CDN. If you are not able to find the exact cache hit ratio, you can try to calculate it by using the formula from the previous section.
In the case of Amazon CloudFront CDN, you can get this information in the AWS Management Console in two possible ways:
By using Amazon CloudFront Cache Statistics in the console
Real-time metrics in Amazon CloudWatch
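The CloudWatch route can also be scripted. The following is a hedged sketch using the boto3 SDK, not an official recipe. It assumes that the distribution has CloudFront's additional metrics enabled, that the hit rate is published as `CacheHitRate` in the `AWS/CloudFront` namespace with the `DistributionId` and `Region=Global` dimensions, and that CloudFront metrics are read from the `us-east-1` region. The distribution ID `E1234EXAMPLE` is a placeholder, and both function names are our own.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def cache_hit_rate_query(distribution_id: str, hours: int = 24) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/CloudFront",
        "MetricName": "CacheHitRate",  # assumed: requires additional metrics
        "Dimensions": [
            {"Name": "DistributionId", "Value": distribution_id},
            {"Name": "Region", "Value": "Global"},
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 3600,  # one data point per hour
        "Statistics": ["Average"],
        "Unit": "Percent",
    }


def fetch_cache_hit_rate(distribution_id: str, hours: int = 24) -> List[Dict[str, Any]]:
    """Return hourly data points, oldest first. Needs AWS credentials."""
    import boto3  # third-party AWS SDK, imported here so the builder works without it

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
    response = cloudwatch.get_metric_statistics(
        **cache_hit_rate_query(distribution_id, hours)
    )
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])


if __name__ == "__main__":
    # Placeholder distribution ID; replace it with your own before running.
    for point in fetch_cache_hit_rate("E1234EXAMPLE"):
        print(point["Timestamp"], f'{point["Average"]:.1f}%')
```

Separating the query builder from the network call keeps the request parameters easy to inspect without AWS access.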
Cache hit and miss problems
Caching applies to a wide variety of use cases, but there are a few questions to answer before you put all of your content behind a cache:
Is the data structured well for caching? Simply caching a database record can often be enough to offer significant performance advantages. However, at other times, data is best cached in a format that combines multiple records. Because caches are simple key-value stores, you might also need to cache a data record in multiple different formats, so you can access it by different attributes in the record.
Is the cache effective for all data? Some applications generate access patterns that are not suitable for caching, for example, scanning large data sets that change frequently. In this case, keeping the cache up to date may negate any advantages that the cache may provide.
Is it always safe to use cached content (values, objects)? The same piece of data can have different consistency requirements in different contexts. For example, when checking out online, you need the authoritative price of the product, so caching may not be appropriate. However, on product pages, prices may be out of date for a few minutes without negatively affecting users.
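The last question can be illustrated with a small sketch. It assumes a hypothetical `PriceCache` class with a five-minute TTL: product pages read through the cache and may see a slightly stale price, while checkout always asks the authoritative source. The class, its method names, and the `sku-1` product are illustrative only.

```python
from typing import Callable, Dict, Tuple


class PriceCache:
    """Hypothetical price cache with different rules per context."""

    def __init__(self, load_price: Callable[[str], float], ttl_seconds: float) -> None:
        self._load_price = load_price
        self._ttl = ttl_seconds
        # product_id -> (price, time the price was stored)
        self._entries: Dict[str, Tuple[float, float]] = {}

    def price_for_display(self, product_id: str, now: float) -> float:
        """Product pages: a cached price younger than the TTL is good enough."""
        entry = self._entries.get(product_id)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]
        return self.price_for_checkout(product_id, now)

    def price_for_checkout(self, product_id: str, now: float) -> float:
        """Checkout: always read the authoritative price, then refresh the cache."""
        price = self._load_price(product_id)
        self._entries[product_id] = (price, now)
        return price


prices = {"sku-1": 10.0}  # stands in for the database
price_cache = PriceCache(lambda product_id: prices[product_id], ttl_seconds=300)

shown = price_cache.price_for_display("sku-1", now=0)      # loaded from the source
prices["sku-1"] = 12.0                                     # the price changes
stale = price_cache.price_for_display("sku-1", now=60)     # still the cached 10.0
charged = price_cache.price_for_checkout("sku-1", now=61)  # authoritative 12.0
```

Passing `now` explicitly instead of reading the clock keeps the example deterministic.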
Does a high cache hit ratio always mean a CDN is effective?
The cache hit ratio is an important metric for a CDN, but it is not the only measure of effectiveness. Other factors matter too, such as round-trip time (RTT) and where the cached content is stored. Ideally, a CDN service should cache content as close as possible to the end users and serve as many of them as possible from the cache.
How to increase the cache hit ratio for CDN
Generally, you can improve the CDN cache hit ratio by following these recommendations:
1. Optimize cache-control headers
The Cache-Control header field specifies caching instructions for both requests and responses. Its directives set properties such as the object’s maximum age (its time-to-live, TTL) and whether the object may be cached at all. Set these values according to how often your content changes. Optimizing them can help increase the number of cache hits on the CDN.
Example: Set a time-to-live (TTL) that best fits your content. For instance, if an asset changes approximately every two weeks, a cache time of seven days may be appropriate. However, if the asset changes frequently, you may want to use a lifetime of one day or less.
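As an illustration, the TTLs from this example can be turned into header values. This is a minimal sketch: `cache_control_header` is a hypothetical helper, and `public` and `max-age` are standard Cache-Control directives. Which TTL fits a given asset remains a judgment call.

```python
from datetime import timedelta


def cache_control_header(ttl: timedelta, public: bool = True) -> str:
    """Build a Cache-Control value that lets caches keep an object for `ttl`."""
    seconds = int(ttl.total_seconds())
    if seconds <= 0:
        raise ValueError("ttl must be positive")
    # "public" allows shared caches such as a CDN to store the response;
    # "private" restricts caching to the end user's browser.
    scope = "public" if public else "private"
    return f"{scope}, max-age={seconds}"


# Asset that changes roughly every two weeks: cache for seven days.
weekly = cache_control_header(timedelta(days=7))
# Asset that changes often: cache for one day.
daily = cache_control_header(timedelta(days=1))

print(weekly)  # prints public, max-age=604800
print(daily)   # prints public, max-age=86400
```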
2. Ignore cookies
Responses that depend on cookies tend to be uncacheable, because cookie values are often unique to each user. Therefore, it’s important to set rules, for example, to ignore all cookies in requests for assets that you want your CDN to deliver.
3. Ignore query strings
Query strings are useful in multiple ways: they help interact with web applications and APIs, aggregate user metrics, and provide information for objects. The problem arises when query strings are included in static object URLs. In this case, the CDN may treat each URL variant as a unique object and direct the request to the origin server. Each such request is then counted as a cache miss, even though the requested content was available in the CDN cache. This leads to an unnecessarily low cache hit ratio.
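The effect of ignoring query strings can be sketched as a cache-key function. A CDN normally handles this through its configuration rather than through code you write, so the hypothetical `cache_key` helper below only illustrates the idea: drop every query parameter except an explicit allowlist, and sort the remainder so that parameter order does not create separate cache entries.

```python
from typing import Iterable
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_key(url: str, keep_params: Iterable[str] = ()) -> str:
    """Return the URL with only allowlisted query parameters, sorted."""
    parts = urlsplit(url)
    keep = set(keep_params)
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in keep
    )
    # The fragment is dropped as well; it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# Tracking parameters no longer split one image into many cache entries.
a = cache_key("https://example.com/img/logo.png?utm_source=mail")
b = cache_key("https://example.com/img/logo.png?utm_source=ads&session=42")
print(a == b)  # prints True

# A parameter that really changes the content can be kept in the key.
versioned = cache_key("https://example.com/img/logo.png?utm_source=mail&v=3", ["v"])
print(versioned)  # prints https://example.com/img/logo.png?v=3
```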
If you are using Amazon CloudFront CDN, you can follow the AWS recommendations for getting a higher cache hit rate. The AWS documentation describes how to set up and manage the caching of objects to improve performance and meet your business requirements. Some of these recommendations are similar to those described in the previous section, but are more specific to CloudFront:
Specifying how long CloudFront caches your objects
Using CloudFront Origin Shield
Caching based on query string parameters
Caching based on cookie values
Caching based on request headers
Remove Accept-Encoding header when compression is not needed
Serving media content by using HTTP
The StormIT team understands that a well-implemented CDN will optimize your infrastructure costs, effectively distribute resources, and deliver maximum speed with minimum latency. The Amazon CloudFront distribution is built to provide global solutions in streaming, caching, security and website acceleration.
Get in touch with us to learn more.