Cache: Unlocking performance in modern computing

What are caches?

A cache is a hardware or software component that stores data so that future requests for that data can be served faster. Caches are used to reduce latency and improve performance by keeping frequently accessed data closer to the point of use.

Types of caches:

1. CPU Cache:

    • L1 Cache: Located directly on the CPU chip, L1 cache provides the fastest access to frequently used instructions and data.
    • L2 Cache: Slightly larger and slower than L1 cache, L2 cache serves as a secondary cache layer.

    Example Implementation: Intel Smart Cache, AMD 3D V-Cache

    2. Disk Cache:

    • Operating System Disk Cache: Utilized by operating systems to store frequently accessed disk data in memory, improving disk read and write performance.
    • File System Caches: File systems such as NTFS and ext4 maintain caches to store file metadata and data blocks.

    Example Implementation: Windows Disk Cache, Linux Disk Cache

    3. Web Cache:

    • Proxy Server Caches: Caches copies of web pages and resources requested by clients, reducing bandwidth usage and improving response times.
    • Content Delivery Network (CDN) Caches: Distributes cached web content across servers globally for faster delivery.

    Example Implementation: Squid, Varnish Cache

    4. Browser Cache:

    • HTTP Caching: Web browsers cache resources like images and scripts locally to speed up page loading.
    • IndexedDB/Web Storage: Modern web browsers support client-side storage mechanisms for caching web application data.

    Example Implementation: Chrome Browser Cache, Firefox Browser Cache

    5. Database Query Cache:

    • MySQL Query Cache: Caches the results of frequently executed SELECT queries to reduce database load (note that the query cache was deprecated in MySQL 5.7 and removed in MySQL 8.0).
    • Redis: Utilized as a cache for database query results, improving database performance.

    Example Implementation: MySQL Query Cache, Redis

    6. Object Caching:

    • EHCache: Java-based caching library for caching Java objects in memory.
    • Redis: Offers object caching capabilities with in-memory storage and various data structures.

    Example Implementation: EHCache, Redis

    7. Session Caching:

    • Memcached: Distributed memory caching system commonly used for session caching in web applications.

    • Redis: In-memory data store with support for storing session data and high availability features.

    Example Implementation: Memcached, Redis

    8. Application-level Caching:

    • Guava Cache: Java library for caching application-level data with features like automatic loading and eviction policies.
    • ASP.NET Caching: Built-in caching features for caching application data in .NET web applications.

    Example Implementation: Guava Cache, ASP.NET Caching

    9. GPU Cache:

    • NVIDIA GPU Cache: Includes cache memory at various levels to optimize memory access and improve performance in graphics and compute workloads.
    • AMD Radeon GPU Cache: Features cache memory to accelerate memory access in graphics-intensive applications and games.

    Example Implementation: NVIDIA GPU Cache, AMD Radeon GPU Cache

    10. Compiler Cache:

    • ccache: Compiler cache tool that speeds up compilation by caching previously compiled object files.
    • sccache: Another compiler cache tool designed to accelerate compilation by caching intermediate build artifacts.

    Example Implementation: ccache, sccache

    When should cache be used in system design?

    1. Frequent Data Retrieval

    In a web application, if certain database queries or API calls are executed frequently (e.g., fetching user profiles, product information), caching the results can significantly reduce the load on the database or backend services.
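
    As a rough illustration, here is a minimal cache-aside sketch using the redis-py client. The connection settings, the key format, the 5-minute TTL, and fetch_user_profile_from_db are all hypothetical stand-ins; the point is simply that repeated reads are served from the cache instead of the database.

```python
import json

import redis  # assumes the redis-py client and a Redis instance on localhost

cache = redis.Redis(host="localhost", port=6379)
CACHE_TTL_SECONDS = 300  # keep cached profiles for 5 minutes


def fetch_user_profile_from_db(user_id):
    # Hypothetical stand-in for a real database query.
    return {"id": user_id, "name": "example"}


def get_user_profile(user_id):
    key = f"user:{user_id}:profile"
    cached = cache.get(key)  # cache hit: skip the database entirely
    if cached is not None:
        return json.loads(cached)
    profile = fetch_user_profile_from_db(user_id)  # cache miss: query the database
    cache.setex(key, CACHE_TTL_SECONDS, json.dumps(profile))  # populate with a TTL
    return profile
```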

    2. Expensive Computations

    In a machine learning application, if a complex model inference process takes a long time to compute, caching the results of previous inferences can avoid redundant computations and speed up response times for similar inputs.
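
    For pure, repeatable computations, Python's standard-library functools.lru_cache provides this behaviour with a single decorator. The expensive_inference function below is a hypothetical placeholder; real inference code would need hashable inputs (or a cache key derived from them).

```python
from functools import lru_cache


@lru_cache(maxsize=1024)  # keep the 1,024 most recently used results in memory
def expensive_inference(feature_vector: tuple) -> float:
    # Hypothetical stand-in for a slow model inference; arguments must be
    # hashable, hence a tuple rather than a list or array.
    return sum(x * x for x in feature_vector)


expensive_inference((0.1, 0.2, 0.3))  # first call: computed
expensive_inference((0.1, 0.2, 0.3))  # identical call: served from the cache
```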

    3. Resource Intensive Operations

    In a content management system, generating dynamic web pages with heavy processing (e.g., rendering complex templates, fetching data from multiple sources) can strain server resources. Caching the rendered HTML content can serve cached pages to subsequent users, reducing server load and improving responsiveness.

    4. Highly Accessed Content

    In a media streaming service, popular videos or images are accessed frequently by users. Caching these static assets at edge locations or CDN servers ensures faster delivery to users, reducing latency and improving overall user experience.

    5. Session Management

    In a web application handling user sessions, caching session data (e.g., user authentication tokens, session variables) in memory can avoid repeated database queries or expensive session management logic, improving application responsiveness and scalability.

    6. API Rate Limiting

    In an API service, implementing caching for rate-limited endpoints can prevent excessive API calls from overwhelming backend systems. Cached responses can be served to clients within the rate limit window, reducing the load on backend servers and ensuring fair resource allocation.
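
    A minimal sketch of this idea, assuming a single-process service and a fixed 60-second window: responses are cached per endpoint and reused until the window expires. call_backend and the window length are hypothetical.

```python
import time

_response_cache = {}  # endpoint -> (expires_at, cached_response)
WINDOW_SECONDS = 60   # reuse the cached response for the rest of the window


def call_backend(endpoint):
    # Hypothetical stand-in for the real (rate-limited) backend call.
    return {"endpoint": endpoint, "data": "..."}


def get_with_window_cache(endpoint):
    now = time.time()
    entry = _response_cache.get(endpoint)
    if entry is not None and entry[0] > now:  # still inside the window: reuse it
        return entry[1]
    response = call_backend(endpoint)  # window expired (or first call): one backend hit
    _response_cache[endpoint] = (now + WINDOW_SECONDS, response)
    return response
```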

    7. Real-time Analytics

    In an e-commerce platform, tracking user interactions in real-time (e.g., page views, product clicks) generates large volumes of data. Caching aggregated analytics data (e.g., page view counts, popular products) in memory allows for faster querying and reporting, enabling real-time insights for business decision-making.

    8. Content Personalization

    In a recommendation engine, personalized content (e.g., product recommendations, news articles) tailored to individual user preferences requires complex computations. Caching personalized recommendations for each user can deliver a seamless and responsive user experience, even during peak traffic periods.

    Important caching algorithms and Eviction policies

    Caching algorithms are strategies used to determine which items should be kept in the cache and which should be evicted when the cache is full. Here are some common caching algorithms:

    LRU (Least recently used)

    The Least Recently Used (LRU) caching algorithm keeps track of the order in which items are accessed and, when the cache reaches its capacity, evicts the least recently used item (i.e., the item whose most recent access lies furthest in the past). This helps ensure that recently accessed items remain in the cache for faster retrieval.

    Common Data Structures:

    1. Doubly Linked List: Used to maintain the order of items based on their access time.
    2. Hash Map: Used to store key-value pairs, where the keys correspond to cache keys, and the values point to the nodes in the linked list.

    Examples:

    Here are two examples to illustrate how LRU works:

    Example 1: Basic LRU Cache Implementation

    Suppose we have a cache with a capacity of 3, initially empty. We track the order of access using a queue or a doubly linked list:

    1. Cache state: [ ]
    2. Access item ‘A’: Cache state: [A]
    3. Access item ‘B’: Cache state: [A, B]
    4. Access item ‘C’: Cache state: [A, B, C]
    5. Access item ‘A’ again: Cache state: [B, C, A]
    6. Access item ‘D’: Cache state: [C, A, D] (item ‘B’ is evicted)

    In this example, when the cache reaches its capacity, the least recently used item (‘B’) is evicted to make room for the new item (‘D’).

    Example 2: LRU Cache with Repeated Accesses

    Let’s consider a scenario where items are accessed repeatedly:

    1. Cache state: [ ]
    2. Access item ‘A’: Cache state: [A]
    3. Access item ‘B’: Cache state: [A, B]
    4. Access item ‘C’: Cache state: [A, B, C]
    5. Access item ‘A’ again: Cache state: [B, C, A]
    6. Access item ‘B’ again: Cache state: [C, A, B]
    7. Access item ‘C’ again: Cache state: [A, B, C]
    8. Access item ‘D’: Cache state: [B, C, D] (item ‘A’ is evicted)

    In this example, the cache tracks only recency, not frequency. Even though every item has been accessed twice, item ‘A’ is evicted when ‘D’ arrives because its most recent access is the oldest of the three.
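
    To make this concrete, here is a minimal LRU cache sketch in Python. It uses collections.OrderedDict, which plays the role of the hash map plus doubly linked list described above: insertion order doubles as recency order.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()  # key -> value, ordered from least to most recently used

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used item


# Reproduces Example 1: capacity 3, accesses A, B, C, A, D -> item B is evicted.
cache = LRUCache(3)
for k in ["A", "B", "C", "A", "D"]:
    cache.put(k, k)
print(list(cache.items))  # ['C', 'A', 'D']
```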

    LFU (Least frequently used)

    The LFU caching algorithm evicts the least frequently accessed item from the cache when it reaches its capacity. It maintains a count of the number of accesses for each item in the cache and removes the item with the lowest access frequency when eviction is necessary.

    Note that this algorithm has an inherent ambiguity: whenever an eviction must choose between items with the same frequency, you need to define a tie-breaker. In our example, we break ties using the LRU policy discussed above.

    Common Data Structures:

    1. Hash Map: Used to store key-value pairs, where keys represent cache keys, and values represent the frequency counts or pointers to corresponding cache entries.
    2. Linked List or Heap: Used to maintain items with the same access frequency in the order of their access time. This allows efficient access to the least frequently used items for eviction.

    Examples:

    Let’s take a cache with size 3:

    1. Initial Cache State: [ ]
    2. Access item ‘A’:
      Cache state: {A: (freq=1)}
    3. Access item ‘B’:
      Cache state: {A: (freq=1), B: (freq=1)}
    4. Access item ‘C’:
      Cache state: {A: (freq=1), B: (freq=1), C: (freq=1)}
    5. Access item ‘D’:
      Cache state: {B: (freq=1), C: (freq=1), D: (freq=1)}
      Since the cache is full and all items have the same frequency count, the least recently used item ‘A’ is evicted.
    6. Access item ‘B’ again:
      Cache state: {B: (freq=2), C: (freq=1), D: (freq=1)}
      Item ‘B’ now has a frequency of 2.
    7. Access item ‘E’:
      Cache state: {B: (freq=2), D: (freq=1), E: (freq=1)}
      Items ‘C’ and ‘D’ share the lowest frequency, so we break the tie with LRU and evict item ‘C’, which was accessed earlier.
    8. Access item ‘D’ again:
      Cache state: {B: (freq=2), D: (freq=2), E: (freq=1)}
      Item ‘D’ now has a frequency of 2.
    9. Access item ‘F’:
      Cache state: {B: (freq=2), D: (freq=2), F: (freq=1)}
      Item ‘E’ has the lowest frequency and is evicted.
    10. Access item ‘F’ again:
      Cache state: {B: (freq=2), D: (freq=2), F: (freq=2)}
      Item ‘F’ now has a frequency of 2.
    11. Access item ‘G’:
      Cache state: {D: (freq=2), F: (freq=2), G: (freq=1)}
      All items now have a frequency of 2, so we break the tie with LRU and evict item ‘B’, the least recently used of the three.
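
    To tie the pieces together, here is a minimal LFU sketch with an LRU tie-break, mirroring the example above: a hash map holds values and per-key frequencies, and each frequency has its own recency-ordered bucket so the least recently used key among the least frequently used ones can be found quickly.

```python
from collections import OrderedDict, defaultdict


class LFUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.values = {}                         # key -> value
        self.freq = {}                           # key -> access frequency
        self.buckets = defaultdict(OrderedDict)  # frequency -> keys in recency order
        self.min_freq = 0

    def _touch(self, key):
        # Move a key from its current frequency bucket to the next one.
        f = self.freq[key]
        del self.buckets[f][key]
        if not self.buckets[f]:
            del self.buckets[f]
            if self.min_freq == f:
                self.min_freq = f + 1
        self.freq[key] = f + 1
        self.buckets[f + 1][key] = None  # most recent within its new bucket

    def get(self, key):
        if key not in self.values:
            return None
        self._touch(key)
        return self.values[key]

    def put(self, key, value):
        if self.capacity == 0:
            return
        if key in self.values:
            self.values[key] = value
            self._touch(key)
            return
        if len(self.values) >= self.capacity:
            # Evict from the lowest-frequency bucket, least recently used first.
            evict_key, _ = self.buckets[self.min_freq].popitem(last=False)
            if not self.buckets[self.min_freq]:
                del self.buckets[self.min_freq]
            del self.values[evict_key]
            del self.freq[evict_key]
        self.values[key] = value
        self.freq[key] = 1
        self.buckets[1][key] = None
        self.min_freq = 1


# Reproduces the example above: first access = put, repeat access = get.
c = LFUCache(3)
for op, k in [("put", "A"), ("put", "B"), ("put", "C"), ("put", "D"), ("get", "B"),
              ("put", "E"), ("get", "D"), ("put", "F"), ("get", "F"), ("put", "G")]:
    c.put(k, k) if op == "put" else c.get(k)
print(sorted(c.values))  # ['D', 'F', 'G']
```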

    FIFO (First-In, First-Out)

    The FIFO (First-In, First-Out) caching algorithm is a simple eviction policy that evicts the oldest item in the cache when the cache reaches its maximum capacity. When the cache is full and a new item needs to be inserted, the item that was inserted first is evicted. It follows the principle of “First-In, First-Out”: the first item to be inserted into the cache is the first to be removed when the cache is full.

    FIFO is easy to implement and understand but may not always provide the best cache performance, especially for access patterns that do not follow the order in which items were inserted.

    Common Data Structures

    1. Queue (or Linked List): To maintain the order of insertion and implement the FIFO eviction policy.
    2. Hash Map (optional): To quickly access and update items in the cache for constant time operations.

    Example:

    Cache Size: 3

    1. Access item ‘A’: Cache state: {A}
    2. Access item ‘B’: Cache state: {A, B}
    3. Access item ‘C’: Cache state: {A, B, C}
    4. Access item ‘D’: Cache state: {B, C, D}
      • Item ‘A’ is evicted because it was the first item inserted into the cache.
    5. Access item ‘B’ again: Cache state: {B, C, D} (no change)
      • No eviction occurs as ‘B’ is already in the cache.
    6. Access item ‘E’: Cache state: {C, D, E}
      • Item ‘B’ is evicted because it was the first item inserted into the cache.

    In this example, a queue (or linked list) is used to maintain the order of insertion, and the FIFO eviction policy is applied by removing the oldest item from the queue when the cache is full. A hash map can also be used for efficient item lookup and update operations.
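
    A minimal FIFO cache sketch along the same lines, again using collections.OrderedDict as the insertion-ordered queue; note that lookups deliberately do not change the order.

```python
from collections import OrderedDict


class FIFOCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()  # preserves insertion order (acts as the queue)

    def get(self, key):
        return self.items.get(key)  # lookups do not affect insertion order

    def put(self, key, value):
        if key not in self.items and len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # evict the oldest insertion
        self.items[key] = value


# Reproduces the example: A, B, C, D (evicts A), hit on B, E (evicts B).
cache = FIFOCache(3)
for k in ["A", "B", "C", "D"]:
    cache.put(k, k)
cache.get("B")
cache.put("E", "E")
print(list(cache.items))  # ['C', 'D', 'E']
```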

    LIFO (Last-In, First-Out)

    The LIFO (Last-In, First-Out) caching algorithm evicts the most recently inserted item when the cache is full and a new item needs to be admitted. Let’s explain the LIFO caching algorithm with an example and a cache size of 2.

    When the cache is full and a new item needs to be inserted, the item that was inserted last is evicted. It follows the principle of “Last-In, First-Out”: the last item to be inserted into the cache is the first to be removed when the cache is full. LIFO is relatively simple to implement and can be useful in certain scenarios, but it may not always provide optimal cache performance.

    Data Structures Used:

    1. Stack (or Linked List): To maintain the order of insertion and implement the LIFO eviction policy.
    2. Hash Map (optional): To quickly access and update items in the cache for constant time operations.

    Example:

    Cache Size: 2

    1. Access item ‘A’: Cache state: {A}
    2. Access item ‘B’: Cache state: {A, B}
    3. Access item ‘C’: Cache state: {A, C}
      • Item ‘B’ is evicted because it was the most recently inserted item.
    4. Access item ‘D’: Cache state: {A, D}
      • Item ‘C’ is evicted because it was the most recently inserted item.

    In this example, a stack (or linked list) is used to maintain the order of insertion, and the LIFO eviction policy is applied by removing the most recently inserted item (the top of the stack) when the cache is full. A hash map can also be used for efficient item lookup and update operations.
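
    And a minimal LIFO cache sketch, using a plain Python list as the stack of insertions plus a dictionary for constant-time lookups.

```python
class LIFOCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.stack = []  # insertion order; the top of the stack is evicted first
        self.items = {}  # key -> value for constant-time lookups

    def get(self, key):
        return self.items.get(key)

    def put(self, key, value):
        if key not in self.items and len(self.stack) >= self.capacity:
            evicted = self.stack.pop()  # the most recently inserted item goes first
            del self.items[evicted]
        if key not in self.items:
            self.stack.append(key)
        self.items[key] = value


# Reproduces the example: A, B, C (evicts B), D (evicts C) -> {A, D}.
cache = LIFOCache(2)
for k in ["A", "B", "C", "D"]:
    cache.put(k, k)
print(cache.stack)  # ['A', 'D']
```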

    How to choose a cache eviction policy?

    Here is a step-by-step approach:

    Step 1: Assess Access Patterns

    Analyze the frequency and recency of item accesses. Determine whether access patterns exhibit temporal or spatial locality.

    Step 2: Define Performance Goals

    Identify key metrics such as cache hit rate, latency, and throughput requirements. Prioritize performance goals based on application needs.

    Step 3: Evaluate System Constraints

    Consider resource limitations like memory usage and computational overhead. Ensure the chosen eviction policy is feasible within system constraints.

    Step 4: Select an Eviction Policy

    Based on access patterns, performance goals, and system constraints, choose the eviction policy that best aligns with the requirements. For example, use LRU for applications with high temporal locality, FIFO for simplicity, or LFU for frequency-based access patterns.

    Step 5: Implement Dynamic Policy Selection

    Develop a mechanism for dynamically selecting the eviction policy based on real-time monitoring of system performance and access patterns. Use adaptive algorithms to adjust policy selection over time in response to changing workload characteristics.

    Step 6: Provide a Fallback Mechanism

    Include a fallback mechanism to handle unexpected scenarios or edge cases where the chosen eviction policy may not perform optimally. For instance, revert to a default policy or employ a hybrid approach combining multiple policies as needed.

    Step 7: Validate and Iterate

    Validate the chosen eviction policy through testing and benchmarking against various workloads and scenarios.

    By following this simplified process, you can systematically choose the most suitable cache eviction policy for your requirements while keeping the decision straightforward and easy to implement.

    Examples of the algorithm in use:

    Web Application with Temporal Locality:

      • Access Patterns: Analyze user sessions and page views to identify temporal locality, where recently accessed items are likely to be accessed again.
      • Performance Goals: Prioritize low latency and high cache hit rate to enhance user experience.
      • System Constraints: Consider memory constraints and the need for efficient cache management.
      • Eviction Policy: Choose LRU (Least Recently Used) to leverage temporal locality and prioritize recently accessed items for retention in the cache.

    IoT Data Processing System:

      • Access Patterns: Monitor sensor data streams with high spatial locality, where nearby data points are frequently accessed together.
      • Performance Goals: Emphasize throughput and efficient utilization of system resources to handle real-time data processing.
      • System Constraints: Ensure low latency and scalability to handle large volumes of incoming data.
      • Eviction Policy: Select FIFO (First-In, First-Out) for simplicity and efficient utilization of cache resources, as data points may not exhibit strong temporal locality.

    E-commerce Platform:

      • Access Patterns: Analyze product browsing and purchase history to understand user behavior and preferences.
      • Performance Goals: Maximize conversion rates and customer satisfaction by delivering personalized recommendations and fast page loads.
      • System Constraints: Consider the need for dynamic scaling to handle fluctuating traffic and seasonal variations.
      • Eviction Policy: Opt for a hybrid approach combining LRU and LFU (Least Frequently Used) to balance between recent popularity and long-term relevance of items in the cache.

    Content Delivery Network (CDN):

      • Access Patterns: Monitor content requests from users distributed across different geographical regions.
      • Performance Goals: Minimize content delivery latency and optimize network bandwidth usage.
      • System Constraints: Consider global scalability and the need for efficient content caching and distribution.
      • Eviction Policy: Use a combination of LRU and MRU (Most Recently Used) to prioritize recently accessed content while retaining hot and trending items for faster delivery.

    In each of these examples, we apply the simplified algorithm to analyze access patterns, define performance goals, evaluate system constraints, and select the most appropriate cache eviction policy based on the specific requirements of the application or system.
