Cache Architecture Explained — Types, TTL, Write Strategies, Eviction & Real-World Usage

7 min read

Caching is one of the most important performance optimization techniques in modern systems — from CPU caches to browser caches, CDNs, and distributed backends like Redis.

Modern applications rely heavily on caching to achieve:

  • Low latency
  • High throughput
  • Horizontal scalability
  • Reduced backend load

This article explains caching in a simple, structured way:

  • What is cache
  • Types of cache
  • Typical real-world cache stack
  • Cache write strategies
  • Cache eviction strategies (LRU, LFU, MRU, FIFO)
  • Cache invalidation strategies
  • Caching fundamentals and common pitfalls
  • Summary


What is Cache?

A cache is a fast temporary storage layer that stores frequently accessed data so future requests can be served much faster.

It is placed between a client (or application) and a slower data source such as a database, disk, or remote service.

Client → Cache → Database

Core Purposes of Caching:

  • Reduce latency: By storing data closer to where it’s needed and on faster hardware, caches dramatically reduce the time required to retrieve information.
  • Reduce database load: Offloading frequent queries to cache layers prevents database systems from becoming bottlenecks under heavy traffic.
  • Improve throughput: Systems can handle more concurrent requests when data is served from fast cache layers rather than slower persistent storage.
  • Improve scalability: Caches enable horizontal scaling by distributing the read load across multiple cache nodes.


What Does Caching Really Mean?

A common misconception is that caching exclusively means storing data in RAM. In reality, caching encompasses broader principles:

Caching means:

  • Storing data on faster hardware
  • Storing data closer to where it is needed
  • Storing data in a format optimized for fast access

So caching can involve:

  • In-memory storage (Redis, Memcached)
  • CPU registers and L1/L2/L3 cache
  • SSD-based cache layers
  • CDN edge servers
  • Browser cache
  • OS page cache


Key Idea

Cache is not about where data is stored —
Cache is about how fast the data can be accessed.


Types of Cache

A) Hardware Cache (CPU-Level)

Location: Inside the processor

Modern CPUs implement multiple cache levels:

Level | Location | Speed | Latency | Size
L1 | Inside CPU core | Fastest | ~1 ns | 32-64 KB
L2 | Near CPU core | Very fast | ~3 ns | 256-512 KB
L3 | Shared across cores | Fast | ~12 ns | 2-32 MB

Managed by: CPU hardware


B) Client-Side Cache

Location: User’s device (browser, mobile app)

Examples:

  • Browser HTTP cache
  • Service Worker cache
  • Mobile app cache (NSCache, DiskLruCache)
  • LocalStorage / IndexedDB

Used for:

  • Static assets (JS, CSS, images)
  • API responses
  • Offline-first functionality

Benefits:

  • Zero network latency
  • Reduces server load
  • Enables offline access


C) Edge Cache (CDN)

Location: Globally distributed edge locations near users

Examples:

  • Cloudflare
  • Akamai
  • AWS CloudFront
  • Fastly

Used for:

  • Static assets (images, videos, CSS, JS)
  • Public API responses
  • Dynamic content with edge computing

Benefits:

  • Global low latency (served from nearest edge)
  • Offloads origin servers
  • DDoS protection


D) Application Server Cache (Local)

Location: Inside each application server’s memory

Application-level caching exists in two distinct forms:

1) In-Memory Application Cache

Location: RAM inside the application server

Examples:

  • HashMap / Dictionary
  • Caffeine
  • Guava
  • lru-cache (Node.js)
  • sync.Map (Go)

Properties:

  • Ultra-fast (nanoseconds)
  • Volatile (lost on restart)
  • Process-local

This is what most people usually mean by “application cache”.
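
For example, in Python the simplest version of this is a plain dict or functools.lru_cache bound to the process (load_config_from_disk below is a hypothetical loader):

from functools import lru_cache

def load_config_from_disk(name):
    # Stand-in for reading and parsing a config file (hypothetical).
    return {"name": name, "feature_flags": []}

@lru_cache(maxsize=1024)
def get_config(name):
    # Result lives in this process's RAM: lost on restart, not shared across servers.
    return load_config_from_disk(name)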

2) Local Persistent Cache (Disk / SSD based)

Location: SSD or hard disk on the application server

Examples:

  • RocksDB
  • LevelDB
  • SQLite
  • DiskLruCache (Android)
  • Browser disk cache

Properties:

  • Slower than RAM (microseconds to milliseconds)
  • Survives restarts
  • Larger capacity
  • Still local to one server


E) Global Cache (Distributed Cache)

Distributed caches provide shared storage accessible by all application servers. The most popular implementations are Redis, Memcached, and Aerospike.

Examples:

  • Redis
  • Memcached
  • Aerospike

Used for:

  • Sessions
  • User profiles
  • Product catalog
  • API responses

Benefits:

  • Shared across services
  • Horizontally scalable
  • High availability


F) Database Cache

Modern database engines implement internal caching mechanisms that operate transparently to applications.

Examples:

  • MySQL buffer pool
  • PostgreSQL shared buffers
  • MongoDB WiredTiger cache

Used for:

  • Index pages
  • Frequently accessed rows
  • Query execution plans

Benefits:

  • Automatic
  • Transparent to application


Typical Real-World Cache Stack

Users → Browser (Browser Cache) → CDN Cache → Application Server (Application Cache → Global Cache) → Database Server (Database Cache)


Cache Write Strategies

Write strategies define how cache and database stay in sync.


1) Write-Through Cache

What it is

In a write-through cache, data is written to both the cache and the primary database simultaneously. A write is considered successful only after both are updated.

App → Cache → Database

This ensures the cache is always consistent with the database.
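
A minimal sketch of the idea, using in-memory dicts as stand-ins for the cache (e.g. Redis) and the database:

cache = {}
database = {}

def write_through(key, value):
    # Update the source of truth and the cache together;
    # the write counts as successful only after both succeed.
    database[key] = value
    cache[key] = value

def read(key):
    # Reads can trust the cache: it is never stale.
    if key in cache:
        return cache[key]
    return database.get(key)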

Key Characteristics

  • Strong consistency
  • Cache is never stale
  • Cache and database updated together

Pros

  • Highest data integrity
  • Cache always reflects the latest data
  • Simple consistency model

Cons (Trade-offs)

  • Higher write latency
  • Increased load on the database
  • Not suitable for write-heavy workloads

When to Use Write-Through

Use write-through when correctness is more important than write performance, and stale data is unacceptable.

Typical use cases:

  • Financial transactions (account balances, payments)
  • Inventory systems (prevent overselling)
  • Authentication & authorization (permissions, tokens)
  • Critical configuration data

Real-World Example

An e-commerce checkout system where inventory count must be accurate at all times. A failed or delayed update could result in overselling.


2) Write-Back (Write-Behind) Cache

What it is

In a write-back cache, data is written only to the cache initially.

The database update happens asynchronously, either later or in batches.

App → Cache (dirty)
Cache → Database (later)

During this period, the cache acts as a temporary source of truth.
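
A minimal sketch, again with dicts as stand-ins; dirty keys are tracked in the cache and persisted later by a background flush:

cache = {}
database = {}
dirty_keys = set()

def write_back(key, value):
    # Only the cache is updated on the write path.
    cache[key] = value
    dirty_keys.add(key)

def flush():
    # Runs later (on a timer or in batches) to persist dirty entries.
    for key in list(dirty_keys):
        database[key] = cache[key]
        dirty_keys.discard(key)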

Key Characteristics

  • Cache holds the latest data
  • Database may be temporarily stale
  • Eventual consistency model

Pros

  • Fastest write performance
  • High throughput
  • Reduced database write load

Cons (Trade-offs)

  • Risk of data loss if cache fails before persistence
  • More complex failure handling
  • Requires background flushing and monitoring

When to Use Write-Back

Use write-back when write performance is critical and some data loss or delay is acceptable.

Typical use cases:

  • Real-time gaming leaderboards
  • IoT sensor data ingestion
  • Social media like/view counters
  • Analytics and metrics systems

Real-World Example

A social media like counter where losing a small number of likes is acceptable in exchange for handling millions of writes per second.


3) Write-Around Cache

What it is

In write-around caching, data is written directly to the database, bypassing the cache entirely.

The cache is populated only on subsequent reads.

App → Database (Cache bypassed)
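
A minimal sketch: writes go straight to the database, and the cache is filled only when a read misses:

cache = {}
database = {}

def write_around(key, value):
    # The cache is bypassed entirely on writes.
    database[key] = value

def read(key):
    if key in cache:
        return cache[key]
    value = database.get(key)   # first read after a write is always a miss
    cache[key] = value
    return value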

Key Characteristics

  • Cache contains only read data
  • Writes do not pollute cache
  • First read after write is always a cache miss

Pros

  • Prevents cache pollution
  • Efficient use of limited cache memory
  • Simple write path

Cons (Trade-offs)

  • Slower first read after write
  • Cache does not benefit write-heavy workloads

When to Use Write-Around

Use write-around when written data is unlikely to be read soon.

Typical use cases:

  • Large file uploads
  • Logging systems
  • Streaming ingestion pipelines
  • Batch processing outputs
  • Archival storage

Real-World Example

Log ingestion systems where data is written continuously but rarely queried in real time.

Summary Comparison

Strategy | Write Speed | Read Speed | Consistency | Risk
Write-Through | Slow | Fast | Strong | Low
Write-Back | Fastest | Fast | Eventual | Medium–High
Write-Around | Fast | Slow (first read) | DB-consistent | Low


Cache Eviction Strategies

Cache eviction determines which data is removed from cache when memory is full.

Since cache capacity is limited, eviction policies play a critical role in maintaining high cache hit rates, predictable latency, and system stability.

1) LRU (Least Recently Used)

What it is

LRU evicts the data that has not been accessed for the longest time.

It assumes temporal locality — if data was used recently, it is likely to be used again soon.
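
A minimal LRU sketch using Python's OrderedDict: every access moves the key to the "recent" end, and the key at the other end is evicted once capacity is exceeded:

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None                     # cache miss
        self.items.move_to_end(key)         # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used key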

Key Characteristics

  • Tracks recent access order
  • Widely supported and easy to implement
  • Default eviction policy in many systems

When to Use LRU

LRU is ideal when workloads exhibit temporal locality.

Best for:

  • Web APIs
  • User sessions
  • Content management systems
  • General-purpose application caching

Real-World Example

API gateways caching recently accessed endpoints.


2) LFU (Least Frequently Used)

What it is

LFU evicts the data that has been accessed the fewest times over a given period.

It prioritises frequency over recency.
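
A minimal LFU sketch: keep an access count per key and evict the key with the smallest count (production systems, such as Redis with the allkeys-lfu policy, use approximate counters instead):

class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = {}
        self.counts = {}

    def get(self, key):
        if key not in self.items:
            return None                    # cache miss
        self.counts[key] += 1
        return self.items[key]

    def put(self, key, value):
        if key not in self.items and len(self.items) >= self.capacity:
            victim = min(self.counts, key=self.counts.get)   # least frequently used
            del self.items[victim]
            del self.counts[victim]
        self.items[key] = value
        self.counts[key] = self.counts.get(key, 0) + 1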

Key Characteristics

  • Tracks access counts
  • Protects long-term hot keys
  • Handles skewed traffic well

When to Use LFU

LFU is best when traffic follows a power-law distribution (a few items receive most of the traffic).

Best for:

  • Trending products
  • Popular videos or posts
  • Recommendation systems
  • API endpoints with uneven traffic

Real-World Example

Video streaming platforms where a small percentage of content accounts for most views.


3) MRU (Most Recently Used)

What it is

MRU evicts the most recently accessed item.

This is the opposite of LRU.

Key Characteristics

  • Assumes recently accessed data will not be reused soon
  • Optimised for sequential access

When to Use MRU

MRU is ideal for one-time or sequential access patterns.

Best for:

  • Streaming workloads
  • Large file scans
  • Analytics and ETL jobs

Real-World Example

Batch analytics scanning large datasets once.


4) FIFO (First In First Out)

What it is

FIFO evicts the oldest inserted item, regardless of how often or recently it was accessed.

Key Characteristics

  • No access tracking
  • Very simple implementation

When to Use FIFO

FIFO is suitable only when:

  • Simplicity matters more than performance
  • Workloads resemble queues
  • Cache is not performance-critical

Real-World Example

Simple buffering systems or queues.


Eviction Strategy Comparison

Strategy | Evicts | Best For | Risk
LRU | Least recently used | Most applications | Sequential pollution
LFU | Least frequently used | Hot-key workloads | New-item starvation
MRU | Most recently used | Sequential scans | Poor general use
FIFO | Oldest entry | Simple queues | Evicts hot data


Cache Invalidation Strategy

Stale data represents one of the most challenging aspects of caching. When the underlying data source changes but the cache still contains old values, systems must employ various strategies to maintain data freshness.

1: TTL (Time To Live)

TTL defines how long data remains in the cache before it expires automatically.

user:123 → TTL = 300 seconds

After 300 seconds → cache entry expires automatically.
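
With redis-py, for example, the TTL is attached when the key is written (this assumes a Redis server is reachable locally; the value is a hypothetical payload):

import redis

r = redis.Redis()

# Cache the value for 300 seconds; Redis removes the key automatically afterwards.
r.set("user:123", '{"name": "Alice"}', ex=300)

print(r.ttl("user:123"))   # remaining lifetime in seconds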

Purpose of TTL

  • Prevent stale data
  • Auto cleanup
  • Memory management
  • Eventual consistency

TTL is an expiration policy, not a write strategy.

2: Active Invalidation

Explicitly deletes cache keys when underlying data changes. When a database update occurs, the corresponding cache key is immediately removed.

Update DB → Delete cache key
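
A minimal sketch with redis-py: once the database update succeeds, the key is deleted so the next read repopulates the cache (update_user_in_db is a hypothetical helper):

import redis

r = redis.Redis()

def update_user_in_db(user_id, data):
    pass   # write to the primary database (omitted)

def update_user(user_id, data):
    update_user_in_db(user_id, data)
    r.delete(f"user:{user_id}")   # next read misses and refills the cache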

3: Write-Through Update

Updates both cache and database simultaneously, ensuring cache remains current.

4: Event-Driven Sync

Uses message queues like Kafka to propagate database changes to cache systems asynchronously.

DB update → Kafka → Cache update
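
A sketch of the consumer side, assuming the kafka-python client, a Redis cache, and a hypothetical "db-changes" topic that carries change events as JSON:

import json
import redis
from kafka import KafkaConsumer

r = redis.Redis()
consumer = KafkaConsumer("db-changes", bootstrap_servers="localhost:9092")

for message in consumer:
    event = json.loads(message.value)
    # Each event carries the changed key and its new value; refresh the cache entry.
    r.set(event["key"], json.dumps(event["value"]))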


Caching Fundamentals and Common Pitfalls

1) Cache Warming

Cache warming involves pre-loading data into cache before real users access the system. Instead of waiting for initial user requests to trigger cache misses, you proactively populate the cache with anticipated data.

Example

After deploying new services, warming might involve loading top products, trending posts, homepage data, and configuration values into Redis. This ensures the cache is already “hot” when users arrive.
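
A minimal warming sketch run at deploy time, assuming redis-py and a hypothetical fetch_top_products helper that queries the database:

import json
import redis

r = redis.Redis()

def fetch_top_products(limit):
    # Stand-in for a database query returning the most popular products (hypothetical).
    return [{"id": 1, "name": "Example product"}][:limit]

def warm_cache():
    # Pre-populate hot keys before real traffic arrives.
    for product in fetch_top_products(limit=100):
        r.set(f"product:{product['id']}", json.dumps(product), ex=3600)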

Why it matters

Cache warming prevents:

  • Cold start latency
  • Database traffic spikes
  • Slow first requests

Real-world applications include warming homepage feeds, recommendation models, and search result sets.

2) Cache Miss

A cache miss occurs when requested data doesn’t exist in cache, forcing the system to fetch from the database, external API, or disk, then store the result in cache for future requests.

Example

User requests:

GET /user/123

Cache lookup fails → DB query → store in cache → return response.
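
This read path is commonly called cache-aside. A minimal sketch with redis-py and a hypothetical fetch_user_from_db helper:

import json
import redis

r = redis.Redis()

def fetch_user_from_db(user_id):
    # Stand-in for a database query (hypothetical).
    return {"id": user_id, "name": "Example user"}

def get_user(user_id):
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:                 # cache hit
        return json.loads(cached)
    user = fetch_user_from_db(user_id)     # cache miss: fall back to the database
    r.set(key, json.dumps(user), ex=300)   # store for future requests
    return user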

Why it matters

Cache misses directly impact system performance by:

  • Increasing latency
  • Increasing database load
  • Reducing throughput

3) Cache Stampede (Thundering Herd)

Cache stampede happens when many requests simultaneously miss the cache and overwhelm the database. This typically occurs when popular cache keys expire, cache is flushed, or servers restart.

Example

Consider a scenario where a popular feed cache expires at noon. If 100,000 users request that feed simultaneously, all experience cache misses and hit the database concurrently, potentially causing database overload and cascading failures.

How to prevent

  • Request coalescing (ensuring only one request fetches the data while others wait; see the sketch after this list)
  • Stale-while-revalidate
  • Lock per key
  • Randomized TTL
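
A minimal in-process sketch of request coalescing with a lock per key: only one caller reloads an expired entry while concurrent callers wait for the result (single-process only; a distributed lock would be needed across servers):

import threading

locks = {}
locks_guard = threading.Lock()

def lock_for(key):
    with locks_guard:
        return locks.setdefault(key, threading.Lock())

def get_with_coalescing(cache, key, load_from_db):
    value = cache.get(key)
    if value is not None:
        return value
    with lock_for(key):            # only one thread loads this key at a time
        value = cache.get(key)     # re-check: another thread may have filled it already
        if value is None:
            value = load_from_db(key)
            cache[key] = value
        return value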

4) Cache Pollution

Cache pollution happens when the cache is filled with data that is rarely or never reused, causing useful (hot) data to be evicted.

Why it matters

Cache pollution directly impacts system performance by:

  • Increasing the cache miss rate
  • Increasing latency
  • Causing database load spikes
  • Making the cache ineffective

Summary Table

Term | Meaning | Impact
Cache warming | Preloading the cache | Avoids cold starts
Cache miss | Data not in cache | Higher latency
Cache stampede | Many misses at once | Database overload
Cache pollution | Cache filled with rarely reused data | Hot data evicted
Cache invalidation | Removing stale entries | Stale-data bugs if missed
