Caching is one of the most important performance-optimization techniques in modern systems, from CPU caches all the way up to browser caches, CDN edge servers, and distributed backends such as Redis.
Modern applications rely heavily on caching to achieve:
- Low latency
- High throughput
- Horizontal scalability
- Reduced backend load
This article explains caching in a simple, structured way:
- What is a cache
- Types of cache
- Typical real-world cache stack
- Cache write strategies
- Cache eviction strategies (LRU, LFU, MRU, etc.)
- Cache invalidation strategies
- Caching fundamentals and common pitfalls
- Summary
What is Cache?
A cache is a fast temporary storage layer that stores frequently accessed data so future requests can be served much faster.
It is placed between a client (or application) and a slower data source such as a database, disk, or remote service.
Client → Cache → Database
Core Purposes of Caching:
- Reduce latency: By storing data closer to where it’s needed and on faster hardware, caches dramatically reduce the time required to retrieve information.
- Reduce database load: Offloading frequent queries to cache layers prevents database systems from becoming bottlenecks under heavy traffic.
- Improve throughput: Systems can handle more concurrent requests when data is served from fast cache layers rather than slower persistent storage.
- Improve scalability: Caches enable horizontal scaling by distributing the read load across multiple cache nodes.
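To make the Client → Cache → Database flow concrete, here is a minimal cache-aside read sketch in Python. It assumes a local Redis instance accessed through the redis-py client; the key format and the `fetch_user_from_db` helper are illustrative placeholders, not part of any specific system.

```python
import json
import redis  # assumes the redis-py client is installed

cache = redis.Redis(host="localhost", port=6379)

def fetch_user_from_db(user_id):
    # Placeholder for a real database query (illustrative only).
    return {"id": user_id, "name": "example"}

def get_user(user_id):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:              # cache hit: skip the database entirely
        return json.loads(cached)
    user = fetch_user_from_db(user_id)  # cache miss: go to the slower data source
    cache.set(key, json.dumps(user), ex=300)  # store for future requests (5-minute TTL)
    return user
```

Every request after the first is served from memory until the entry expires, which is where the latency and database-load benefits above come from.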
What Does Caching Really Mean?
A common misconception is that caching exclusively means storing data in RAM. In reality, caching encompasses broader principles:
Caching means:
- Storing data on faster hardware
- Storing data closer to where it is needed
- Storing data in a format optimized for fast access
So caching can involve:
- In-memory storage (Redis, Memcached)
- CPU registers and L1/L2/L3 cache
- SSD-based cache layers
- CDN edge servers
- Browser cache
- OS page cache
Key Idea
Caching is less about where the data lives and more about how fast it can be accessed.
Types of Cache
A) Hardware Cache (CPU-Level)
Location: Inside the processor
Modern CPUs implement multiple cache levels:
| Level | Location | Speed | Latency | Size |
|---|---|---|---|---|
| L1 | Inside CPU core | Fastest | ~1 ns | 32-64 KB |
| L2 | Near CPU core | Very fast | ~3 ns | 256-512 KB |
| L3 | Shared across cores | Fast | ~12 ns | 2-32 MB |
Managed by: CPU hardware
B) Client-Side Cache
Location: User’s device (browser, mobile app)
Examples:
- Browser HTTP cache
- Service Worker cache
- Mobile app cache (NSCache, DiskLruCache)
- LocalStorage / IndexedDB
Used for:
- Static assets (JS, CSS, images)
- API responses
- Offline-first functionality
Benefits:
- Zero network latency
- Reduces server load
- Enables offline access
C) Edge Cache (CDN)
Location: Globally distributed edge locations near users
Examples:
- Cloudflare
- Akamai
- AWS CloudFront
- Fastly
Used for:
- Static assets (images, videos, CSS, JS)
- Public API responses
- Dynamic content with edge computing
Benefits:
- Global low latency (served from nearest edge)
- Offloads origin servers
- DDoS protection
D) Application Server Cache (Local)
Location: Inside each application server’s memory
Application-level caching exists in two distinct forms:
1) In-Memory Application Cache
Location: RAM inside the application server
Examples:
- HashMap / Dictionary
- Caffeine
- Guava
- lru-cache (Node.js)
- sync.Map (Go)
Properties:
- Ultra-fast (nanoseconds)
- Volatile (lost on restart)
- Process-local
This is what most people usually mean when they talk about an application's "in-memory cache".
2) Local Persistent Cache (Disk / SSD based)
Location: SSD or hard disk on the application server
Examples:
- RocksDB
- LevelDB
- SQLite
- DiskLruCache (Android)
- Browser disk cache
Properties:
- Slower than RAM (microseconds to milliseconds)
- Survives restarts
- Larger capacity
- Still local to one server
E) Global Cache (Distributed Cache)
Distributed caches provide a shared storage layer that all application servers can access, so every instance sees the same cached data.
Examples:
- Redis
- Memcached
- Aerospike
Used for:
- Sessions
- User profiles
- Product catalog
- API responses
Benefits:
- Shared across services
- Horizontally scalable
- High availability
F) Database Cache
Modern database engines implement internal caching mechanisms that operate transparently to applications.
Examples:
- MySQL buffer pool
- PostgreSQL shared buffers
- MongoDB WiredTiger cache
Used for:
- Index pages
- Frequently accessed rows
- Query execution plans
Benefits:
- Automatic
- Transparent to application
Typical Real-World Cache Stack
In practice, a single request can pass through several of these layers before it ever reaches the database:
Browser cache → CDN edge cache → Application server cache (local) → Distributed cache (Redis/Memcached) → Database cache → Database
Each layer absorbs a share of the traffic, so only a small fraction of requests reach the slowest layer.
Cache Write Strategies
Write strategies define how cache and database stay in sync.
1) Write-Through Cache
What it is
In a write-through cache, data is written to both the cache and the primary database simultaneously. A write is considered successful only after both are updated.
App → Cache → Database
This ensures the cache is always consistent with the database.
Key Characteristics
- Strong consistency
- Cache is never stale
- Cache and database updated together
Pros
- Highest data integrity
- Cache always reflects the latest data
- Simple consistency model
Cons (Trade-offs)
- Higher write latency
- Increased load on the database
- Not suitable for write-heavy workloads
When to Use Write-Through
Use write-through when correctness is more important than write performance, and stale data is unacceptable.
Typical use cases:
- Financial transactions (account balances, payments)
- Inventory systems (prevent overselling)
- Authentication & authorization (permissions, tokens)
- Critical configuration data
Real-World Example
An e-commerce checkout system where inventory count must be accurate at all times. A failed or delayed update could result in overselling.
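A minimal write-through sketch, using plain Python dictionaries as stand-ins for the cache and the database (the names are illustrative):

```python
# Stand-ins for a real cache (e.g. Redis) and a real database.
cache_store = {}
database = {}

def write_through(key, value):
    # Both stores are updated in the same request path.
    # A failure in either step should fail the whole write to keep them in sync.
    database[key] = value        # durable store
    cache_store[key] = value     # cache updated immediately, so it is never stale

def read(key):
    # Reads can always trust the cache after a write-through write.
    return cache_store.get(key, database.get(key))

write_through("inventory:42", 10)
assert read("inventory:42") == 10
```

The cost is visible in the write path: every update pays for two writes before it can return.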
2) Write-Back (Write-Behind) Cache
What it is
In a write-back cache, data is written only to the cache initially.
The database update happens asynchronously, either later or in batches.
App → Cache (dirty)
Cache → Database (later)
During this period, the cache acts as a temporary source of truth.
Key Characteristics
- Cache holds the latest data
- Database may be temporarily stale
- Eventual consistency model
Pros
- Fastest write performance
- High throughput
- Reduced database write load
Cons (Trade-offs)
- Risk of data loss if cache fails before persistence
- More complex failure handling
- Requires background flushing and monitoring
When to Use Write-Back
Use write-back when write performance is critical and some data loss or delay is acceptable.
Typical use cases:
- Real-time gaming leaderboards
- IoT sensor data ingestion
- Social media like/view counters
- Analytics and metrics systems
Real-World Example
A social media like counter where losing a small number of likes is acceptable in exchange for handling millions of writes per second.
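A simplified write-back sketch, again using dictionaries as stand-ins; a background thread plays the role of the asynchronous flusher that a real system would run:

```python
import threading
import time

cache_store = {}     # fast layer; always holds the latest values
database = {}        # slow durable layer; updated asynchronously
dirty_keys = set()   # keys written to cache but not yet persisted

def write_back(key, value):
    # The write returns as soon as the cache is updated.
    cache_store[key] = value
    dirty_keys.add(key)

def flush_loop(interval=1.0):
    # Background flusher: persists dirty entries in batches.
    while True:
        time.sleep(interval)
        for key in list(dirty_keys):
            database[key] = cache_store[key]
            dirty_keys.discard(key)

threading.Thread(target=flush_loop, daemon=True).start()

write_back("likes:post:1", 1_000_001)   # fast: only touches memory
```

Anything still in `dirty_keys` when the process dies is lost, which is exactly the data-loss trade-off described above.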
3) Write-Around Cache
What it is
In write-around caching, data is written directly to the database, bypassing the cache entirely.
The cache is populated only on subsequent reads.
App → Database (Cache bypassed)
Key Characteristics
- Cache contains only read data
- Writes do not pollute cache
- First read after write is always a cache miss
Pros
- Prevents cache pollution
- Efficient use of limited cache memory
- Simple write path
Cons (Trade-offs)
- Slower first read after write
- Cache does not benefit write-heavy workloads
When to Use Write-Around
Use write-around when written data is unlikely to be read soon.
Typical use cases:
- Large file uploads
- Logging systems
- Streaming ingestion pipelines
- Batch processing outputs
- Archival storage
Real-World Example
Log ingestion systems where data is written continuously but rarely queried in real time.
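A minimal write-around sketch under the same dictionary stand-ins; writes skip the cache, and only reads populate it:

```python
cache_store = {}
database = {}

def write_around(key, value):
    # Writes bypass the cache entirely, so rarely-read data never pollutes it.
    database[key] = value
    cache_store.pop(key, None)   # drop any stale cached copy

def read(key):
    if key in cache_store:       # later reads are fast
        return cache_store[key]
    value = database[key]        # first read after a write is always a miss
    cache_store[key] = value
    return value
```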
Summary Comparison
| Strategy | Write Speed | Read Speed | Consistency | Risk |
|---|---|---|---|---|
| Write-Through | Slow | Fast | Strong | Low |
| Write-Back | Fastest | Fast | Eventual | Medium–High |
| Write-Around | Fast | Slow (first read) | DB-consistent | Low |
Cache Eviction Strategies
Cache eviction determines which data is removed from cache when memory is full.
Since cache capacity is limited, eviction policies play a critical role in maintaining high cache hit rates, predictable latency, and system stability.
1) LRU (Least Recently Used)
What it is
LRU evicts the data that has not been accessed for the longest time.
It assumes temporal locality — if data was used recently, it is likely to be used again soon.
Key Characteristics
- Tracks recent access order
- Widely supported and easy to implement
- Default eviction policy in many systems
When to Use LRU
LRU is ideal when workloads exhibit temporal locality.
Best for:
- Web APIs
- User sessions
- Content management systems
- General-purpose application caching
Real-World Example
API gateways caching recently accessed endpoints.
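For illustration, here is a compact LRU cache built on Python's `OrderedDict`; production caches (Redis, Caffeine) implement the same idea with more efficient internal structures:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the entry that was accessed least recently."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # keeps keys in access order, oldest first

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used entry

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")          # "a" becomes most recently used
cache.put("c", 3)       # evicts "b", the least recently used key
assert cache.get("b") is None
```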
2) LFU (Least Frequently Used)
What it is
LFU evicts the data that has been accessed the fewest number of times over a period of time.
It prioritises frequency over recency.
Key Characteristics
- Tracks access counts
- Protects long-term hot keys
- Handles skewed traffic well
When to Use LFU
LFU is best when traffic follows a power-law distribution (a small number of items receive most of the traffic).
Best for:
- Trending products
- Popular videos or posts
- Recommendation systems
- API endpoints with uneven traffic
Real-World Example
Video streaming platforms where a small percentage of content accounts for most views.
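A simplified LFU sketch that keeps a per-key access counter; real implementations avoid the linear scan on eviction by grouping keys into frequency buckets:

```python
class LFUCache:
    """Minimal LFU cache: evicts the entry with the fewest accesses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.values = {}
        self.counts = {}   # access frequency per key

    def get(self, key):
        if key not in self.values:
            return None
        self.counts[key] += 1
        return self.values[key]

    def put(self, key, value):
        if key not in self.values and len(self.values) >= self.capacity:
            # Evict the least frequently used key (ties broken arbitrarily).
            coldest = min(self.counts, key=self.counts.get)
            del self.values[coldest]
            del self.counts[coldest]
        self.values[key] = value
        self.counts[key] = self.counts.get(key, 0) + 1
```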
3) MRU (Most Recently Used)
What it is
MRU evicts the most recently accessed item.
This is the opposite of LRU.
Key Characteristics
- Assumes recently accessed data will not be reused soon
- Optimised for sequential access
When to Use MRU
MRU is ideal for one-time or sequential access patterns.
Best for:
- Streaming workloads
- Large file scans
- Analytics and ETL jobs
Real-World Example
Batch analytics scanning large datasets once.
4) FIFO (First In First Out)
What it is
FIFO evicts the oldest inserted item, regardless of how often or recently it was accessed.
Key Characteristics
- No access tracking
- Very simple implementation
When to Use FIFO
FIFO is suitable only when:
- Simplicity matters more than performance
- Workloads resemble queues
- Cache is not performance-critical
Real-World Example
Simple buffering systems or queues.
Eviction Strategy Comparison
| Strategy | Evicts | Best For | Risk |
|---|---|---|---|
| LRU | Least recently used | Most applications | Sequential pollution |
| LFU | Least frequently used | Hot-key workloads | New item starvation |
| MRU | Most recently used | Sequential scans | Poor general use |
| FIFO | Oldest entry | Simple queues | Evicts hot data |
Cache Invalidation Strategies
Stale data represents one of the most challenging aspects of caching. When the underlying data source changes but the cache still contains old values, systems must employ various strategies to maintain data freshness.
1) TTL (Time To Live)
Defines how long data remains in cache before automatic expiration.
user:123 → TTL = 300 seconds
After 300 seconds → cache entry expires automatically.
Purpose of TTL
- Prevent stale data
- Auto cleanup
- Memory management
- Eventual consistency
TTL is an expiration policy, not a write strategy.
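With Redis and the redis-py client, setting a TTL is a single argument on the write; the key and value below are illustrative:

```python
import redis  # assumes the redis-py client

cache = redis.Redis()

# Store the entry with a 300-second TTL, matching the example above.
cache.set("user:123", '{"name": "Alice"}', ex=300)

print(cache.ttl("user:123"))   # remaining seconds before automatic expiry
# After 300 seconds, cache.get("user:123") returns None: the entry expired on its own.
```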
2) Active Invalidation
Explicitly deletes cache keys when underlying data changes. When a database update occurs, the corresponding cache key is immediately removed.
Update DB → Delete cache key
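A minimal sketch of active invalidation, assuming redis-py for the cache and a placeholder for the real database write (all names are illustrative):

```python
import redis  # assumes the redis-py client

cache = redis.Redis()

def update_db(user_id, new_email):
    # Placeholder for the real database update.
    pass

def update_user_email(user_id, new_email):
    update_db(user_id, new_email)      # 1) update the source of truth first
    cache.delete(f"user:{user_id}")    # 2) drop the now-stale cache key
    # The next read misses, reloads from the database, and repopulates the cache.
```

Deleting rather than updating the key keeps the invalidation logic simple: the read path is the only place that writes fresh values into the cache.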
3) Write-Through Update
Updates both cache and database simultaneously, ensuring cache remains current.
4) Event-Driven Sync
Uses message queues like Kafka to propagate database changes to cache systems asynchronously.
DB update → Kafka → Cache update
Caching Fundamentals and Common Pitfalls
1) Cache Warming
Cache warming involves pre-loading data into cache before real users access the system. Instead of waiting for initial user requests to trigger cache misses, you proactively populate the cache with anticipated data.
Example
After deploying new services, warming might involve loading top products, trending posts, homepage data, and configuration values into Redis. This ensures the cache is already “hot” when users arrive.
Why it matters
Cache warming prevents:
- Cold start latency
- Database traffic spikes
- Slow first requests
Real-world applications include warming homepage feeds, recommendation models, and search result sets.
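A warming script can be as simple as looping over the expected hot keys at startup; the sketch below assumes redis-py and an illustrative `load_top_products_from_db` helper:

```python
import json
import redis  # assumes the redis-py client

cache = redis.Redis()

def load_top_products_from_db(limit=100):
    # Placeholder for the real query behind the top-products list.
    return [{"id": i, "name": f"product-{i}"} for i in range(limit)]

def warm_cache():
    # Run at deploy time or on startup, before real traffic arrives.
    for product in load_top_products_from_db():
        cache.set(f"product:{product['id']}", json.dumps(product), ex=3600)

warm_cache()   # the first real user request now hits a warm cache
```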
2) Cache Miss
A cache miss occurs when requested data doesn’t exist in cache, forcing the system to fetch from the database, external API, or disk, then store the result in cache for future requests.
Example
User requests:
GET /user/123
Cache lookup fails → DB query → store in cache → return response.
Why it matters
Cache misses directly impact system performance:
- Higher latency
- Increased database load
- Reduced throughput
3) Cache Stampede (Thundering Herd)
Cache stampede happens when many requests simultaneously miss the cache and overwhelm the database. This typically occurs when popular cache keys expire, cache is flushed, or servers restart.
Example
Consider a scenario where a popular feed cache expires at noon. If 100,000 users request that feed simultaneously, all experience cache misses and hit the database concurrently, potentially causing database overload and cascading failures.
How to prevent
- Request coalescing (ensuring only one request fetches the data while others wait)
- Stale-while-revalidate
- Lock per key
- Randomized TTL
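The sketch below shows one of these options, request coalescing with a per-key lock, inside a single process; it is a simplified illustration, and across multiple servers the same idea is usually built with a distributed lock or stale-while-revalidate.

```python
import threading

cache_store = {}
locks = {}                        # one lock per cache key
locks_guard = threading.Lock()    # protects the locks dictionary itself

def get_with_coalescing(key, load_from_db):
    value = cache_store.get(key)
    if value is not None:
        return value
    # Only one thread per key is allowed to rebuild the entry;
    # the others block briefly and then read the freshly cached value.
    with locks_guard:
        lock = locks.setdefault(key, threading.Lock())
    with lock:
        value = cache_store.get(key)   # re-check: another thread may have filled it
        if value is None:
            value = load_from_db()     # exactly one database hit per expired key
            cache_store[key] = value
        return value
```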
4) Cache Pollution
Cache pollution happens when the cache is filled with data that is rarely or never reused, causing useful (hot) data to be evicted.
Why it matters
Cache pollution directly impacts system performance:
- Higher cache miss rate
- Increased latency
- Database load spikes
- An overall ineffective cache
Summary Table
| Term | Meaning | Impact |
|---|---|---|
| Cache warming | Preloading the cache before traffic arrives | Avoids cold starts |
| Cache miss | Requested data not in cache | Higher latency |
| Cache stampede | Many simultaneous misses | Database overload |
| Cache pollution | Cache filled with rarely reused data | Hot data evicted |