Caching is one of the most important performance-optimization techniques in modern systems, from CPU caches all the way up to browser caches, CDN edge servers, and distributed backends such as Redis.
Modern applications rely heavily on caching to achieve:
- Low latency
- High throughput
- Horizontal scalability
- Reduced backend load
This article explains caching in a simple, structured way:
- What is a cache
- Types of cache
- Typical real-world cache stack
- Cache write strategies
- Cache eviction strategies (LRU, LFU, MRU, etc.)
- Cache invalidation strategies
- Caching fundamentals and common pitfalls
- Summary
What is Cache?
A cache is a fast temporary storage layer that stores frequently accessed data so future requests can be served much faster.
It is placed between a client (or application) and a slower data source such as a database, disk, or remote service.
Client → Cache → Database
Core Purposes of Caching:
- Reduce latency: By storing data closer to where it’s needed and on faster hardware, caches dramatically reduce the time required to retrieve information.
- Reduce database load: Offloading frequent queries to cache layers prevents database systems from becoming bottlenecks under heavy traffic.
- Improve throughput: Systems can handle more concurrent requests when data is served from fast cache layers rather than slower persistent storage.
- Improve scalability: Caches enable horizontal scaling by distributing the read load across multiple cache nodes.
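To make the Client → Cache → Database flow concrete, here is a minimal cache-aside read sketch in Python. It assumes a local Redis instance accessed through the redis-py client; the key format and the `fetch_user_from_db` helper are illustrative placeholders, not part of any specific system.

```python
import json
import redis  # assumes the redis-py client is installed

cache = redis.Redis(host="localhost", port=6379)

def fetch_user_from_db(user_id):
    # Placeholder for a real database query (illustrative only).
    return {"id": user_id, "name": "example"}

def get_user(user_id):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:              # cache hit: skip the database entirely
        return json.loads(cached)
    user = fetch_user_from_db(user_id)  # cache miss: go to the slower data source
    cache.set(key, json.dumps(user), ex=300)  # store for future requests (5-minute TTL)
    return user
```

Every request after the first is served from memory until the entry expires, which is where the latency and database-load benefits above come from.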
What Does Caching Really Mean?
A common misconception is that caching exclusively means storing data in RAM. In reality, caching encompasses broader principles:
Caching means:
- Storing data on faster hardware
- Storing data closer to where it is needed
- Storing data in a format optimized for fast access
So caching can involve:
- In-memory storage (Redis, Memcached)
- CPU registers and L1/L2/L3 cache
- SSD-based cache layers
- CDN edge servers
- Browser cache
- OS page cache
Key Idea
Caching is less about where the data lives and more about how fast it can be accessed.
Types of Cache
A) Hardware Cache (CPU-Level)
Location: Inside the processor
Modern CPUs implement multiple cache levels:
| Level | Location | Speed | Latency | Size |
|---|---|---|---|---|
| L1 | Inside CPU core | Fastest | ~1 ns | 32-64 KB |
| L2 | Near CPU core | Very fast | ~3 ns | 256-512 KB |
| L3 | Shared across cores | Fast | ~12 ns | 2-32 MB |
Managed by: CPU hardware
B) Client-Side Cache
Location: User’s device (browser, mobile app)
Examples:
- Browser HTTP cache
- Service Worker cache
- Mobile app cache (NSCache, DiskLruCache)
- LocalStorage / IndexedDB
Used for:
- Static assets (JS, CSS, images)
- API responses
- Offline-first functionality
Benefits:
- Zero network latency
- Reduces server load
- Enables offline access
C) Edge Cache (CDN)
Location: Globally distributed edge locations near users
Examples:
- Cloudflare
- Akamai
- AWS CloudFront
- Fastly
Used for:
- Static assets (images, videos, CSS, JS)
- Public API responses
- Dynamic content with edge computing
Benefits:
- Global low latency (served from nearest edge)
- Offloads origin servers
- DDoS protection
D) Application Server Cache (Local)
Location: Inside each application server’s memory
Application-level caching exists in two distinct forms:
1) In-Memory Application Cache
Location: RAM inside the application server
Examples:
- HashMap / Dictionary
- Caffeine
- Guava
- lru-cache (Node.js)
- sync.Map (Go)
Properties:
- Ultra-fast (nanoseconds)
- Volatile (lost on restart)
- Process-local
This is what most people usually mean when they talk about an application's "in-memory cache".
2) Local Persistent Cache (Disk / SSD based)
Location: SSD or hard disk on the application server
Examples:
- RocksDB
- LevelDB
- SQLite
- DiskLruCache (Android)
- Browser disk cache
Properties:
- Slower than RAM (microseconds to milliseconds)
- Survives restarts
- Larger capacity
- Still local to one server
E) Global Cache (Distributed Cache)
Distributed caches provide a shared storage layer that all application servers can access, so every instance sees the same cached data.
Examples:
- Redis
- Memcached
- Aerospike
Used for:
- Sessions
- User profiles
- Product catalog
- API responses
Benefits:
- Shared across services
- Horizontally scalable
- High availability
F) Database Cache
Modern database engines implement internal caching mechanisms that operate transparently to applications.
Examples:
- MySQL buffer pool
- PostgreSQL shared buffers
- MongoDB WiredTiger cache
Used for:
- Index pages
- Frequently accessed rows
- Query execution plans
Benefits:
- Automatic
- Transparent to application
Typical Real-World Cache Stack
In practice, a single request can pass through several of these layers before it ever reaches the database:
Browser cache → CDN edge cache → Application server cache (local) → Distributed cache (Redis/Memcached) → Database cache → Database
Each layer absorbs a share of the traffic, so only a small fraction of requests reach the slowest layer.
Cache Write Strategies
Write strategies define how cache and database stay in sync.
1) Write-Through Cache
What it is
In a write-through cache, data is written to both the cache and the primary database simultaneously. A write is considered successful only after both are updated.
App → Cache → Database
This ensures the cache is always consistent with the database.
Key Characteristics
- Strong consistency
- Cache is never stale
- Cache and database updated together
Pros
- Highest data integrity
- Cache always reflects the latest data
- Simple consistency model
Cons (Trade-offs)
- Higher write latency
- Increased load on the database
- Not suitable for write-heavy workloads
When to Use Write-Through
Use write-through when correctness is more important than write performance, and stale data is unacceptable.
Typical use cases:
- Financial transactions (account balances, payments)
- Inventory systems (prevent overselling)
- Authentication & authorization (permissions, tokens)
- Critical configuration data
Real-World Example
An e-commerce checkout system where inventory count must be accurate at all times. A failed or delayed update could result in overselling.
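A minimal write-through sketch, using plain Python dictionaries as stand-ins for the cache and the database (the names are illustrative):

```python
# Stand-ins for a real cache (e.g. Redis) and a real database.
cache_store = {}
database = {}

def write_through(key, value):
    # Both stores are updated in the same request path.
    # A failure in either step should fail the whole write to keep them in sync.
    database[key] = value        # durable store
    cache_store[key] = value     # cache updated immediately, so it is never stale

def read(key):
    # Reads can always trust the cache after a write-through write.
    return cache_store.get(key, database.get(key))

write_through("inventory:42", 10)
assert read("inventory:42") == 10
```

The cost is visible in the write path: every update pays for two writes before it can return.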
2) Write-Back (Write-Behind) Cache
What it is
In a write-back cache, data is written only to the cache initially.
The database update happens asynchronously, either later or in batches.
App → Cache (dirty)
Cache → Database (later)
During this period, the cache acts as a temporary source of truth.
Key Characteristics
- Cache holds the latest data
- Database may be temporarily stale
- Eventual consistency model
Pros
- Fastest write performance
- High throughput
- Reduced database write load
Cons (Trade-offs)
- Risk of data loss if cache fails before persistence
- More complex failure handling
- Requires background flushing and monitoring
When to Use Write-Back
Use write-back when write performance is critical and some data loss or delay is acceptable.
Typical use cases:
- Real-time gaming leaderboards
- IoT sensor data ingestion
- Social media like/view counters
- Analytics and metrics systems
Real-World Example
A social media like counter where losing a small number of likes is acceptable in exchange for handling millions of writes per second.
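A simplified write-back sketch, again using dictionaries as stand-ins; a background thread plays the role of the asynchronous flusher that a real system would run:

```python
import threading
import time

cache_store = {}     # fast layer; always holds the latest values
database = {}        # slow durable layer; updated asynchronously
dirty_keys = set()   # keys written to cache but not yet persisted

def write_back(key, value):
    # The write returns as soon as the cache is updated.
    cache_store[key] = value
    dirty_keys.add(key)

def flush_loop(interval=1.0):
    # Background flusher: persists dirty entries in batches.
    while True:
        time.sleep(interval)
        for key in list(dirty_keys):
            database[key] = cache_store[key]
            dirty_keys.discard(key)

threading.Thread(target=flush_loop, daemon=True).start()

write_back("likes:post:1", 1_000_001)   # fast: only touches memory
```

Anything still in `dirty_keys` when the process dies is lost, which is exactly the data-loss trade-off described above.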
3) Write-Around Cache
What it is
In write-around caching, data is written directly to the database, bypassing the cache entirely.
The cache is populated only on subsequent reads.
App → Database (Cache bypassed)
Key Characteristics
- Cache contains only read data
- Writes do not pollute cache
- First read after write is always a cache miss
Pros
- Prevents cache pollution
- Efficient use of limited cache memory
- Simple write path
Cons (Trade-offs)
- Slower first read after write
- Cache does not benefit write-heavy workloads
When to Use Write-Around
Use write-around when written data is unlikely to be read soon.
Typical use cases:
- Large file uploads
- Logging systems
- Streaming ingestion pipelines
- Batch processing outputs
- Archival storage
Real-World Example
Log ingestion systems where data is written continuously but rarely queried in real time.
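A minimal write-around sketch under the same dictionary stand-ins; writes skip the cache, and only reads populate it:

```python
cache_store = {}
database = {}

def write_around(key, value):
    # Writes bypass the cache entirely, so rarely-read data never pollutes it.
    database[key] = value
    cache_store.pop(key, None)   # drop any stale cached copy

def read(key):
    if key in cache_store:       # later reads are fast
        return cache_store[key]
    value = database[key]        # first read after a write is always a miss
    cache_store[key] = value
    return value
```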
Summary Comparison
| Strategy | Write Speed | Read Speed | Consistency | Risk |
|---|---|---|---|---|
| Write-Through | Slow | Fast | Strong | Low |
| Write-Back | Fastest | Fast | Eventual | Medium–High |
| Write-Around | Fast | Slow (first read) | DB-consistent | Low |
Cache Eviction Strategies
Cache eviction determines which data is removed from cache when memory is full.
Since cache capacity is limited, eviction policies play a critical role in maintaining high cache hit rates, predictable latency, and system stability.
1) LRU (Least Recently Used)
What it is
LRU evicts the data that has not been accessed for the longest time.
It assumes temporal locality — if data was used recently, it is likely to be used again soon.
Key Characteristics
- Tracks recent access order
- Widely supported and easy to implement
- Default eviction policy in many systems
When to Use LRU
LRU is ideal when workloads exhibit temporal locality.
Best for:
- Web APIs
- User sessions
- Content management systems
- General-purpose application caching
Real-World Example
API gateways caching recently accessed endpoints.
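For illustration, here is a compact LRU cache built on Python's `OrderedDict`; production caches (Redis, Caffeine) implement the same idea with more efficient internal structures:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the entry that was accessed least recently."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # keeps keys in access order, oldest first

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used entry

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")          # "a" becomes most recently used
cache.put("c", 3)       # evicts "b", the least recently used key
assert cache.get("b") is None
```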
2) LFU (Least Frequently Used)
What it is
LFU evicts the data that has been accessed the fewest number of times over a period of time.
It prioritises frequency over recency.
Key Characteristics
- Tracks access counts
- Protects long-term hot keys
- Handles skewed traffic well
When to Use LFU
LFU is best when traffic follows a power-law distribution (a small number of items receive most of the traffic).
Best for:
- Trending products
- Popular videos or posts
- Recommendation systems
- API endpoints with uneven traffic
Real-World Example
Video streaming platforms where a small percentage of content accounts for most views.
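A simplified LFU sketch that keeps a per-key access counter; real implementations avoid the linear scan on eviction by grouping keys into frequency buckets:

```python
class LFUCache:
    """Minimal LFU cache: evicts the entry with the fewest accesses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.values = {}
        self.counts = {}   # access frequency per key

    def get(self, key):
        if key not in self.values:
            return None
        self.counts[key] += 1
        return self.values[key]

    def put(self, key, value):
        if key not in self.values and len(self.values) >= self.capacity:
            # Evict the least frequently used key (ties broken arbitrarily).
            coldest = min(self.counts, key=self.counts.get)
            del self.values[coldest]
            del self.counts[coldest]
        self.values[key] = value
        self.counts[key] = self.counts.get(key, 0) + 1
```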
3) MRU (Most Recently Used)
What it is
MRU evicts the most recently accessed item.
This is the opposite of LRU.
Key Characteristics
- Assumes recently accessed data will not be reused soon
- Optimised for sequential access
When to Use MRU
MRU is ideal for one-time or sequential access patterns.
Best for:
- Streaming workloads
- Large file scans
- Analytics and ETL jobs
Real-World Example
Batch analytics scanning large datasets once.
4) FIFO (First In First Out)
What it is
FIFO evicts the oldest inserted item, regardless of how often or recently it was accessed.
Key Characteristics
- No access tracking
- Very simple implementation
When to Use FIFO
FIFO is suitable only when:
- Simplicity matters more than performance
- Workloads resemble queues
- Cache is not performance-critical
Real-World Example
Simple buffering systems or queues.
Eviction Strategy Comparison
| Strategy | Evicts | Best For | Risk |
|---|---|---|---|
| LRU | Least recently used | Most applications | Sequential pollution |
| LFU | Least frequently used | Hot-key workloads | New item starvation |
| MRU | Most recently used | Sequential scans | Poor general use |
| FIFO | Oldest entry | Simple queues | Evicts hot data |
Cache Invalidation Strategies
Stale data represents one of the most challenging aspects of caching. When the underlying data source changes but the cache still contains old values, systems must employ various strategies to maintain data freshness.
1) TTL (Time To Live)
Defines how long data remains in cache before automatic expiration.
user:123 → TTL = 300 seconds
After 300 seconds → cache entry expires automatically.
Purpose of TTL
- Prevent stale data
- Auto cleanup
- Memory management
- Eventual consistency
TTL is an expiration policy, not a write strategy.
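With Redis and the redis-py client, setting a TTL is a single argument on the write; the key and value below are illustrative:

```python
import redis  # assumes the redis-py client

cache = redis.Redis()

# Store the entry with a 300-second TTL, matching the example above.
cache.set("user:123", '{"name": "Alice"}', ex=300)

print(cache.ttl("user:123"))   # remaining seconds before automatic expiry
# After 300 seconds, cache.get("user:123") returns None: the entry expired on its own.
```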
2) Active Invalidation
Explicitly deletes cache keys when underlying data changes. When a database update occurs, the corresponding cache key is immediately removed.
Update DB → Delete cache key
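A minimal sketch of active invalidation, assuming redis-py for the cache and a placeholder for the real database write (all names are illustrative):

```python
import redis  # assumes the redis-py client

cache = redis.Redis()

def update_db(user_id, new_email):
    # Placeholder for the real database update.
    pass

def update_user_email(user_id, new_email):
    update_db(user_id, new_email)      # 1) update the source of truth first
    cache.delete(f"user:{user_id}")    # 2) drop the now-stale cache key
    # The next read misses, reloads from the database, and repopulates the cache.
```

Deleting rather than updating the key keeps the invalidation logic simple: the read path is the only place that writes fresh values into the cache.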
3) Write-Through Update
Updates both cache and database simultaneously, ensuring cache remains current.
4) Event-Driven Sync
Uses message queues like Kafka to propagate database changes to cache systems asynchronously.
DB update → Kafka → Cache update
Caching Fundamentals and Common Pitfalls
1) Cache Warming
Cache warming involves pre-loading data into cache before real users access the system. Instead of waiting for initial user requests to trigger cache misses, you proactively populate the cache with anticipated data.
Example
After deploying new services, warming might involve loading top products, trending posts, homepage data, and configuration values into Redis. This ensures the cache is already “hot” when users arrive.
Why it matters
Cache warming prevents:
- Cold start latency
- Database traffic spikes
- Slow first requests
Real-world applications include warming homepage feeds, recommendation models, and search result sets.
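A warming script can be as simple as looping over the expected hot keys at startup; the sketch below assumes redis-py and an illustrative `load_top_products_from_db` helper:

```python
import json
import redis  # assumes the redis-py client

cache = redis.Redis()

def load_top_products_from_db(limit=100):
    # Placeholder for the real query behind the top-products list.
    return [{"id": i, "name": f"product-{i}"} for i in range(limit)]

def warm_cache():
    # Run at deploy time or on startup, before real traffic arrives.
    for product in load_top_products_from_db():
        cache.set(f"product:{product['id']}", json.dumps(product), ex=3600)

warm_cache()   # the first real user request now hits a warm cache
```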
2) Cache Miss
A cache miss occurs when requested data doesn’t exist in cache, forcing the system to fetch from the database, external API, or disk, then store the result in cache for future requests.
Example
User requests:
GET /user/123
Cache lookup fails → DB query → store in cache → return response.
Why it matters
Cache misses directly impact system performance:
- Higher latency
- Increased database load
- Reduced throughput
3) Cache Stampede (Thundering Herd)
Cache stampede happens when many requests simultaneously miss the cache and overwhelm the database. This typically occurs when popular cache keys expire, cache is flushed, or servers restart.
Example
Consider a scenario where a popular feed cache expires at noon. If 100,000 users request that feed simultaneously, all experience cache misses and hit the database concurrently, potentially causing database overload and cascading failures.
How to prevent
- Request coalescing (ensuring only one request fetches the data while others wait)
- Stale-while-revalidate
- Lock per key
- Randomized TTL
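The sketch below shows one of these options, request coalescing with a per-key lock, inside a single process; it is a simplified illustration, and across multiple servers the same idea is usually built with a distributed lock or stale-while-revalidate.

```python
import threading

cache_store = {}
locks = {}                        # one lock per cache key
locks_guard = threading.Lock()    # protects the locks dictionary itself

def get_with_coalescing(key, load_from_db):
    value = cache_store.get(key)
    if value is not None:
        return value
    # Only one thread per key is allowed to rebuild the entry;
    # the others block briefly and then read the freshly cached value.
    with locks_guard:
        lock = locks.setdefault(key, threading.Lock())
    with lock:
        value = cache_store.get(key)   # re-check: another thread may have filled it
        if value is None:
            value = load_from_db()     # exactly one database hit per expired key
            cache_store[key] = value
        return value
```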
4) Cache Pollution
Cache pollution happens when the cache is filled with data that is rarely or never reused, causing useful (hot) data to be evicted.
Why it matters
Cache pollution directly impacts system performance:
- Higher cache miss rate
- Increased latency
- Database load spikes
- An overall ineffective cache
Summary Table
| Term | Meaning | Impact |
|---|---|---|
| Cache warming | Preloading the cache before traffic arrives | Avoids cold starts |
| Cache miss | Requested data not in cache | Higher latency |
| Cache stampede | Many simultaneous misses | Database overload |
| Cache pollution | Cache filled with rarely reused data | Hot data evicted |