A cache is a bet against reality
Caching is a structural concession to the speed of light.
Every I/O path has a physical limit. Whether the bottleneck is a disk seek, a network hop, or a cross-region consensus round, reality is always too far away to consult in the hot path. To achieve scale, we stop querying the truth and start managing its ghost.
A cache is a snapshot of state frozen in time. The moment a value is copied from its source of truth, it enters a state of temporal decay. Even without caches, observers can disagree because systems already have delay, buffering, and concurrency. Caching makes that gap explicit by preserving an older answer on purpose. You are no longer reading the source directly; you are reading a trace of what was, betting that the entropy of the world hasn't yet invalidated your copy.
This is the central wager of systems architecture.
Every cache hit is a gamble that the world remains static long enough for the old answer to stay useful. A cache miss is the tax you pay when the system cannot reuse a local copy and must go back to the source.
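The wager and the tax can both be made visible in a few lines. This is a minimal sketch, not a production cache; names like `ReadThroughCache` and `slow_source` are illustrative inventions.

```python
import time

# Hypothetical read-through cache. A hit reuses the local snapshot;
# a miss pays the latency tax of going back to the source.
class ReadThroughCache:
    def __init__(self, fetch_from_source):
        self.fetch = fetch_from_source
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.store:      # the wager: the old answer is still useful
            self.hits += 1
            return self.store[key]
        self.misses += 1           # the tax: consult reality
        value = self.fetch(key)
        self.store[key] = value
        return value

# A deliberately slow "source of truth", standing in for disk or network.
def slow_source(key):
    time.sleep(0.01)
    return key.upper()

cache = ReadThroughCache(slow_source)
cache.get("a")   # miss: pays the I/O cost
cache.get("a")   # hit: reuses the snapshot, no I/O
```

Nothing in `get` checks whether the source has changed since the copy was taken; that omission is exactly the bet.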
We formalize this wager through policies that simulate certainty:
- TTL (Time To Live) is an explicit bound on how long you will trust a copy without re-checking.
- Invalidation is a reactive attempt to synchronize the shadow with the substance.
- Write-through/Write-back are choices about when divergence is allowed and how it is later closed.
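The three policies above can be sketched in one small structure. This is a hedged illustration under simplifying assumptions (the source of truth is a plain dict, time comes from a monotonic clock); `TTLCache`, `invalidate`, and `flush` are invented names, not any real library's API.

```python
import time

# Hypothetical cache combining the three policies: a TTL bound on trust,
# reactive invalidation, and a write-through/write-back divergence choice.
class TTLCache:
    def __init__(self, source, ttl_seconds, write_through=True):
        self.source = source          # source of truth: a plain dict here
        self.ttl = ttl_seconds
        self.write_through = write_through
        self.store = {}               # key -> (value, time_cached)
        self.dirty = set()            # keys that diverged under write-back

    def get(self, key):
        entry = self.store.get(key)
        if entry is not None:
            value, cached_at = entry
            if time.monotonic() - cached_at < self.ttl:
                return value          # TTL: trust the copy without re-checking
        value = self.source[key]      # expired or absent: re-check the truth
        self.store[key] = (value, time.monotonic())
        return value

    def put(self, key, value):
        self.store[key] = (value, time.monotonic())
        if self.write_through:
            self.source[key] = value  # write-through: divergence never opens
        else:
            self.dirty.add(key)       # write-back: divergence closed later

    def invalidate(self, key):
        self.store.pop(key, None)     # reactive sync: drop the shadow

    def flush(self):
        for key in self.dirty:        # close the gap write-back opened
            self.source[key] = self.store[key][0]
        self.dirty.clear()
```

With `write_through=False`, a `put` leaves the source stale until `flush()` runs; the divergence is not a bug but a scheduled debt.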
The complexity of cache invalidation is not just an implementation flaw; it follows from duplicating mutable state. Distribution makes it worse, but the problem begins the moment you keep a copy whose source can change independently. In any system with more than one node, there is no single shared "now". There is only a collection of observers seeing different versions of the truth at different times. A cache intentionally widens this window of disagreement to buy throughput.
When a system fails due to "stale data", the stale read is often not the whole bug. The deeper bug is that the surrounding logic assumed fresher knowledge than the caching scheme could actually provide. The performance was bought by fragmenting reality into locally-held, slightly-wrong snapshots, while the code kept reasoning as if it still had direct access to the source.
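That failure mode fits in a few lines. A hypothetical sketch, with all names invented: a quota guard reasons over a cached snapshot as though it were a live read, while the action it guards mutates the source.

```python
# The source of truth says one request remains; the snapshot agrees, for now.
quota_source = {"alice": 1}
cached_quota = dict(quota_source)    # copy taken earlier, never refreshed

def handle_request(user):
    # Bug: the guard consults the snapshot...
    if cached_quota[user] > 0:
        quota_source[user] -= 1      # ...but the action mutates the truth
        return "accepted"
    return "rejected"

handle_request("alice")   # accepted: quota drops to 0
handle_request("alice")   # accepted again: the snapshot still says 1
```

The stale read alone is harmless; the bug is the guard's assumption that its knowledge is as fresh as its action.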
We cache to optimize under the physics of our infrastructure. Sometimes that is optional; at scale it often stops being optional. The debt is not absolute correctness, but freshness complexity and coordination cost paid to buy back milliseconds of latency and reduce load on the source. Every cache carries the standing risk that the world changes before your copy does.