Leaky abstractions are not failures. The lying is.
Every abstraction leaks.
The interesting part is how much damage we do by pretending otherwise.
An abstraction compresses detail so a human being can think at the right level most of the time. That is useful. It is also dangerous. The detail you threw away did not disappear. It is still there, waiting for the wrong load, the wrong timeout, the wrong failure mode, or the wrong person to lean on the interface a little too hard.
The leak is not the scandal. Reality is allowed to exist.
The scandal is when we sell an abstraction as if it removed reality instead of hiding it.
People talk about leaky abstractions as if a leak were a defect in an otherwise elegant design. It is not a defect. It is the design meeting reality.
An abstraction leaks because the underlying system still has shape. Time still passes. State still changes. Networks still drop packets. Another process still touches the same row. A user still clicks twice. The machine underneath your nice API remains a machine.
So when somebody says, "This API makes payments simple", the only honest question is: simple under what conditions?
Can it time out after the bank accepted the charge? Can a retry create a duplicate? Can the client observe "failed" when the side effect already happened? Can two workers race on the same payment?
If the answer is yes to any of those, the abstraction has not failed. It has revealed its boundary. The failure begins one step later, when documentation, code review, and architecture discussion keep speaking as if charge() were a neat atomic event. The system ships. The edge case arrives. The postmortem asks why nobody knew.
That is how teams get hurt. Not by complexity itself, but by false simplicity.
Good abstractions do not eliminate the hard parts. They put the hard parts in one place and mark them clearly.
Bad abstractions do the opposite. They smear reality across the system and then force every caller to rediscover the same failure modes the hard way.
You can usually tell which kind you are dealing with by asking one rude question: what does this abstraction require me to know anyway?
If the honest answer is "timeouts, retries, ordering, idempotency keys, transaction boundaries, reconciliation, and whether this event stream is at-least-once or exactly-once-ish-until-it-isn't", then the abstraction is not carrying much weight. It is decorative.
This is the core mistake: confusing interface with semantics.
A narrow interface is not the same thing as a strong guarantee.
sendEmail(user) looks simple. That tells you nothing about whether the email can be sent twice, whether the provider accepted it before your timeout, whether a retry is safe, or whether "sent" means queued, handed off, delivered, or merely written down somewhere optimistic.
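One way to see the gap: the moment you try to specify it, "sent" stops being one word and splits into the states it conflates. A minimal sketch (state names are illustrative, not from any real spec):

```tla
\* The distinct things "sent" can mean, as distinct states.
EmailState == {"queued",      \* written to our own outbox, nothing external yet
               "handed_off",  \* provider accepted the request
               "delivered",   \* provider confirms delivery
               "unknown"}     \* request issued, response timed out or lost

\* Whether a retry can duplicate the email depends on which state
\* you are actually in: from "queued" it cannot; from "unknown" it can.
```

The interface collapses these into one boolean-shaped answer; the spec refuses to.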
The smaller the interface, the easier it is to lie with it.
That is why many "clean" systems rot under pressure. The code reads like certainty while the runtime behaves like negotiation.
TLA+ makes that negotiation impossible to elide — not in the vague "formal methods are rigorous" sense people gesture at, but because it forces you to stop lying by omission.
Take the fake-simple payment call:
charge(payment_id, amount)
At the interface level, that looks like one action. At the system level, it is usually several. A worker decides to issue the charge. The request is sent. The provider may commit the charge. The response may be delayed, lost, or ambiguous. The worker may retry. A reconciliation job may later discover what actually happened.
Once you model that in TLA+, the abstraction loses the right to be fuzzy. You have to say what state exists. You have to say what actions can happen. You have to say what can happen concurrently. You have to say what must never happen, and what must eventually happen.
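A minimal sketch of that lifecycle as a TLA+ spec (module, variable, and action names are all illustrative; this is a toy, not a production model):

```tla
---- MODULE ChargeLifecycle ----
EXTENDS Naturals

VARIABLES worker,   \* what the worker believes: "idle", "sent", "done", "unknown"
          pending,  \* TRUE while a request is in flight at the provider
          charges   \* how many times the provider has actually charged

Init == worker = "idle" /\ pending = FALSE /\ charges = 0

\* The worker issues the charge request.
Send == /\ worker = "idle"
        /\ worker' = "sent" /\ pending' = TRUE
        /\ UNCHANGED charges

\* The provider commits the in-flight request.
Commit == /\ pending
          /\ pending' = FALSE /\ charges' = charges + 1
          /\ UNCHANGED worker

\* The worker sees a response and marks the payment done.
Ack == /\ worker = "sent" /\ ~pending
       /\ worker' = "done"
       /\ UNCHANGED <<pending, charges>>

\* The response is delayed or lost: the worker times out
\* whether or not the provider already committed.
Timeout == /\ worker = "sent"
           /\ worker' = "unknown"
           /\ UNCHANGED <<pending, charges>>

\* A naive retry treats "unknown" as "failed" and sends again.
Retry == /\ worker = "unknown"
         /\ worker' = "sent" /\ pending' = TRUE
         /\ UNCHANGED charges

Next == Send \/ Commit \/ Ack \/ Timeout \/ Retry

\* TLC refutes this in five steps: Send, Commit, Timeout, Retry, Commit.
NoDoubleCharge == charges <= 1
====
```

The point is not the model's sophistication. It is that every ambiguity in the prose above (did the provider commit? is a retry safe?) had to become an explicit state or action before the spec would even parse as a description of the system.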
Now you can write invariants that matter:
- a payment is never marked settled twice
- a committed charge is always associated with the same idempotency key
- reconciliation never invents a charge that no provider execution could have produced
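In a fuller spec, with per-payment bookkeeping for settlement counts, idempotency keys, and what reconciliation believes (all variable names below are hypothetical), those invariants become checkable formulas:

```tla
\* Assumed variables, per payment p \in Payments:
\*   settled[p]    : number of times p has been marked settled
\*   keysUsed[p]   : idempotency keys the provider has committed for p
\*   reconSeen     : set of (payment, charge) facts reconciliation asserts
\*   providerFacts : set of (payment, charge) facts the provider produced
\* (Cardinality requires EXTENDS FiniteSets.)

NeverSettledTwice == \A p \in Payments : settled[p] <= 1

OneKeyPerCommittedCharge == \A p \in Payments : Cardinality(keysUsed[p]) <= 1

ReconNeverInvents == reconSeen \subseteq providerFacts
```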
And you can write liveness properties that matter:
- a payment that enters an Unknown state is eventually reconciled
- a retriable request does not stay in limbo forever
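Under the same hypothetical spec (and given fairness assumptions on the retry and reconciliation actions, which TLC needs in order to check liveness at all), those read as leads-to properties:

```tla
\* state[p] is the hypothetical per-payment status variable.
UnknownGetsReconciled ==
    \A p \in Payments :
        (state[p] = "unknown") ~> (state[p] \in {"settled", "failed"})

NoLimbo ==
    \A p \in Payments :
        (state[p] = "sent") ~> (state[p] # "sent")
```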
TLA+ does not patch the leak. Better: it tells you exactly where the leak is, what damage it can cause, and which guarantees your design must still preserve when it happens.
That is why formal methods are useful precisely when reality is messy. If the system were actually a clean stack of airtight abstractions, a type signature and a pleasant README would be enough. But that is not the world we build in. We build on queues, databases, clouds, browsers, kernels, providers, and human operators. Each layer compresses the one below it. Each layer leaks under load.
The serious question is not whether the leak exists. The serious question is whether we are precise about it.
An honest abstraction says: here is what I hide, here is what I preserve, here is where reality can still break through, and here are the guarantees that remain true when it does.
That is a much stricter standard than "easy to use", and it shows more respect for the next engineer, for operations, and for the poor bastard debugging the system at 2 a.m.
Leaky abstractions are normal.
Abstractions that leak without telling the truth about the leak are malpractice.