Leaky abstractions are not failures. The lying is.
Every abstraction leaks.
The interesting part is how much damage we do by pretending otherwise.
Fresh Thoughts
For a long time, we treated software engineering as a translation exercise.
When we move from building chatbots to building agents—systems that can call tools and perform actions—we move from the domain of language modeling into the domain of stateful systems.
In a simple chat interface, a model's mistake is usually a semantic one: it gives a wrong answer or hallucinates a fact. But when an agent has the authority to issue a refund, move a file, or trigger a deployment, a mistake can become an operational failure.
Often, these failures are not caused by the model's reasoning, but by how we have designed the system around it.
When you specify a concurrent system, you aren't just describing a state machine. You are describing how that machine is allowed to evolve over time. This creates two distinct correctness categories: forbidden states and required futures.
The first category, Safety, is about ensuring a "bad thing" never happens. The second, Liveness, is about ensuring a "good thing" eventually happens. In TLA+, these aren't just philosophical labels; they change how TLC explores your state space and what its counterexamples actually tell you.
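The distinction can be made concrete with a toy model checker. The sketch below (a hypothetical Python example, not TLC, and the counter protocol is invented for illustration) explores a small state space by breadth-first search, then checks a safety invariant ("count never exceeds 3") over all reachable states and a crude finite-state stand-in for liveness ("from every reachable state, a done state is still reachable"). Real liveness checking needs fairness assumptions and lasso detection, which is exactly the extra machinery TLC brings.

```python
from collections import deque

# Toy protocol: state = (count, done). A step either increments count
# up to 3, or sets done once count reaches 3. Terminal states stutter.
def init():
    return {(0, False)}

def next_states(s):
    count, done = s
    succ = set()
    if count < 3:
        succ.add((count + 1, done))
    if count == 3 and not done:
        succ.add((count, True))
    if not succ:
        succ.add(s)  # stuttering step in terminal states
    return succ

def reachable(init_states, step):
    """Breadth-first exploration of the reachable state space."""
    seen, frontier = set(init_states), deque(init_states)
    while frontier:
        s = frontier.popleft()
        for t in step(s):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen

states = reachable(init(), next_states)

# Safety: a "bad thing" never happens -- count never exceeds 3.
assert all(count <= 3 for count, _ in states)

# Liveness (crude finite-state approximation): from every reachable
# state, a done state remains reachable, so "eventually done" is not
# permanently blocked. TLC checks this properly via fairness + lassos.
def can_reach_done(s):
    return any(done for _, done in reachable({s}, next_states))

assert all(can_reach_done(s) for s in states)
print("safety and liveness checks passed")
```

Note how the two checks differ in shape: the safety check inspects individual states, while the liveness check has to reason about what is still possible from each state, which is why liveness counterexamples come back as infinite (lasso-shaped) behaviors rather than single bad states.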
Every few years the industry collectively rediscovers the same uncomfortable fact: you can have good engineers, solid testing, careful code reviews, and still ship a distributed system that behaves in ways nobody expected.
If you have ever chased a bug where a system froze every few hours, only to wake up the moment you attached a debugger, you already know that time is rarely as simple as a counter that increments once a millisecond. Timers fire slightly early or slightly late, interrupts sneak in between instructions, peripherals run from their own oscillators, and distributed nodes see each other through links with variable delay. From the outside everything looks nicely clocked. Inside, every part of the stack negotiates its own idea of now.
Language-model pipelines began life as collections of personal tricks. We shipped features only after someone produced the right paragraph of instructions, and that paragraph lived in a notebook or a mind. The knowledge was fragile, undocumented, and always under revision.
We learned this the hard way building a classifier that stubbornly hovered around 50% accuracy for days. After countless attempts and growing frustration, we discovered that two innocuous words in our prompt were poisoning the results. Removing them jumped accuracy to 70% overnight. The "fix" lived in one person's head, and we had no process for finding it again.