Specifying the Codex Agent Lifecycle in TLA+

5 minute read Published: 2026-03-25

Codex is built around persistent conversation threads. In this model, an "agent" is not a short-lived worker process. It is a live session that can receive input, run a turn, emit events, and then remain available for later input.

That distinction matters because a turn and a thread are not the same thing.

A thread is the long-lived conversation session.
A turn is one unit of work executed inside that thread after input arrives.
Agent status is the public summary that other parts of the system use to decide whether to wait, send more input, or treat the thread as unavailable.

In this post, I'll try to model the lifecycle of one Codex thread in TLA+. The goal is narrow: specify the public status behavior precisely enough to remove ambiguity about what Completed, Interrupted, Shutdown, and NotFound actually mean.

Table of Contents

1. Status as a Stored Event-Derived Projection

When Codex creates a session, it also creates a status channel initialized to PendingInit. That channel stores the latest public status for the thread. Later, whenever the runtime emits an event that affects lifecycle state, Codex updates the stored status through agent_status_from_event.

The mapping is:

TurnStarted -> Running
TurnComplete(msg) -> Completed(msg)
TurnAborted(Interrupted) -> Interrupted
TurnAborted(Replaced | ReviewEnded) -> Errored("Replaced" | "ReviewEnded")
Error(msg) -> Errored(msg)
ShutdownComplete -> Shutdown

So the public status is not a separate reducer-managed state machine. It is the latest stored projection produced from a subset of emitted events.

The enum mixes two concerns:

Turn Result: The outcome of the most recent interaction (Completed, Interrupted, Errored).
Thread Liveness: Whether the thread is currently executing or available for control (Running, Shutdown).

2. Modeling Restartability

The primary distinction between a turn and a thread is restartability. In most systems, Completed is a terminal state. In Codex, a thread in a Completed or Interrupted state can transition back to Running if new user input arrives.

The TLA+ spec captures this through the ModelRestartable operator:

ModelRestartable(s) == s \in {"pending_init", "completed", "interrupted"}

This operator explains which statuses are restartable in the model. The checked StartTurn action spells the precondition out directly:

StartTurn ==
  /\ status \in {"pending_init", "completed", "interrupted"}
  /\ status' = "running"
  /\ completedMsg' = NoMsg
  /\ errorInfo' = NoError

This reflects the Codex turn-level semantics: Completed and Interrupted describe the result of the last turn, not destruction of the thread. In this model, they are restartable states from which a later input may begin a new turn.

3. The `WaitFinal` Set

Client-side polling logic often requires a definition of "finality" to know when to stop waiting for a result. However, the event-derived nature of the status makes this definition non-trivial. The spec defines WaitFinal to codify the states where a result is guaranteed to be stable:

WaitFinal(s) == s \in {"completed", "errored", "shutdown"}

Note that Interrupted is excluded from WaitFinal. The Codex wait logic treats PendingInit, Running, and Interrupted as non-final. A client that waits only for WaitFinal therefore needs a separate policy for handling Interrupted.

4. Enforcing Payload Discipline

Status variants like Completed and Errored carry associated data (completedMsg and errorInfo). A common failure mode in event-sourced systems is the presence of stale data—where a payload from a previous state persists after a transition.

The TLA+ model uses invariants to enforce payload discipline, ensuring that data is only present when the status warrants it:

CompletedCarriesOnlyCompletionMessage ==
  status = "completed" => completedMsg \in Messages

NonCompletedClearsCompletionMessage ==
  status # "completed" => completedMsg = NoMsg

These invariants force the StartTurn action to explicitly clear payloads. Without this clearing, the model checker (TLC) would find a trace where the system is Running but still holds a message from a prior turn.

5. System Boundaries and Deadlocks

When running the model checker with CHECK_DEADLOCK TRUE, TLC reports a deadlock at the transition from pending_init to shutdown. In TLA+, a deadlock is any state with no valid next moves. Since shutdown is modeled as a terminal state with no outgoing transitions, this deadlock is expected behavior for the model.

This also explains the absence of NotFound from the model's status set. NotFound is not a state of the live thread. It is the result returned by agent control when the thread manager cannot find the thread. Shutdown and NotFound are therefore distinct: direct subscribers to a live thread can observe Shutdown, while later polling through agent control may return NotFound after removal.

6. Verification Results

I checked the model with small bounds:

Messages = {msg1, msg2}
ErrorInfos = {err1, err2}

There are two relevant TLC runs.

First, the configured run with CHECK_DEADLOCK TRUE stops at the expected terminal deadlock in shutdown. That run generates 19 states, finds 10 distinct states, and reaches depth 3.

Second, a deadlock-disabled run explores the full bounded state space and reports:

37 states generated
10 distinct states found
depth 3
no invariant violations

The TLC results verify the core status logic for a single thread:

TypeOK is preserved across all transitions.
Completion and error payloads are cleared precisely when the thread restarts.
Interrupted is correctly excluded from the WaitFinal set.
Shutdown is modeled as the only terminal state.

This 100-line spec replaces an informal documentation block with a verifiable foundation, ensuring the status machine is sound before layering on the complexity of parent/child trees and persistent rollout state.

Appendix: Full TLA+ Spec

---- MODULE AgentLifecycle ----
EXTENDS TLC

\* Compact requirement ledger:
\* - PendingInit -> Running on TurnStarted: StartTurn
\* - Running -> Completed(msg) on TurnComplete: CompleteTurn
\* - Running -> Interrupted on interrupt abort: AbortInterrupted
\* - Running -> Errored on Replaced / ReviewEnded: AbortReplaced, AbortReviewEnded
\* - Any live state -> Errored(err) on generic error: EmitError
\* - Any live state -> Shutdown: Shutdown
\* - Completed and Interrupted may accept another turn: StartTurn precondition
\* - NotFound is not modeled; it is an external lookup result after thread removal
\*
\* Modeling assumptions:
\* - Single live agent thread only.
\* - This models the event-derived status projection, not the full task/session engine.
\* - Completed and Interrupted are restartable because they describe the last turn, not
\*   thread destruction.
\* - Errored is not restartable in this first model, for blog clarity.
\* - Shutdown is the only intended terminal status in the modeled machine.

CONSTANTS Messages, ErrorInfos

Statuses ==
  {
    "pending_init",
    "running",
    "interrupted",
    "completed",
    "errored",
    "shutdown"
  }

AbortReasons == {"replaced", "review_ended"}

NoMsg == "no_msg"
NoError == "no_error"

ErrorPayloads == ErrorInfos \cup AbortReasons

VARIABLES status, completedMsg, errorInfo

Vars == <<status, completedMsg, errorInfo>>

WaitFinal(s) == s \in {"completed", "errored", "shutdown"}
ModelRestartable(s) == s \in {"pending_init", "completed", "interrupted"}
ModelTerminal(s) == s = "shutdown"

TypeOK ==
  /\ status \in Statuses
  /\ completedMsg \in Messages \cup {NoMsg}
  /\ errorInfo \in ErrorPayloads \cup {NoError}

Init ==
  /\ status = "pending_init"
  /\ completedMsg = NoMsg
  /\ errorInfo = NoError

StartTurn ==
  /\ status \in {"pending_init", "completed", "interrupted"}
  /\ status' = "running"
  /\ completedMsg' = NoMsg
  /\ errorInfo' = NoError

CompleteTurn(msg) ==
  /\ status = "running"
  /\ msg \in Messages
  /\ status' = "completed"
  /\ completedMsg' = msg
  /\ UNCHANGED errorInfo

AbortInterrupted ==
  /\ status = "running"
  /\ status' = "interrupted"
  /\ UNCHANGED <<completedMsg, errorInfo>>

AbortReplaced ==
  /\ status = "running"
  /\ status' = "errored"
  /\ errorInfo' = "replaced"
  /\ UNCHANGED completedMsg

AbortReviewEnded ==
  /\ status = "running"
  /\ status' = "errored"
  /\ errorInfo' = "review_ended"
  /\ UNCHANGED completedMsg

EmitError(err) ==
  /\ status # "shutdown"
  /\ err \in ErrorInfos
  /\ status' = "errored"
  /\ completedMsg' = NoMsg
  /\ errorInfo' = err

Shutdown ==
  /\ status # "shutdown"
  /\ status' = "shutdown"
  /\ completedMsg' = NoMsg
  /\ errorInfo' = NoError

Next ==
  \/ StartTurn
  \/ \E msg \in Messages: CompleteTurn(msg)
  \/ AbortInterrupted
  \/ AbortReplaced
  \/ AbortReviewEnded
  \/ \E err \in ErrorInfos: EmitError(err)
  \/ Shutdown

Spec == Init /\ [][Next]_Vars

CompletedCarriesOnlyCompletionMessage ==
  status = "completed" => completedMsg \in Messages

NonCompletedClearsCompletionMessage ==
  status # "completed" => completedMsg = NoMsg

ErroredCarriesOnlyErrorInfo ==
  status = "errored" => errorInfo \in ErrorPayloads

NonErroredClearsErrorInfo ==
  status # "errored" => errorInfo = NoError

InterruptedIsNotWaitFinal ==
  status = "interrupted" => ~WaitFinal(status)

ShutdownIsModelTerminal ==
  status = "shutdown" => ModelTerminal(status)

NonTerminalHasSuccessor ==
  ~ModelTerminal(status) => ENABLED Next

====

Appendix: TLC Config

SPECIFICATION Spec

CONSTANTS
  Messages = {msg1, msg2}
  ErrorInfos = {err1, err2}

INVARIANT
  TypeOK
  CompletedCarriesOnlyCompletionMessage
  NonCompletedClearsCompletionMessage
  ErroredCarriesOnlyErrorInfo
  NonErroredClearsErrorInfo
  InterruptedIsNotWaitFinal
  ShutdownIsModelTerminal

CHECK_DEADLOCK TRUE