thepragmaticquant.com

Your AI coding agents can't hear each other — not even across vendors

TL;DR: Coding agents from different vendors (Claude Code, Cursor, Aider) run side by side on one machine and can’t hear each other: when one finishes, or fails, the others have no idea. waitbus is a local async event bus that gives every tool on the box one shared nervous system. When any agent, CI job, test, or container finishes or fails, every other tool hears it the moment it lands, fully offline, with zero cloud and zero account. (It also retires the blind polling loop each tool hand-rolls.) And it has been tested on a real swarm: five different agent drivers (Pydantic AI, LangGraph, Claude Code, the Gemini CLI, and a shell control) all reacting to one event on one local bus at N=5 and N=10, with zero invariant failures and a median bus latency of roughly 32–51 ms.

bash
uvx waitbus demo   # zero-install: watch the whole bus light up in ~30s

Source, install docs, and the stdlib-only supply-chain trail: github.com/astrogilda/waitbus.

A narrated ~80-second walkthrough (audio on): a pytest run wakes a blocked waitbus wait that exits 0; a --timeout 3s wait exits 124; and a five-agent cross-vendor swarm, with Pydantic AI and LangGraph in separate processes, wakes from one failure broadcast. Real terminals, real waitbus.

Two coding agents from different vendors are running on the same box — say a Claude Code session and a Cursor window. One of them breaks the build. The other has no idea: it keeps working against a world that just changed under it. That blindness is the thing waitbus removes. It is a local event bus where every tool on the machine shares one nervous system, so when any agent, CI job, test, or container finishes or fails, every other tool — even one from a different vendor — hears it the moment it lands. The blind polling loop each tool hand-rolls falls out as a side effect; the cross-vendor coordination is the point.

Here is the side effect first, because it is the easiest to see. A coding agent just pushed a branch. CI is running. The agent now has nothing useful to do for the next four minutes, but it does not know that, so it loops:

text
$ gh run list --branch fix/parser-bug --limit 1 --json status,conclusion
[{"status":"in_progress","conclusion":null}]
$ sleep 3
$ gh run list --branch fix/parser-bug --limit 1 --json status,conclusion
[{"status":"in_progress","conclusion":null}]
$ sleep 3
...

Eighty iterations later, the run finishes. The agent has read eighty near-identical JSON blobs, paid the API budget for eighty round trips, and burned eighty turns of context on responses that all said “still running.” When the matrix fails, the agent then polls the per-job endpoint to find out which cell broke. More tokens, more turns.
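The eighty is just arithmetic: a four-minute run checked every three seconds. A tiny sketch makes the scaling explicit; the helper name polls_needed is made up for illustration, and it ignores the time each request itself takes.

```python
import math


def polls_needed(run_seconds: float, interval_seconds: float) -> int:
    """Number of status checks a fixed-interval poller makes before the run ends."""
    return math.ceil(run_seconds / interval_seconds)


# The scenario above: a four-minute CI run polled every three seconds.
checks = polls_needed(run_seconds=4 * 60, interval_seconds=3)

# A subscriber, by contrast, is woken once regardless of how long the run takes.
wakeups_when_subscribed = 1
```

Doubling the run length doubles the polls; the subscriber's count stays at one.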

A small Python daemon called waitbus removes that loop — and then does something the polling model structurally cannot. waitbus is a local async nervous system for your machine’s agents: one daemon that listens to GitHub webhooks, pytest events, Docker container events, and filesystem changes, and offers a single blocking waitbus wait primitive on top. An agent says “wake me when X finishes or fails” and goes quiet until it does, or a timeout fires. No polling.

The whole eighty-iteration loop above collapses to one line that blocks and then carries the verdict in its exit code:

bash
# Block until this commit's CI is terminal. Exit 0 = success, 1 = failure, 124 = timeout.
waitbus wait --sha 7f3a1b2 --repo owner/repo --timeout 5m

# Or wait across sources at once: a pytest session finishes AND a container exits.
waitbus wait --all-of 'pytest:fields.event_type="pytest_session"' \
             --all-of 'docker:fields.event_type="docker_container"' --timeout 10m

The process sleeps at idle CPU until the event lands; the exit code is the answer, so a script reads it the way it reads any other command.
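From a Python script, the same exit-code contract can be read through subprocess. This is a minimal sketch that assumes waitbus is on PATH and uses only the flags shown above; verdict_for and wait_for_ci are illustrative helper names, not part of waitbus.

```python
import subprocess

# Exit codes as documented in the shell example: 0 = success, 1 = failure, 124 = timeout.
VERDICTS = {0: "success", 1: "failure", 124: "timeout"}


def verdict_for(returncode: int) -> str:
    """Translate a `waitbus wait` exit code into a readable verdict."""
    return VERDICTS.get(returncode, f"unexpected exit code {returncode}")


def wait_for_ci(sha: str, repo: str, timeout: str = "5m") -> str:
    """Block on `waitbus wait` for one commit and return the verdict."""
    result = subprocess.run(
        ["waitbus", "wait", "--sha", sha, "--repo", repo, "--timeout", timeout],
        check=False,  # a non-zero exit is an answer here, not an error
    )
    return verdict_for(result.returncode)
```

Calling wait_for_ci("7f3a1b2", "owner/repo") would block exactly as the shell line does and return one of the three verdict strings.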

Polling re-asks every few seconds and gets nothing new. Subscribing waits once and wakes on the event.

Efficient waiting is the part the platforms are already closing (Claude Code’s own Monitor now wakes a single session without polling). The part they can’t close is the one from the top: because every tool on the box shares the one bus, when one agent’s event lands — including when one agent fails — every other agent finds out at the same moment, even one from a different vendor. Your Claude Code session and your Cursor window are blind to each other by default; on the bus they are not. That is the thing a poller can never do: a poller only sees its own endpoint; it has no idea the peer working alongside it just died against a now-broken world.

Four flavors of the same loop

The agent-polling problem is not one problem. It is four problems with the same shape: gh run watch re-asking the runs API, a test runner tailing JUnit XML, docker ps in a loop even though Docker exposes a streaming /events endpoint, and a getmtime poll where inotify already exists. Each is a different tool’s least-bad recommended pattern, and each wastes wall-clock time and, on agents, context tokens, because every “still running” response gets tokenized and reasoned over before being discarded.

Four sources, four hand-rolled polling loops, one waiting primitive that replaces all of them.

What waitbus is, in one line

One daemon, one event store (SQLite, single file, no server), one broadcast socket, one waitbus wait command. The daemon receives from six built-in sources (GitHub, Alertmanager, pytest, Docker, the filesystem, and agent-emitted events) plus any third-party plugin, normalizes each into a small JSON envelope, and fans it out over a local AF_UNIX socket. Subscribers connect and start receiving events immediately.

bash
uv tool install waitbus && waitbus init && waitbus install-systemd  # or install-launchd on macOS

Less a thing you deploy, more a shared local primitive every tool on the box can read and write.
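The normalize-then-fan-out step described above can be sketched with nothing but the standard library. This is an illustration, not waitbus's wire protocol: the envelope fields, the newline framing, and the helper names make_envelope, fan_out, and read_one are all assumptions, and socket.socketpair stands in for real subscriber connections to a socket path.

```python
import json
import socket


def make_envelope(source: str, event_type: str, conclusion: str) -> bytes:
    """Serialize one event as a newline-terminated JSON line (hypothetical shape)."""
    envelope = {
        "source": source,
        "fields": {"event_type": event_type, "conclusion": conclusion},
    }
    return (json.dumps(envelope) + "\n").encode("utf-8")


def fan_out(frame: bytes, subscribers: list[socket.socket]) -> None:
    """Send the same frame to every connected subscriber."""
    for sock in subscribers:
        sock.sendall(frame)


def read_one(sock: socket.socket) -> dict:
    """Read a single JSON line from a subscriber socket."""
    with sock.makefile("r", encoding="utf-8") as reader:
        return json.loads(reader.readline())


# Three connected AF_UNIX pairs stand in for three subscribers of one daemon.
pairs = [socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM) for _ in range(3)]
daemon_ends = [daemon for daemon, _ in pairs]
subscriber_ends = [subscriber for _, subscriber in pairs]

fan_out(make_envelope("pytest", "pytest_session", "failure"), daemon_ends)
received = [read_one(subscriber) for subscriber in subscriber_ends]

for daemon, subscriber in pairs:
    daemon.close()
    subscriber.close()
```

Every subscriber gets an identical copy of the one frame, which is the property the rest of the article leans on.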

Why not Redis, NATS, or just a file?

Redis and NATS are servers you run, secure, and keep alive — and they still hand you a bare pub/sub pipe, so you write the GitHub-webhook, pytest, Docker, and filesystem adapters yourself. waitbus is a zero-config AF_UNIX daemon that ships those source adapters built in. A bare file or named pipe is closer in spirit, but it has no event filtering, no replay for a subscriber that was offline, and no way to correlate across sources in one wait. waitbus keeps a single-file SQLite store with a replay cursor, so a consumer that missed an event catches up instead of losing the moment. The cost of all this is one idle daemon that measures around 40 MB RSS — most of it the CPython interpreter baseline.
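The replay-cursor idea is easy to see in miniature. The sketch below is an assumption about the general technique, not waitbus's actual schema: the table layout and the append and replay helpers are hypothetical, and an in-memory database replaces the single-file store so the example is self-contained.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # a real store would be a file on disk
db.execute(
    "CREATE TABLE events ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, source TEXT, payload TEXT)"
)


def append(source: str, payload: str) -> int:
    """Store one event and return its monotonically increasing id."""
    cur = db.execute(
        "INSERT INTO events (source, payload) VALUES (?, ?)", (source, payload)
    )
    db.commit()
    return cur.lastrowid


def replay(cursor: int) -> list[tuple[int, str, str]]:
    """Return every event stored after the subscriber's last-seen id."""
    return db.execute(
        "SELECT id, source, payload FROM events WHERE id > ? ORDER BY id", (cursor,)
    ).fetchall()


last_seen = append("pytest", '{"conclusion": "success"}')  # seen live
append("docker", '{"conclusion": "failure"}')              # subscriber offline
append("fs", '{"conclusion": "success"}')                  # subscriber offline

missed = replay(last_seen)  # on reconnect, catch up from the cursor
```

A bare pipe has no equivalent of replay: whatever was written while the reader was away is simply gone.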

waitbus is single-machine by design. The trust model rests on a local AF_UNIX socket and a same-user peer check, so the bus is local-only — cross-machine and team coordination are deliberately out of scope for the core. If your agents and CI live on one workstation, that is exactly the shape; if they are spread across hosts, this is not the tool.
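A same-user peer check on an AF_UNIX socket can be sketched as follows. This assumes Linux, where SO_PEERCRED is available; other platforms expose peer credentials through different calls. The article does not show how waitbus performs its check, so peer_uid and is_same_user are illustrative names only.

```python
import os
import socket
import struct


def peer_uid(sock: socket.socket) -> int:
    """Return the uid of the process at the other end of a connected AF_UNIX socket."""
    # On Linux, SO_PEERCRED yields a struct ucred: three native ints (pid, uid, gid).
    creds = sock.getsockopt(
        socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize("3i")
    )
    _pid, uid, _gid = struct.unpack("3i", creds)
    return uid


def is_same_user(sock: socket.socket) -> bool:
    """Accept a peer only if it runs as the same user as this process."""
    return peer_uid(sock) == os.getuid()


# A connected pair created by this process: both ends belong to the current user.
left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
same_user = is_same_user(left)
left.close()
right.close()
```

The kernel supplies the credentials, so a peer cannot claim to be someone else; that is also why the check has no meaning across machines.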

Wiring it into your stack

The shape is always the same: name a source, name a predicate over the event’s fields, read the exit code. Every source normalizes into the same JSON envelope, so the same --match dotted.key=json_literal grammar works against all of them — the value is a JSON literal, so "failure" is the string, 12345 the integer, true the bool.

bash
# A pytest session: block until any test in the run reports a failure.
waitbus wait --source pytest --match 'fields.conclusion="failure"' --timeout 5m

# A Docker container exit: a non-zero exit maps to conclusion "failure".
waitbus wait --source docker --match 'fields.conclusion="failure"' --timeout 10m

# A filesystem save: wake when a specific file is finished being written.
waitbus wait --source fs --match 'fields.workflow_name="config.yaml"' --timeout 1h
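To make the dotted.key=json_literal grammar concrete, here is a minimal evaluator for a single equality predicate. It is a sketch of the idea only: the real matcher may support more than this, and lookup and matches are hypothetical helpers, as is the sample envelope.

```python
import json
from typing import Any

_MISSING = object()  # sentinel for "key path not present"


def lookup(envelope: dict, dotted_key: str) -> Any:
    """Walk a dotted path such as 'fields.conclusion' through nested dicts."""
    current: Any = envelope
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def matches(envelope: dict, expression: str) -> bool:
    """Evaluate one 'dotted.key=json_literal' predicate against an envelope."""
    key, separator, literal = expression.partition("=")
    if not separator:
        raise ValueError(f"expected key=json_literal, got {expression!r}")
    expected = json.loads(literal)  # "failure" -> str, 12345 -> int, true -> bool
    actual = lookup(envelope, key)
    # Compare types as well, so the integer 1 does not match the boolean true.
    return type(actual) is type(expected) and actual == expected


event = {
    "source": "pytest",
    "fields": {"conclusion": "failure", "run_id": 12345, "flaky": True},
}
string_match = matches(event, 'fields.conclusion="failure"')
integer_match = matches(event, "fields.run_id=12345")
bool_match = matches(event, "fields.flaky=true")
```

Parsing the right-hand side as JSON is what lets one grammar carry strings, integers, and booleans without a separate type flag.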

From Python, the producer and consumer sides are one symmetric public surface. Emitting takes a write-shape event and a stable delivery_id that doubles as the idempotency key — re-emitting the same id is a no-op, not a duplicate:

python
import time
from waitbus import emit
from waitbus._types import EventInsert

emit(EventInsert(
    delivery_id="my-job:42",
    source="agent",
    event_type="agent_message",
    owner="local",
    repo="local",
    received_at=time.time_ns(),   # epoch nanoseconds
    payload_json="{}",
    ingest_method="manual",
))
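One common way to get "re-emitting the same id is a no-op" is a uniqueness constraint on the delivery id plus INSERT OR IGNORE. The sketch below shows that technique with the standard sqlite3 module; it is an assumption about how such a guarantee can be built, not a description of waitbus internals, and emit_once is a hypothetical helper.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE events ("
    "delivery_id TEXT PRIMARY KEY, source TEXT, payload_json TEXT)"
)


def emit_once(delivery_id: str, source: str, payload_json: str) -> bool:
    """Insert the event unless its delivery_id already exists; report whether it was new."""
    cur = db.execute(
        "INSERT OR IGNORE INTO events (delivery_id, source, payload_json) "
        "VALUES (?, ?, ?)",
        (delivery_id, source, payload_json),
    )
    db.commit()
    return cur.rowcount == 1


first = emit_once("my-job:42", "agent", "{}")   # stored
second = emit_once("my-job:42", "agent", "{}")  # ignored: same delivery_id
stored = db.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

A producer that crashes and retries can therefore resend without checking first, as long as it derives the id from the job rather than from the attempt.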

The consumer side is wait_for (one-shot, blocking) and subscribe (a generator that yields each match) — no polling, no hand-decoding the wire:

python
from waitbus import wait_for, subscribe

frame = wait_for('fields.conclusion="failure"', source="pytest", timeout=300)

for event in subscribe(source="docker"):
    print(event.summary)

Async agents get asubscribe, the same stream as an async for. And the source list is not closed: a third-party package can register its own source by shipping a waitbus.sources.v1 entry-point plugin, discovered at daemon startup and validated against a trust-on-first-use publisher pin.

The proof: a five-agent swarm on one bus

Every design decision above existed to serve one capability I had not actually proven at scale: a whole swarm of real, different agents coordinating on one local bus, and finding out the moment one of them failed. So I ran it.

The test

The bus seeds one event. Then N agents, split across five different driver families, each subscribe, react, and emit their own reaction back onto the same bus. The orchestrator checks two things: did all N agents react, and were all five families represented.
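The orchestrator's two checks reduce to set arithmetic. A minimal sketch, assuming each reaction carries an agent id and a family label; the Reaction type, the check_invariants helper, and the family strings are hypothetical, not the harness's real data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    agent_id: str
    family: str


def check_invariants(
    expected_agents: set[str],
    expected_families: set[str],
    reactions: list[Reaction],
) -> list[str]:
    """Return a list of invariant failures; an empty list means the run passed."""
    failures = []
    missing_agents = expected_agents - {r.agent_id for r in reactions}
    if missing_agents:
        failures.append(f"agents that never reacted: {sorted(missing_agents)}")
    missing_families = expected_families - {r.family for r in reactions}
    if missing_families:
        failures.append(f"families not represented: {sorted(missing_families)}")
    return failures


families = {"pydantic-ai", "langgraph", "claude-code", "gemini-cli", "shell"}
agents = {f"agent-{i}" for i in range(5)}
reactions = [Reaction(f"agent-{i}", fam) for i, fam in enumerate(sorted(families))]

all_reacted = check_invariants(agents, families, reactions)
one_silent = check_invariants(agents, families, reactions[:-1])
```

Dropping the last reaction removes both an agent and its family, so one silent agent trips both checks at once.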

One seed, five heterogeneous agents on one bus, each reacting back. Three LLM providers, two real subscription CLIs, one shell control — nothing mocked.

Both sweeps passed: N=5 and N=10, every agent heard the event, every family represented, zero invariant failures. The bus latency — the time from the seed event to each agent’s process being woken, before it calls any model — was a median of 32.3 ms at N=5 and 50.6 ms at N=10.

One event, five real agents. Emit a seed, watch them wake on the one bus — then kill one and watch the rest find out.

The part that was messy: cost

The latency numbers were straightforward; the per-agent cost telemetry was not. Three providers reported cost in three incompatible ways. OpenAI reported exact token counts and a precise figure (~$0.000007 per reaction). The Claude CLI reported a large notional API-equivalent figure that was lumpy (the first Opus call creates the prompt cache, later calls read it at a discount). The Gemini CLI reported tokens but cost_usd: null. The shell control cost nothing.

The fix was the distinction between metered and notional cost. The budget gate now counts only genuinely-metered spend — the calls that hit an API key and a real invoice. The subscription CLIs run at zero marginal cost; their figures are surfaced for transparency but never gated. When you aggregate cost across a heterogeneous fleet, “what it cost” and “what it would have cost a metered caller” are two different columns, and conflating them is how you abort a run on phantom money.
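The metered-versus-notional split can be sketched as two columns and a gate that reads only one of them. The CostReport type and helper names are hypothetical; the 0.25 notional figure and the 0.01 budget are made-up illustration values, and only the ~$0.000007 figure comes from the run described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CostReport:
    agent: str
    cost_usd: Optional[float]  # None when the provider reports no cost at all
    metered: bool              # True only when the call lands on a real invoice


def metered_spend(reports: list[CostReport]) -> float:
    """Sum only genuinely metered cost; a missing figure counts as zero."""
    return sum(r.cost_usd or 0.0 for r in reports if r.metered)


def notional_spend(reports: list[CostReport]) -> float:
    """Sum the API-equivalent figures surfaced for transparency only."""
    return sum(r.cost_usd or 0.0 for r in reports if not r.metered)


def over_budget(reports: list[CostReport], budget_usd: float) -> bool:
    """Gate on metered spend alone, never on notional figures."""
    return metered_spend(reports) > budget_usd


reports = [
    CostReport("openai-agent", 0.000007, metered=True),
    CostReport("claude-cli", 0.25, metered=False),   # hypothetical notional figure
    CostReport("gemini-cli", None, metered=False),   # tokens reported, cost null
    CostReport("shell-control", 0.0, metered=False),
]

budget = 0.01  # hypothetical budget
gate_trips = over_budget(reports, budget)
naive_total = metered_spend(reports) + notional_spend(reports)
```

With these sample values the naive total exceeds the budget while the metered column does not, which is exactly the phantom-money abort the distinction avoids.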

How one agent’s failure reaches the others

The cross-broadcast proof is the swarm hearing the same event. The hero claim is sharper: when one peer fails, the others find out. That is not a separate system — it is the same bus, used a particular way.

Failure is just an event. A worker that dies emits an event like any other source, and every subscriber already waiting on a predicate finds out at bus latency.

There is no special “failure channel” to wire up. Once failure is expressible as an event on a shared bus, it travels the same path as every other event, and the agents that care are already listening.
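A process that has already died cannot announce itself, so one plausible way to turn a crash into an event is a thin supervisor that runs the worker and reports its exit. The sketch below only builds the event; run_and_report and the envelope fields are assumptions for illustration, and handing the result to the bus is left out.

```python
import subprocess
import sys
import time


def run_and_report(command: list[str], worker: str) -> dict:
    """Run a worker command and describe its outcome as an ordinary event."""
    result = subprocess.run(command, check=False)
    return {
        "source": "agent",
        "received_at": time.time_ns(),  # epoch nanoseconds
        "fields": {
            "worker": worker,
            "exit_code": result.returncode,
            # Same convention as the Docker example: non-zero exit means failure.
            "conclusion": "success" if result.returncode == 0 else "failure",
        },
    }


# A stand-in worker that exits with status 3.
crash = [sys.executable, "-c", "import sys; sys.exit(3)"]
event = run_and_report(crash, worker="demo-worker")
```

Any peer waiting on a predicate like fields.conclusion="failure" would match this event the same way it matches a failed test run.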

Try to break it

The run above is committed in the repo as a baseline — drivers, latencies, invariants, all of it (benchmarks/baselines/waitbus_stress_real_20260607.json).

Watching a swarm coordinate and recover from a failure on your own machine, with agents synthesized in-process and no API keys, takes one command:

text
uvx waitbus swarm-demo

Kill one of the agents mid-run and watch the others notice. Reproducing the real five-LLM run takes a maintainer checkout — the swarm harness lives outside the installable package:

text
git clone https://github.com/astrogilda/waitbus && cd waitbus
uv sync --extra stress
waitbus stress --real --sweep 5,10

If you find a way to make an agent miss an event it should have caught, I want the bug report.

Frequently asked questions

What is waitbus?
waitbus is a workstation-local async event bus: a small Python daemon that listens to GitHub webhooks, pytest runs, Docker container events, and filesystem changes, and offers a single blocking waitbus wait command. A tool waits on an event instead of polling, and wakes when another tool finishes or fails.
How do AI coding agents from different vendors coordinate with waitbus?
They share one local bus. When any agent, CI job, test, or container finishes or fails, every other tool on the bus receives the event; in the five-agent swarm test, median bus latency was roughly 32 ms at N=5 and 51 ms at N=10. This includes tools from different vendors, such as Claude Code, Cursor, and Aider.
Does waitbus need the cloud or an account?
No. It runs fully offline on your machine, without cloud services or accounts. It uses only the Python standard library.
How do I try waitbus?
Run uvx waitbus demo for a zero-install, roughly 30-second walkthrough, or uvx waitbus swarm-demo to watch a swarm coordinate and recover from a failure. Source and install docs are at github.com/astrogilda/waitbus.

Next up is how it actually works: the architecture end to end, the MCP wiring, and the decisions behind the build.