State Is the Product: What Agentic Workflows Were Missing

The views expressed here are my own and do not represent those of any current or former employer.

A couple of weeks ago I described fakoli-flow — the orchestration layer that took me out of the role of human message bus and let me describe what I wanted instead of pressing buttons. I wrote that post the day I finally had a workflow I trusted enough to walk away from.

I left something out of that post. Not because it was incomplete. Because I was still building it.

The thing I left out is the part that broke first when I ran fakoli-flow on a real project for the second time.

Here’s what happened. I dispatched a wave of three agents in parallel. Two welder instances on different packages and a herald rewriting the README. Twenty minutes in, welder #1 finished, submitted its status file, and exited. Welder #2 — still mid-flight — reread the original status file from before welder #1’s changes, because nothing told it the world had moved. Herald, meanwhile, was rewriting the README to describe a system that the welders had already modified out from under it. By the time I came back to my terminal, the three agents had produced three internally-consistent worldviews that did not match each other and did not match the codebase.

It was the same parallelism-without-coordination failure I described in the fakoli-crew post. But this time the agents had file ownership rules. They had wave boundaries. They had status files. They had everything fakoli-crew prescribed. And it still broke.

That’s when I stopped and asked the question I should have asked six months earlier: where does the project plan actually live?

The Markdown That Wasn’t a Database

I’d been using markdown for everything. The PRD lived in a docs/plans/ file. The task list lived in another markdown file. The agent status reports — what was claimed, what was in progress, what was done — those lived in a third markdown file that got rewritten on every wave.

Markdown is fine for the things markdown is good at. It’s not good at being a database. It can’t answer “who claimed this task” without a human reading the document. It can’t answer “is this claim still active” without an expiry timestamp. It can’t refuse a write when two agents try to claim the same thing on the same wave, because it has no notion of a transaction.

And the worst failure mode: markdown is silently overwritable. Agent A writes “T012: in progress.” Agent B reads that, decides T012 is taken, writes its own status. A third agent rewrites the status block to reflect a new wave and unintentionally clobbers both. No error. No conflict. Just lost coordination.

Let me be blunt about something. If your “project state” is markdown files in a docs/ directory, you don’t have project state. You have project notes that hopefully reflect what’s true.

What Was Actually Missing

I sat with the failure for a week before I started building. I’m glad I waited, because the obvious move would have been to add more workflow scaffolding — another agent, another hook, another convention — and that would have papered over the real gap.

The real gap was that the workflow had nowhere to write down what was true.

When I made the list of what “what’s true” actually needed to include, it came to five things:

Specs. Not the PRD draft sitting in a markdown file. Parsed requirements, features, and tasks with identifiers that survive renames and edits. Reviewed and approved before any agent can claim them.
Claims. Not assignment-by-comment or “I’ll take this one” in chat. A row with a lease, a heartbeat, a branch, and a list of files the agent expects to touch. Detectable when stale. Releasable on force. Visible to every agent that asks.
Work packets. Not the whole PRD pasted into every agent’s prompt. A compact, task-specific bundle: intent, acceptance criteria, scope, non-goals, dependencies completed, dependencies still blocking, verification commands.
Evidence. Not an agent’s word that the work is done. The commands actually run, the files actually changed, the exit codes, the last lines of output. Captured by hooks during the work, cross-checked against what the agent claims it did.
Scores. Not story points. Six dimensions per task — complexity, parallelizability, context load, blast radius, review risk, agent suitability — so the orchestration layer can decide which work runs in parallel and which agent should take which task.

None of those are workflow primitives. They’re state primitives. They survive sessions, agents, and runtime changes because they live in a database, not a conversation.

What the Existing Alternatives Get Half Right

The market is not empty here. While I was writing the design brief I went through every adjacent tool I could find and asked the same question for each: does this give you the state primitives, or does it give you something that feels like state but actually leans on a markdown convention?

The honest answer for most of them was the second thing.

CCPM — Claude Code Project Management — is the closest serious competitor to where I was heading. It treats PRDs as files, epics as GitHub issues, and parallel agents as worktrees. Strong overlap with what I wanted. But the state is GitHub. Which means the claim model is “assignment to a user,” the work packet is “the issue body, plus comments, which the agent has to summarize itself,” and the scoring is single-axis story points if any. CCPM is great if your project is a GitHub project. It binds you to GitHub Issues as the database, and the database has the shape of an issue, not the shape of an agent work packet.

GitHub Issues used directly as agent state, which is what GitHub Copilot Coding Agent assumes, has the same problem with the lid off. Issues are tracking tickets. They’re not optimized for LLM consumption. The “claim” is an assignment with no expiry. The “evidence” is whatever shows up in the PR. The “context” is the entire comment thread, including the irrelevant parts. It works for one human and one agent. It doesn’t compose to multiple agents working in parallel.

Hamster Studio and the rest of the SaaS planning tools validate the collaboration direction but go the wrong way for the audience I care about. They’re hosted. They want an account. They want your PRD to leave the repository. For most of the developers I know who are actually running Claude Code or Codex on real codebases, “the project plan now lives in someone else’s database” is a non-starter — for data ownership reasons, for offline reasons, for “I want to clone the repo and have everything” reasons.

LangGraph has the right instinct about state being important. It even has durable execution and checkpointing, which is more than most. But LangGraph’s state is workflow execution state — where are we in the graph, what variables are in scope, what comes next. It’s not project planning state — what’s the PRD, what are the tasks, who has claimed what, what evidence supports completion. Same word, different meaning.

Taskmaster, which was the original inspiration for the design brief, gets the planning primitives mostly right — PRD parsing, task graphs, complexity scoring, expand recommendations. But Taskmaster asks “what should my coding agent do next?” That’s a solo question. The question I had was “how do multiple humans and multiple agents safely coordinate around a reviewed project plan?” Taskmaster doesn’t answer that, and it doesn’t try to.

So the strategic shape of the gap was: planning primitives plus collaboration plus claims plus runtime-neutrality, all of it in a local-first state file that the repo can carry around with it. Nobody was building exactly that. Almost everybody was building one or two of those pieces and treating the rest as someone else’s problem.

The Failure That Made the Gap Obvious

The moment I knew this was real was the night I tried to recover from a crashed agent.

I’d dispatched a welder to refactor a module. It claimed the work in a status file, started editing, and then my laptop kernel-panicked. (My laptop has a memory pressure issue that I keep meaning to fix.) When I rebooted, I had: a partially-edited file, a status file claiming the work was in progress, no record of which files the agent had actually touched, no test output, no way to know what was safe to keep and what to revert. The next session I started had no idea any of this had happened.

There’s a name for that kind of failure in distributed systems. It’s called a stale lock. Every database has a way to handle it: leases, heartbeats, force-release with audit. None of those things existed in my markdown-as-state setup. The system had no concept of “this claim is too old to trust.”

So I gave up the markdown approach in one session. I sat down to design what would replace it, and the design ended up being mostly the things every shared-state system gets eventually:

One canonical store with transaction semantics, not free-form text files.
Claims as first-class objects with expiry and heartbeats.
Evidence captured by hooks at the boundary, not narrated by the agent after the fact.
A plan-then-apply gate so “done” requires a review row, not a self-report.

If you’ve worked with Terraform, none of that is novel. It’s roughly the same shape as configuration → state → plan → apply, transplanted to agentic software work. That’s not an accident. It’s the same coordination problem with different actors.

State Is the Product

I want to land on the principle that came out of this, because I think it’s the most important thing I’ve learned about building with AI agents in the last six months.

When people talk about agentic infrastructure they usually mean one of two things. They mean the runtime — the thing that executes the agent, manages tools, handles retries, like the Baara Next engine I wrote about. Or they mean the workflow — the thing that decides which agent runs when, like fakoli-flow. Both of those matter and both of those are work I’ve put my own time into.

But the runtime and the workflow are not enough.

The thing that breaks first, the thing that almost nobody is building, is the shared state layer that exists outside any single chat session, coding agent, or human contributor. The PRD that survives the session reset. The claim that has a heartbeat. The evidence that was captured by a hook, not narrated by an agent. The work packet that gets regenerated on demand from canonical truth.

That layer is the product. The workflow on top of it is downstream. The agent runtime under it is downstream. The state is what makes the rest of the stack composable, because every other piece can read it and every other piece can update it and nobody owns it.

Markdown is fine for notes. It’s not a database. Don’t let your agents pretend it is.

Co-authored with AI, based on the author's working sessions, dictations, and notes.