Workhorse

Notes from the Coalface

Most of what gets written about AI and automation is written from a distance. Strategy decks. Analyst reports. Case studies polished until the friction has been removed.

This isn't that.

Workhorse is an inventory and order management system for UK product businesses — the kind of operations that run on spreadsheets, supplier relationships, and people who've been doing this long enough to know where the bodies are buried. We're building AI automation into that environment right now, and we're writing down what we find.

The question that started this series was deceptively simple: if automation is supposed to remove labour, why does the labour keep showing up?

Not everywhere. The routine work does disappear. Orders that used to be processed manually get handled without anyone touching them. But the people who used to do that work aren't sitting idle. In some cases they're busier, doing something slightly different. Checking. Deciding. Catching the things the system flagged, or worse, didn't flag. The headcount problem doesn't solve itself. It moves.

What we've found, piece by piece, is that this isn't a technology problem. It starts with a distinction that sounds technical but turns out to matter more than almost anything else about how operational software is built: a system that records what happened and a system that owns what happens next are different products. Most operational software is only the first one.

From there, the problems compound. Fragmented stacks — a purchasing tool, a forecasting tool, an ERP underneath all of it — move decisions between systems without any of them owning the outcome. Operational systems routinely create executable state without ever declaring whether execution is actually permitted — and the humans who catch the ones that shouldn't go out aren't a safety feature. They're evidence that the system never defined what execution requires. That pattern, it turns out, isn't accidental and it isn't fixable with better tooling. It's structural.

Which is where things get uncomfortable for anyone selling AI as the answer. Better forecasts and higher confidence scores don't reduce the review load — because the review was never about whether the record is correct. It's about whether the system is permitted to act on it. Authority is the bottleneck, and intelligence doesn't move it.

What makes this harder to fix than it looks is that the volume doesn't stay flat. As automation increases, more records arrive at the permission boundary — and the humans absorbing that load aren't checking arithmetic, they're carrying commercial risk the system was never authorised to carry. Accuracy makes it worse, not better: a 95% success rate doesn't produce 95% less work, because the system still can't tell you which 95% are safe. Until it can, you check everything. The pressure compounds precisely because the system is getting smarter.

The automation boundary in an operational workflow isn't placed where someone decided to stop. It forms after an irreversible mistake reaches a customer — a wrong shipment, a duplicated order, a disputed invoice — and it lands at the last point the error was still fixable. After that, every order gets checked. The boundary doesn't move because nothing in the system has changed that would prevent the same event from recurring undetected.

The instinct, when errors keep appearing, is to reduce them: to invest in accuracy until the checking becomes optional. But accuracy isn't what drives the verification load. A team that took their error rate from 5% to 2% found the checking unchanged: the system still couldn't identify which orders it might have got wrong, so every order still got checked. The exit from universal verification isn't a higher score. It's a system that can tell you where it's uncertain.
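To make that arithmetic concrete, here is a minimal Python sketch. The function names, the batch of 1,000 orders, and the 4% flagged share are ours for illustration only; the 5% and 2% error rates are the ones from the example above.

```python
from typing import Optional


def expected_errors(total_orders: int, error_rate_pct: int) -> int:
    """Orders we expect to be wrong, given an error rate in whole percent."""
    return total_orders * error_rate_pct // 100


def orders_to_check(total_orders: int, flagged_pct: Optional[int]) -> int:
    """Orders a human has to look at.

    flagged_pct is the share of orders the system marks as uncertain.
    None means the system cannot say which orders are doubtful,
    so every order has to be checked.
    """
    if flagged_pct is None:
        return total_orders
    return total_orders * flagged_pct // 100


batch = 1000  # illustrative batch size

# Accuracy improves from 5% errors to 2% errors...
errors_before = expected_errors(batch, 5)
errors_after = expected_errors(batch, 2)

# ...but with no uncertainty signal the review load does not move.
checks_before = orders_to_check(batch, None)
checks_after = orders_to_check(batch, None)

# Only a system that flags its own doubtful orders shrinks the load
# (4% flagged is a hypothetical figure).
checks_with_flags = orders_to_check(batch, 4)

print(errors_before, errors_after, checks_before, checks_after, checks_with_flags)
```

The error count falls, but the number of orders checked only falls in the last case, where the input that changed is the flagged share rather than the accuracy score. The sketch also assumes the flags reliably cover the orders that are actually wrong, which is the hard part.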

There's a second problem running underneath the authority question, and in some ways it's harder to see. The review gate assumes records wait safely until someone looks at them, but in most operational systems an unreviewed order is already participating in execution the moment it exists. Stock availability adjusts, demand signals update, and warehouse teams start planning, all against a record no one has approved. By the time a reviewer opens it, the downstream decisions are already made.

The gate-is-already-leaking problem assumed that records at least meant what they said, and that the ambiguity was about timing, not content. That turns out to be only half of it. In any live operational environment, the sources the AI draws from will always contain records that are individually accurate but collectively contradictory: configuration notes, JIRA tickets, and code documentation that each reflect a different moment in a client's history. When an engineer corrects the AI's answer, she's resolving that conflict from knowledge the sources don't contain, and nothing about that resolution gets back to the system. Next time the same query arrives, the same conflicting sources produce the same confident wrong answer.

That recurrence turned out to sit on something more basic than the knowledge loop. A system that logs the fact of an action isn't the same as one that can tell you why it thought the action was right, and without a per-record trail from input to output, every correction is a dead end: it fixes the output but not the cause. Authority, when it eventually arrives, can only govern records whose path it can retrace; the trail has to be there first or there's nothing for the rules to act on.
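As a rough illustration of what a per-record trail might hold, here is a minimal Python sketch. Every name in it (`TrailStep`, `RecordTrail`, `can_govern`) and every value in the example is hypothetical, and it is not a description of how Workhorse stores anything. It only shows the shape of the idea: each output keeps the inputs it was derived from, so a correction can be attached to a cause rather than just overwriting a result.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TrailStep:
    source: str  # hypothetical label for where a value came from
    value: str   # what was read from that source


@dataclass
class RecordTrail:
    record_id: str
    output: str
    steps: List[TrailStep] = field(default_factory=list)
    correction: Optional[str] = None

    def correct(self, new_output: str, reason: str) -> None:
        # Store the reason alongside the inputs, so the fix is not a dead end.
        self.output = new_output
        self.correction = reason


def can_govern(trail: Optional[RecordTrail]) -> bool:
    # Rules can only act on a record whose path from input to output exists.
    return trail is not None and len(trail.steps) > 0


# Made-up example: two sources that are each accurate for their moment
# but contradict each other.
trail = RecordTrail(
    record_id="order-001",
    output="ship 12 units",
    steps=[
        TrailStep("configuration note", "pack size 12"),
        TrailStep("support ticket", "pack size changed to 10"),
    ],
)
trail.correct("ship 10 units", "ticket supersedes configuration note")
```

A record with no steps fails `can_govern`, which is the point made above: the rules have nothing to act on until the trail exists.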

Which makes the usual sequencing worse than it looks. The default plan, almost everywhere, is to define execution authority properly later — after the pilot, after the platform decision, once things settle. But the records don't sit still while you wait: once a confirmation is ingested, the stock ledger updates against it, picks get generated from it, invoices reference it, and authority defined afterwards can only face forwards. By the time the rules are ready, the set they were meant to govern has already moved past their reach.

We're still in the middle of it. The series follows the evidence one step at a time, and the evidence keeps moving. What's below is what we've established so far.
