Notes from the Coalface
Most of what gets written about AI and automation is written from a distance. Strategy decks. Analyst reports. Case studies polished until the friction has been removed.
This isn't that.
Workhorse is an inventory and order management system for UK product businesses — the kind of operations that run on spreadsheets, supplier relationships, and people who've been doing this long enough to know where the bodies are buried. We're building AI automation into that environment right now, and we're writing down what we find.
The question that started this series was deceptively simple: if automation is supposed to remove labour, why does the labour keep showing up?
Not everywhere. The routine work does disappear. Orders that used to be processed manually get handled without anyone touching them. But the people who used to do that work aren't sitting idle — they're busier, in some cases — doing something slightly different. Checking. Deciding. Catching the things the system flagged, or worse, didn't flag. The headcount problem doesn't solve itself. It moves.
What we've found, piece by piece, is that this isn't a technology problem. It starts with a distinction that sounds technical but turns out to matter more than almost anything else about how operational software is built: a system that records what happened and a system that owns what happens next are different products. Most operational software is only the first one.
From there, the problems compound. Fragmented stacks — a purchasing tool, a forecasting tool, an ERP underneath all of it — pass decisions between systems without any of them owning the outcome. Operational systems routinely create executable state without ever declaring whether execution is actually permitted, and the humans who catch the records that shouldn't go out aren't a safety feature. They're evidence that the system never defined what execution requires. That pattern, it turns out, isn't accidental and it isn't fixable with better tooling. It's structural.
Which is where things get uncomfortable for anyone selling AI as the answer. Better forecasts and higher confidence scores don't reduce the review load — because the review was never about whether the record is correct. It's about whether the system is permitted to act on it. Authority is the bottleneck, and intelligence doesn't move it.
What makes this harder to fix than it looks is that the volume doesn't stay flat. As automation increases, more records arrive at the permission boundary — and the humans absorbing that load aren't checking arithmetic, they're carrying commercial risk the system was never authorised to carry. Accuracy makes it worse, not better: a 95% success rate doesn't produce 95% less work, because the system still can't tell you which 95% are safe. Until it can, you check everything. The pressure compounds precisely because the system is getting smarter.
We're still in the middle of it. The series follows the evidence one step at a time, and the evidence keeps moving. What's below is what we've established so far.