Show Your Working
On a sales order, the system can check its answer against the handful of customers it could have meant. A number out of a report isn't like that. There's nothing to check it against, only the working behind it, and the working stayed where it ran.
One of the things we're building for clients lets them ask questions of their own operational data in plain English. Someone types it in — how much of a product line went out north last quarter, which suppliers slipped most on lead times, what's in stock against what's already promised — and behind the scenes it writes the query and hands back the number. No report to commission, no waiting on whoever knows the database. It's quick, and most of the time it's right.
The ones that caused trouble were never the answers that came back obviously wrong. Those get spotted, same as a typo in an email. It was the clean ones. A number that looked like any other number — right sort of size, nothing flagged, no reason to stop on it — that was wrong because somewhere in the query a join wasn't complete, or a filter had left out a category nobody thought to mention, or it was the wrong column being selected. The number went into a report, or straight into a decision. If it got caught, it got caught later.
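Here's a small sketch of how a clean wrong number comes about, using Python's built-in sqlite3 with an in-memory database. The tables, columns and figures are all invented for illustration and aren't taken from any real system. One shipment refers to a product that has no row in the product table, so an inner join drops it without a word.

```python
import sqlite3

# Hypothetical schema and data, made up for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shipments (id INTEGER, product_id INTEGER, units INTEGER);
    CREATE TABLE products (product_id INTEGER, line TEXT);
    INSERT INTO shipments VALUES (1, 10, 100), (2, 11, 50), (3, 12, 25);
    -- Product 12 has no row here, so shipment 3 has nothing to join to.
    INSERT INTO products VALUES (10, 'fasteners'), (11, 'fasteners');
""")

# The generated query joins to products, which quietly drops shipment 3.
inner_total = conn.execute(
    "SELECT SUM(s.units) FROM shipments s "
    "JOIN products p ON p.product_id = s.product_id"
).fetchone()[0]

# The figure the person asking actually wanted: every unit shipped.
full_total = conn.execute("SELECT SUM(units) FROM shipments").fetchone()[0]

print(inner_total, full_total)
```

Both totals are the right sort of size. Nothing about the first one says a row went missing; only the join does.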
The obvious move is to make the answers better. If they can be wrong without looking wrong, tighten the queries, get the model writing cleaner SQL, and — the thing everyone reaches for — put a number on how sure the system is, so you can hold the shaky ones back and let the solid ones through. We started down that road.
It doesn't lead anywhere on this surface, though, and the clearest way to see why is to look at one where it does. Take the sales order. When a document comes in and the system has to work out which customer it belongs to, it's picking from a list — the actual customers on the account. A set you can count. It scores each one, and the score means something, because it's a score against the others. Over the line, it goes through. Under it, it's held for someone to look at. The whole thing works because there's a finite set of right answers to measure against, and the gate just sits on top of that.
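A minimal sketch of that kind of gate, to show why it has something to stand on. The names, the threshold and the margin are assumptions made for the example, not how any particular system does it; the margin check in particular is just one way of reading "a score against the others".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    customer_id: str
    score: float  # hypothetical match score between 0 and 1


def gate(candidates: List[Candidate],
         threshold: float = 0.8,
         margin: float = 0.1) -> Optional[str]:
    """Return the winning customer_id, or None to hold for a person.

    threshold and margin are illustrative values, not tuned ones.
    """
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    if not ranked:
        return None  # nothing to choose from, so hold
    best = ranked[0]
    runner_up = ranked[1].score if len(ranked) > 1 else 0.0
    # Release only if the best is over the line and clear of the next one.
    if best.score >= threshold and best.score - runner_up >= margin:
        return best.customer_id
    return None


clear_win = gate([Candidate("acme", 0.92), Candidate("apex", 0.41)])
too_close = gate([Candidate("acme", 0.85), Candidate("apex", 0.83)])
```

The first call releases "acme". The second is held, because the runner-up is too close to call. Either way the rule is reading a ranked, finite list, which is the thing a reporting answer doesn't have.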
A reporting answer hasn't got a list. The number could have been almost anything. There's no second-best number next to it to rank it against, no shortlist it won out over. So there's nothing to score. Put a confidence figure on it and you've got confidence about nothing — the system isn't choosing between a few candidates, it's building one value in isolation.
Which leaves one thing you can actually look at if you want to know whether to trust it: how it was built. The joins, the filters, the columns it selected, the assumptions sitting inside the view. The working. That's the only part with any structure to get hold of. The number's just where the working came out.
And the working was there. The query ran, and everything that made the number — every join, every filter, the column it landed on — was sitting right there to be had. We just weren't making enough of it. The number came forward on its own and got taken at face value, while the chain behind it stayed back where it ran. So when the moment came to decide whether to trust the figure, the one thing that could have told you wasn't in front of you — not because it didn't exist, but because we hadn't made it the thing the call was made against. A chain you can go back and dig up afterwards isn't a chain anyone's deciding on. Even when we wished we had a rule — send this one, hold that one — there'd have been nothing in front of the rule to read. You can't decide on something that isn't there when the decision's made.
That changes the order things have to happen in. The trail behind an answer isn't a bit of debugging you bolt on later once something's gone wrong. It's the only thing a decision to release could ever sit on. On the sales order, the call and the trail are two separate things — you've got your gate, and you could go back and review the history afterwards if you ever needed to. Here they're the same thing. Until the working comes forward with the number, there's nothing to govern. Just a number, and a person deciding whether to believe it. Which is the exact thing we've been trying to get rid of all along.
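One way to picture "the working comes forward with the number" is a sketch where an answer simply can't be built without its trail. The class and field names are hypothetical, chosen for the example, and the sample values are invented.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Working:
    """The trail behind one answer. All field names are hypothetical."""
    sql: str
    joins: Tuple[str, ...]
    filters: Tuple[str, ...]
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Answer:
    value: float
    working: Working  # no default: an Answer without its working can't exist


def present(answer: Answer) -> str:
    """Render the number and its working together, as one thing to read."""
    w = answer.working
    lines = [
        f"Answer: {answer.value}",
        "Joins: " + ("; ".join(w.joins) or "none"),
        "Filters: " + ("; ".join(w.filters) or "none"),
        "Columns: " + ", ".join(w.columns),
        "Query: " + w.sql,
    ]
    return "\n".join(lines)


# Invented sample values, for illustration only.
answer = Answer(
    value=150.0,
    working=Working(
        sql=("SELECT SUM(s.units) FROM shipments s "
             "JOIN products p ON p.product_id = s.product_id"),
        joins=("shipments INNER JOIN products ON product_id",),
        filters=(),
        columns=("shipments.units",),
    ),
)
text = present(answer)
print(text)
```

The design choice is the required field. Whoever makes the call, a person now or a rule later, gets the joins and filters in the same place as the figure, rather than having to dig them up afterwards.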
Putting the working in front of you doesn't make any of this safe to let run on its own. A trail beside every answer doesn't yet mean the system can start sending reports out unsupervised. It means you can finally ask whether it could — there's something for the question to be about now. There wasn't before. We were trying to grade the answer, and there was nothing to grade it against.
So that's where we've got to. Not "it's handled now." More that we worked out what we'd been doing wrong. You can't check the answer on this one, because there's never anything to check it against. You can only check the working — so it has to be there when the call's made, not waiting behind the answer to be dug up after.