The Confidence Chasm

Accuracy has improved, but the decision to send an answer still sits outside the system.

Published on: 27th April 2026

We've been running a RAG pipeline over our internal documentation for a while now — configuration notes, support history, the kind of accumulated knowledge that takes years to write down and that an engineer would otherwise have to dig through manually. It works. The retrieval is good, the answers are coherent, and the team tells me it saves real time. We've watched the time-to-answer on internal support queries drop by over half, and the answers are good enough that we trust them ourselves when we're triaging.

It also doesn't answer customers directly. An engineer reads what it found, decides whether the answer is right, and sends it on. The pipeline does the digging; the engineer ships the response.

The obvious question is how to close that gap. We can make the answers better, though any improvement from here is incremental: tune the retrieval, improve the re-ranking, push the confidence scores higher. Eventually the system gets good enough that the engineer can step out of the loop and the pipeline answers directly. The chasm narrows with accuracy. That's the model.

It's the wrong model. We've been watching what the engineer actually does between reading the answer and sending it, and it isn't checking whether the answer is correct. By the time the response gets to them, the answer is usually good enough. What they are doing is something the system can't do, and confidence scores have nothing to say about it.

The decision to send is about account management: about relationships, and about balancing commercial risk.

The risk isn't symmetric. The system can only answer from what's been written down, and most of what determines whether an answer is safe to send hasn't been. A client's tolerance for a particular kind of workaround, the fact that their finance team is mid-migration and any change to reporting will land badly this month, the engineer on their side whom they actually trust, the small commitment someone made on a call that never made it into a ticket: none of this is in the documentation, and none of it is going to be. The retrieval will produce a confident, technically correct answer drawn from material that exists, and the answer will be wrong in a way that is expensive for the client, because the constraint that made it wrong was never a document. The engineer doesn't know more than the system because they have better recall. They know more because they've been in the room. The system has no equivalent of "been in the room." It has a corpus. It does not have the things that were never written down because everyone present already knew them.

The engineer isn't a quality gate. They are holding the authority to act, because nothing in the system has been authorised to hold it. Improving the answers won't change that. A system that's right 99% of the time still can't tell you which 99%, and on the 1% where the cost of being wrong is borne by the client and the relationship, an engineer who'd otherwise be doing something else is checking.

There's a temptation here to say the answer is to give the system more context — feed it the call notes, the recent ticket history, every single interaction. To an extent, that's worth doing and we are doing it, but it doesn't solve the problem. It just moves where the problem sits. The new system, fed more context, will still produce confident answers from a body of evidence that may have changed yesterday in a way no one wrote down. The thing the engineer carries isn't completeness of context; it's the willingness to be wrong on behalf of the company. That's a different category. You can't bolt it onto a retrieval pipeline by adding more sources.

We're still working through this. What we can say, from inside a working system that does what it's supposed to do and still can't be let off the leash, is that the bridge isn't made of accuracy. The engineer in the middle isn't waiting to be replaced by a better model. They're doing something the system hasn't yet been allowed to.
