Why Explainability Matters in AI

Not because we’re curious. Because we need to get shit done.

Are explanations of AI model outputs important?

My first answer to this is: not really.

When an explanation is a rhetorical exercise to convince me you had your reasons for a decision, I’d call that bells and whistles with no impact. If I’m waiting for a cancer diagnosis based on my MRI, I’m much more interested in improving accuracy from 80% to 99% than in seeing a compelling image that shows where the evidence lies. After all, it may take an expert to even recognize the evidence. Or worse, the evidence might be diffuse, spread across millions of pixels, so that no human mind can comprehend it. Chasing explanations just to feel good about trusting the AI is pointless. We should measure correctness, and if the math shows the results exceed human performance, explanations are unnecessary.

But sometimes an explanation is more than a rhetorical exercise. Here’s when explanations matter:

  1. When accuracy is crucial, and a human can verify the results. Then explanations can let you bring down the error rate, e.g. from 1% to 0.01%.
  2. When the raw prediction isn’t all you care about, and the explanation generates useful actions. For example, saying “somewhere in this contract there’s an unfair clause” isn’t as useful as showing exactly where the unfair clause shows up. Highlighting the unfair clause lets us take action, like proposing an edit to the contract.

When the explanation is more important than the answer

Let’s double click on a concrete example from DocuPanda, a service I’ve cofounded. In a nutshell, we let users map complex documents into a JSON payload that contains a consistent, correct output.

So maybe we scan an entire rental lease, and emit a short JSON: {"monthlyRentAmount": 2000, "dogsAllowed": true}.
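To make the schema-to-JSON idea concrete, here’s a minimal Python sketch of that kind of mapping. The field names, the extract_fields helper, and the hard-coded output are illustrative assumptions for this post, not DocuPanda’s actual API.

```python
# Hypothetical sketch: map a long document into a small, consistent JSON payload.
# Field names and the extract_fields() helper are illustrative, not a real API.

lease_schema = {
    "monthlyRentAmount": "number",   # e.g. 2000
    "dogsAllowed": "boolean",        # e.g. true / false
    "leaseStartDate": "string",      # e.g. an ISO date like "2021-08-01"
}

def extract_fields(document_text: str, schema: dict) -> dict:
    """Placeholder for the extraction step: in practice a document-understanding
    model fills each schema field from the raw document text."""
    # Hard-coded example output matching the schema, for illustration only.
    return {"monthlyRentAmount": 2000, "dogsAllowed": True, "leaseStartDate": "2021-08-01"}

if __name__ == "__main__":
    payload = extract_fields("...51 pages of lease text...", lease_schema)
    print(payload)
```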

To make it very concrete, here’s all 51 pages of my lease from my time in Berkeley, California.

Yeah, rent in the Bay Area is insane, thanks for asking

If you’re not from the US, you might be shocked that it takes 51 pages to spell out “You’re gonna pay $3700 a month, and in exchange you get to live here.” I suspect it isn’t even legally necessary, but I digress.

Now, using DocuPanda, we can get to bottom-line answers: what’s the rental amount, can I take my dog to live there, what’s the start date, and so on.

Let’s take a look at the JSON we extract:

So apparently Roxy can’t come live with me

If you look all the way at the bottom, we have a flag to indicate that pets are disallowed, along with a description of the exception spelled out in the lease.

There are two reasons explainability would be useful here:

  1. Maybe it’s crucial that we get this right. By reviewing the paragraph I can make sure that we understand the policy correctly.
  2. Maybe I want to propose an edit. Just knowing that somewhere in these 51 pages there’s a pet prohibition doesn’t really help — I’ll still have to go over all pages to propose an edit.

So here’s how we solve this. Rather than just giving you a black box with a dollar amount or a true/false result, we’ve designed DocuPanda to ground its predictions in precise pixels. You can click on a result and scroll to the exact page and section that justifies our prediction.

Clicking on “pets allowed = false” immediately scrolls to the relevant page where it says “no mammal pets etc”
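Here’s a rough sketch of what a grounded result can look like under the hood: each extracted value carries a pointer back to the page and pixel region that justifies it, which is what a click-to-scroll viewer needs. The field names, coordinates, and payload structure are assumptions for illustration, not DocuPanda’s actual payload.

```python
# Illustrative shape of a "grounded" extraction: the value plus its evidence.
# All values below are made up for the sketch.

grounded_result = {
    "petsAllowed": {
        "value": False,
        "evidence": {
            "page": 27,                        # page of the scanned lease (illustrative)
            "bbox": [120, 540, 980, 610],      # [x0, y0, x1, y1] in page pixel coordinates
            "snippet": "Tenant shall not keep any mammal pets on the premises...",
        },
    },
}

def jump_to_evidence(result: dict, field: str) -> tuple[int, list[int]]:
    """Return the (page, bounding box) a viewer should scroll to when the user
    clicks on an extracted field."""
    ev = result[field]["evidence"]
    return ev["page"], ev["bbox"]

print(jump_to_evidence(grounded_result, "petsAllowed"))
```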

Explanation-Driven Workflows

At DocuPanda, we’ve observed three overall paradigms for how explainability is used.

Explanations Drive Accuracy

The first paradigm, which we expected from the outset, is that explainability can reduce errors and validate predictions. When you have an invoice for $12,000, you really want a human to ensure the number is valid and not taken out of context, because the stakes are too high if this figure feeds into accounting automation software.

It’s a great property of document processing that humans are very good at it. We cost a lot, but we know what we’re doing. This puts us in the happy band where humans can verify results very efficiently, often reducing error rates significantly.
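As a rough illustration of that verification loop, here’s a minimal Python sketch that routes high-stakes or low-confidence extractions to a human reviewer before they reach downstream accounting automation. The thresholds, confidence field, and field names are assumptions for this sketch, not a real DocuPanda workflow.

```python
# Minimal sketch of review routing: a human checks the extracted figure against
# the source page when the stakes are high or the model is unsure.

REVIEW_AMOUNT_THRESHOLD = 10_000      # illustrative business threshold
REVIEW_CONFIDENCE_THRESHOLD = 0.95    # illustrative confidence cutoff

def needs_human_review(extraction: dict) -> bool:
    """Flag an invoice total for human verification."""
    high_stakes = extraction["invoiceTotal"] >= REVIEW_AMOUNT_THRESHOLD
    low_confidence = extraction.get("confidence", 1.0) < REVIEW_CONFIDENCE_THRESHOLD
    return high_stakes or low_confidence

extraction = {"invoiceTotal": 12_000, "confidence": 0.97, "evidencePage": 3}
if needs_human_review(extraction):
    print(f"Review page {extraction['evidencePage']} before posting ${extraction['invoiceTotal']:,}")
```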

Explanations drive high-knowledge worker productivity

This paradigm arose naturally from our user base, and we didn’t entirely anticipate it at first. It turns out that sometimes, more than we want the raw answer to a question, we want to leverage AI to get the right information in front of our eyes. I already hinted at this use case: an output like {"unfair payment terms": true} is hardly useful compared to showing what language in the contract makes it unfair.

As a more complete example, consider a bio research company that wants to scour every biological publication to identify processes that increase sugar production in potatoes. They use DocuPanda to extract fields like:

{"sugarProductionLowered": true, "sugarProductionGenes": ["AP2a", "TAGL1"]}

Their goal is not to blindly trust DocuPanda and count how many papers mention a gene or something like that. What makes this result useful is that a researcher can click around to get right to the gist of the paper. By clicking on a gene name, a researcher can immediately jump to the context where the gene is mentioned, and reason about whether the process described in that paper, involving this gene and sugar, is relevant to their research goal. This is an example where the explanation is more important than the raw answer, and it can boost the productivity of very high-knowledge workers.
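To make the “jump to the mention” idea concrete, here’s a small Python sketch of how a grounded gene extraction could point back into the source passage so the researcher reads the surrounding text rather than a bare list of names. The paper text, character offsets, and the per-gene structure are illustrative assumptions, not the actual output format.

```python
# Sketch: each extracted gene name carries a pointer into the paper so the
# researcher can pull up the passage around the mention. All data is made up.

paper_text = (
    "... Overexpression of AP2a together with TAGL1 altered sugar accumulation "
    "in tuber tissue under the tested conditions ..."
)

extraction = {
    "sugarProductionLowered": True,
    "sugarProductionGenes": [
        {"name": "AP2a", "charOffset": paper_text.find("AP2a")},
        {"name": "TAGL1", "charOffset": paper_text.find("TAGL1")},
    ],
}

def mention_context(text: str, offset: int, window: int = 60) -> str:
    """Return the passage around a mention so the researcher can judge relevance."""
    return text[max(0, offset - window): offset + window]

for gene in extraction["sugarProductionGenes"]:
    print(gene["name"], "->", mention_context(paper_text, gene["charOffset"]))
```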

Explanations for liability purposes

There’s another reason to use explanations and leverage them to put a human in the loop. Beyond (often) reducing error rates, they let you demonstrate that you have a reasonable, legally compliant process in place.

Regulators care about process. A black box that emits mistakes is not a sound process. The ability to trace every extracted data point back to its original source lets you put a human in the loop to review and approve results. Even if the human doesn’t reduce errors, having that person involved can be legally useful. It shifts the process from blind automation, for which your company is responsible, to one driven by humans, who are allowed an acceptable rate of clerical errors. A related example: regulators and public opinion seem to tolerate a far lower per-mile rate of fatal car crashes from fully automated driving systems than from humans using driver-assistance tools. I personally find this morally unjustifiable, but I don’t make the rules, and we have to play by them.

By giving you the ability to put a human in the loop, you move from a legally tricky minefield of full automation, with the legal exposure it entails, to the more familiar legal territory of a human analyst using a 10x speed and productivity tool (and making occasional mistakes like the rest of us sinners).

All images are owned by the author.