FinCEN

An AI Compliance Exam Has Almost Nothing to Do With Your AI

Published

July 2, 2026

Kunal Datta

Chief Product Officer, Unit21

A couple of weeks ago I moderated a conversation with Eric Ellis and Sarah Beth Felix about FinCEN's April 7 NPRM. Eric helped draft an earlier version of the rule while at the OCC and now runs AML at Fifth Third Bank. Sarah Beth is an AML consultant and bank founder, and one of the few voices in this space I'll stop and listen to every time.

I asked them a question I thought would be about technology. A bank walks into an exam and says "we're using AI effectively." What does the examiner ask next?

Neither of them said a word about AI.

They talked about evidence. Can you trace a decision back through the work that produced it? Is your judgment written down anywhere? Is there an artifact that connects your risk assessment to your detection logic to the results you're getting, and does it hold up when someone pulls on it? That's the exam. The model on top is almost beside the point.

I think this is the thing most NPRM commentary gets backwards. Everyone is debating whether to use AI, how much, how soon. An examiner may well ask whether your tool is AI-powered. But notice why they're asking. They're coming at it as a skeptic, from first principles, which is the right instinct. The question underneath the question is whether you're doing everything you reasonably can to catch the most financial crime. AI matters to them only insofar as it helps you do that, and only if you can show it does. So the real question, the one that should drive what you buy and how you run it, is whether your stack can produce that proof on demand. Most can't. That's worth sitting with, because it reframes almost everything else about the rule.

What "show your work" actually means

Eric was an examiner once, and he was blunt about how those conversations went. A banker would say they were using AI. He didn't care about the architecture. He asked: what problem are you solving, what changed, show me the outputs. Then model validation, independent testing, sample reviews. Prove the thing does what you say it does.

You can't prove that with a score. A number that drops out of a black box, with no reasoning attached, tells the examiner nothing except that you trust the box. What survives the exam is the chain: what data was examined, what it showed, what was concluded, with how much confidence, what was recommended, and why. That chain is the evidence. The AI is just one of the hands that produced it, and whether those hands were a model or an analyst or both is not the point. The point is whether the trail is there.
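
To make the chain concrete, here is a minimal sketch of what one link in it might look like as a record. Every field name is an illustrative assumption, not a regulatory schema or any vendor's format, and the sample values are made up. The only point is that a bare score cannot be stored in this shape: the record refuses to exist without a rationale.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json


@dataclass(frozen=True)
class EvidenceRecord:
    """One link in the trail for a single alert (hypothetical field names)."""
    alert_id: str
    data_examined: list[str]   # what data was looked at
    findings: list[str]        # what showed up
    conclusion: str            # what was concluded
    confidence: float          # how sure, from 0.0 to 1.0
    recommendation: str        # what is recommended
    rationale: str             # why
    produced_by: str           # "model", "analyst", or "both"
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        if not self.rationale.strip():
            raise ValueError("a record without a rationale is just a score")


# Made-up example values, for illustration only.
record = EvidenceRecord(
    alert_id="ALERT-0001",
    data_examined=["cash deposits, trailing 30 days", "account opening file"],
    findings=["repeated cash deposits just under the reporting threshold"],
    conclusion="pattern is consistent with possible structuring",
    confidence=0.8,
    recommendation="escalate to an investigator for disposition",
    rationale="deposit pattern does not match the stated business activity",
    produced_by="both",
)

print(json.dumps(asdict(record), indent=2))
```

Whether the fields are filled in by a model, an analyst, or both, the shape of the record is the same, which is the property the exam cares about.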

This is why "we're using AI" lands as a non-answer in an exam room. It describes a tool. It says nothing about whether you can defend a single decision the tool made.

The one part of the trail AI can't produce

The sharpest moment of the hour was Sarah Beth's answer to what AI should never do by itself in an AML program: decide whether something is suspicious.

I'd half-expected pushback in the Q&A. There was almost none, because anyone who's done this work knows what she meant. A community or regional bank holds knowledge about its customers that lives in no transaction file. The compliance officer knows the local businesses. She's watched a relationship for years and has a picture of it in three dimensions. She can take an alert, set it against everything she knows, and decide whether it actually crosses the line.

AI has none of that, because AI has never been anywhere. It works off ones and zeros. It's genuinely good at the legwork: surfacing patterns across enormous transaction sets, finding links across a network, assembling evidence that would eat an analyst's afternoon. But the part of the trail that matters most to law enforcement is the why, and the why is human.

She made it concrete with structuring SARs. She's read the AI-written ones and can spot them instantly. They nail the method. The customer moved money in a pattern that trips the threshold, here are the transactions. What they can't supply is why the customer did it, to what end, what else about the relationship matters. Leave that out and you've filed a SAR that satisfies the form and helps no one. So the evidence trail has a section AI simply cannot complete, which is exactly why a human stays on disposition. Not as a courtesy to the analyst. Because the artifact is incomplete without them.

You build the evidence before the exam, or you don't have it

If the exam tests evidence, then the work is accumulating that evidence while you still have time, not the week the examiner schedules a visit.

The rule says FinCEN wants institutions to "responsibly experiment" with AI. Eric pointed out the phrase is carrying enormous weight with almost nothing under it. Sarah Beth gave the most usable definition I've heard: parallel. Run the AI next to your existing process. Compare what each produces. And do the thing everyone is tempted to skip, which is to actually check that the false positives the AI is throwing out are false positives. "It reduces noise" is a vendor's sentence. Make it true on your own data first. Done right, parallel mode isn't just a safety check. It's how you generate the evidence trail you'll later be asked for.
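
The parallel comparison described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the record layout, the function names, and the sample data are all hypothetical, and a real run would need a defined review sample and period. What it shows is the step people skip, which is counting the alerts the AI would have thrown out that a human then confirmed were real.

```python
from dataclasses import dataclass


@dataclass
class ParallelResult:
    """One alert seen by both processes during a parallel run (hypothetical layout)."""
    alert_id: str
    legacy_flagged: bool      # the existing process raised an alert
    ai_flagged: bool          # the AI, running alongside, kept it as an alert
    analyst_confirmed: bool   # a human reviewed it and found it worth escalating


def suppressed_but_real(results):
    """Alerts the AI would have dropped that a human confirmed were not false positives."""
    return [
        r for r in results
        if r.legacy_flagged and not r.ai_flagged and r.analyst_confirmed
    ]


def summarize(results):
    suppressed = [r for r in results if r.legacy_flagged and not r.ai_flagged]
    missed = suppressed_but_real(results)
    return {
        "legacy_alerts": sum(r.legacy_flagged for r in results),
        "ai_alerts": sum(r.ai_flagged for r in results),
        "ai_suppressed": len(suppressed),
        "suppressed_but_real": len(missed),
        "missed_ids": [r.alert_id for r in missed],
    }


# Made-up sample data.
results = [
    ParallelResult("A-1", legacy_flagged=True, ai_flagged=True, analyst_confirmed=True),
    ParallelResult("A-2", legacy_flagged=True, ai_flagged=False, analyst_confirmed=False),
    ParallelResult("A-3", legacy_flagged=True, ai_flagged=False, analyst_confirmed=True),
    ParallelResult("A-4", legacy_flagged=False, ai_flagged=True, analyst_confirmed=True),
]

summary = summarize(results)
print(summary)
```

In this toy data the AI suppresses two alerts and one of them (A-3) was real. "It reduces noise" is only true for the other one.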

The same logic runs through the metrics. Alert volumes by rule and channel. False positive rates over time. Cycle times. SAR rates against alert volumes. A tuning log with dates and reasons. None of that is decoration for a future exam. It's how a claim about effectiveness becomes a number instead of an adjective, and a number needs a history. The banks in the best shape are the ones that already have the trend.
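
As a rough sketch of how those numbers accumulate into a trend, the snippet below rolls alert dispositions up by month and rule. The disposition labels, rule identifiers, and sample rows are all assumptions made for illustration; a real program would define its own categories and pull from its case management system.

```python
from collections import defaultdict

# (month, rule_id, disposition). All values are made up for illustration.
alerts = [
    ("2026-01", "R-structuring", "false_positive"),
    ("2026-01", "R-structuring", "sar_filed"),
    ("2026-01", "R-velocity", "false_positive"),
    ("2026-02", "R-structuring", "false_positive"),
    ("2026-02", "R-structuring", "false_positive"),
    ("2026-02", "R-structuring", "sar_filed"),
    ("2026-02", "R-structuring", "closed_no_action"),
]


def trend(alerts):
    """Alert volume, false positive rate, and SAR rate per (month, rule)."""
    counts = defaultdict(lambda: defaultdict(int))
    for month, rule, disposition in alerts:
        counts[(month, rule)]["total"] += 1
        counts[(month, rule)][disposition] += 1

    rows = {}
    for key in sorted(counts):
        c = counts[key]
        rows[key] = {
            "alerts": c["total"],
            "false_positive_rate": c["false_positive"] / c["total"],
            "sar_rate": c["sar_filed"] / c["total"],
        }
    return rows


rows = trend(alerts)
for (month, rule), metrics in rows.items():
    print(month, rule, metrics)
```

A single month of this output is a snapshot. The same table kept month after month is the history the number needs.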

Your compliance team has to own the detection logic for the same reason. The standard asks you to explain why each rule exists, what risk it covers, how it ties to the risk assessment. When a vendor owns your rules, or changing one means an engineering ticket, the gap is baked in, because the people who can explain a rule aren't the people who wrote it. Capture the rationale when the rule is created. Reconstructing it under pressure is not the same thing, and an examiner can tell the difference.
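
One way to picture "capture the rationale when the rule is created" is a rule record that cannot be created, or tuned, without it. The sketch below is hypothetical throughout: the class names, fields, and sample values are assumptions, not a description of any particular platform.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class RuleChange:
    """One entry in the tuning log."""
    changed_on: date
    changed_by: str
    description: str
    reason: str


@dataclass
class DetectionRule:
    rule_id: str
    logic: str                  # human-readable summary of what the rule detects
    risk_covered: str           # the risk this rule exists to cover
    risk_assessment_ref: str    # pointer into the risk assessment (hypothetical format)
    rationale: str              # why the rule exists, written at creation time
    history: list[RuleChange] = field(default_factory=list)

    def __post_init__(self):
        for name in ("risk_covered", "risk_assessment_ref", "rationale"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} is required when a rule is created")

    def tune(self, new_logic, changed_by, reason, changed_on=None):
        """Change the rule and append a dated, reasoned entry to its log."""
        if not reason.strip():
            raise ValueError("a tuning change needs a recorded reason")
        self.history.append(
            RuleChange(
                changed_on=changed_on or date.today(),
                changed_by=changed_by,
                description=f"{self.logic} -> {new_logic}",
                reason=reason,
            )
        )
        self.logic = new_logic


# Made-up example values.
rule = DetectionRule(
    rule_id="R-structuring",
    logic="3+ cash deposits under the threshold within 7 days",
    risk_covered="structuring of cash deposits",
    risk_assessment_ref="RA section 4.2",
    rationale="cash-intensive customer base identified in the risk assessment",
)
rule.tune(
    new_logic="3+ cash deposits under the threshold within 5 days",
    changed_by="compliance analyst",
    reason="7-day window produced alerts that sampling showed were routine payroll cash",
)
print(rule.logic)
print(rule.history[0].reason)
```

The design choice worth noticing is that the reason is a required argument, so the log is written by the person making the change at the moment they make it.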

One last piece, and it's the one teams tend to underbuild. Your analysts have to be able to overrule the AI and have it stick. When an investigator disagrees, they record the disagreement, say why, and that correction goes back into the system. That's governance, and it's also the only way the thing sharpens over time against your bank's risk rather than whatever it was trained on. Every override is another entry in the trail.
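
A minimal sketch of an override that sticks, assuming a simple in-memory log. The class and method names are hypothetical. The two things it illustrates are that a disagreement cannot be recorded without a reason, and that the recorded override stays attached to the alert and can be exported later as feedback.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Override:
    alert_id: str
    ai_recommendation: str
    analyst_decision: str
    analyst: str
    reason: str
    recorded_at: str


class OverrideLog:
    """Append-only record of analyst disagreements (illustrative only)."""

    def __init__(self):
        self._entries = []

    def record(self, alert_id, ai_recommendation, analyst_decision, analyst, reason):
        if analyst_decision == ai_recommendation:
            raise ValueError("not an override: the analyst agreed with the AI")
        if not reason.strip():
            raise ValueError("an override must say why")
        entry = Override(
            alert_id=alert_id,
            ai_recommendation=ai_recommendation,
            analyst_decision=analyst_decision,
            analyst=analyst,
            reason=reason,
            recorded_at=datetime.now(timezone.utc).isoformat(),
        )
        self._entries.append(entry)
        return entry

    def standing_override(self, alert_id):
        """Most recent override for an alert, or None if nobody has overruled the AI."""
        for entry in reversed(self._entries):
            if entry.alert_id == alert_id:
                return entry
        return None

    def feedback(self):
        """Disagreements as (alert_id, ai_recommendation, analyst_decision) tuples."""
        return [
            (e.alert_id, e.ai_recommendation, e.analyst_decision)
            for e in self._entries
        ]


log = OverrideLog()
log.record(
    alert_id="A-3",
    ai_recommendation="close",
    analyst_decision="escalate",
    analyst="investigator",
    reason="customer's stated business does not explain the cash pattern",
)
print(log.standing_override("A-3").analyst_decision)
print(log.feedback())
```

How the feedback is then used to retune a model or a rule is a separate question this sketch does not try to answer.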

So the only vendor question that matters

Every useful question you can ask a vendor turns out to be the same question: can you show your work?

Can the AI explain its reasoning for every decision, in a form my investigators can read and an examiner would accept? Can my compliance team change a rule without an engineering ticket, and does the system capture why when they do? If I have to reconstruct the evidence trail for one alert from 18 months ago, can I, and how? What does shadow mode look like for a new rule, and can I see the impact before it goes live? Where does a human have to sign off, and is that checkpoint enforced or optional?

Notice that none of those questions ask whether the tool is AI-powered. They all ask whether the work is recoverable and defensible. A vendor who can't answer them cleanly is telling you where your audit exposure lives.

First principles

Where I left the webinar: go back to first principles. Continuous review. Sampling to confirm the AI is doing what you think it's doing. Real parallel-testing infrastructure for new rules and new models. A human on final disposition. Use AI to cover more ground than a manual process ever could, not to clear the humans out of the room.
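
The sampling step can be sketched as follows, assuming alerts are identified by strings and dispositions are simple labels. The function names, the sampling rate, and the seed are illustrative choices, not a recommended methodology. The seed is recorded so that the same sample can be redrawn later if someone asks how it was chosen.

```python
import random


def draw_review_sample(alert_ids, rate, seed):
    """Reproducible random sample of AI-handled alerts for human re-review."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    ids = sorted(alert_ids)          # fixed order so the seed fully determines the draw
    if not ids:
        return []
    k = max(1, round(len(ids) * rate))
    return sorted(random.Random(seed).sample(ids, k))


def agreement_rate(ai_labels, human_labels):
    """Share of re-reviewed alerts where the human reached the AI's disposition."""
    shared = [a for a in human_labels if a in ai_labels]
    if not shared:
        raise ValueError("no re-reviewed alerts to compare")
    agree = sum(ai_labels[a] == human_labels[a] for a in shared)
    return agree / len(shared)


# Made-up data: 100 alerts the AI recommended closing.
ai_labels = {f"A-{i:03d}": "close" for i in range(100)}
sample = draw_review_sample(ai_labels, rate=0.1, seed=2026)

# Pretend the reviewers disagreed on the first sampled alert only.
human_labels = {a: "close" for a in sample}
human_labels[sample[0]] = "escalate"

print(len(sample), agreement_rate(ai_labels, human_labels))
```

The disagreements a sample like this turns up are exactly the cases that belong in front of a human on final disposition.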

The framework FinCEN is proposing doesn't hand the prize to whoever has the fanciest model. It goes to the institutions whose programs visibly work, where you can trace any decision, point to the outcome, and tell a straight story from risk assessment through detection design to results. The AI is a means to that. The evidence is the thing being graded. Build for the evidence and the AI question mostly answers itself.

The comment period closes June 9 (Docket FINCEN-2026-0034). For more on the NPRM and how to prepare, see our FinCEN AML Program Rule hub.

Kunal Datta is the Chief Product Officer at Unit21 and moderated The Effectiveness Mandate: Making Sense of FinCEN's AML/CFT Program NPRM, a webinar co-hosted by Unit21 and American Banker. He previously wrote about the NPRM's AI provision in FinCEN's AI Provision Is a Signal, Not a Solution.

Kunal Datta is the Chief Product Officer at Unit21. Prior to Unit21, he led the Product team for Checkout at Fast, and prior to that, led the Product teams responsible for automating aerial wildfire safety inspections at Pacific Gas & Electric.

He has a background leading Product teams using AI to automate processes at regulated entities, as well as financial products, machine learning products, web applications, mobile applications, hardware products, and data products. Kunal is a Fulbright Scholar and studied Civil and Environmental Engineering and Music Science Technology at Stanford University.
