Unit21 for AML

Why "configurable AI" is the only AML AI compliance teams will actually trust

Published June 17, 2026

Kunal Datta

Chief Product Officer, Unit21

Consider a compliance team at the end of an AI demo. The product worked. The numbers looked good. Even so, nobody is ready to say yes. The reason usually goes unspoken, so let me say it plainly. They do not distrust the AI because they think it is wrong. They distrust it because they cannot see inside it, cannot steer it, and would have a hard time explaining it to an examiner on the day it matters. That is not technophobia; it is what doing the job carefully looks like.

It helps to remember what a compliance program is for. Catching financial crime is only half of it. The other half is being able to show, with documentation, that you caught it (or missed it) in a way that is defensible and repeatable. A system that makes good decisions for reasons it cannot show you fails the second half, however accurate it is on average. Said another way: in compliance, "trust me" is not a control.

That is the trust problem with most AML AI on the market. It is also why the only AI a compliance team will adopt, and keep, is the kind they can configure themselves, audit in full, and explain to a skeptic. That has to be designed in from the beginning. You cannot add it the week before an exam.

The standard AML AI pitch, and why it stalls

You have heard the pitch. Deploy our AI, cut your alert volume, free up your analysts. The promise is efficiency. Under the hood is a model the vendor pre-trained on industry data, tuned to industry benchmarks, and shipped as a product you switch on.

The pitch stalls on one question it cannot answer: how do I explain this to my regulator?

Consider the exam. An examiner asks why a particular SAR was filed, or why one was not. "The AI recommended it" will not survive the next question. A real answer traces back to documented logic, the specific risk factors, and the human who reviewed it and put their name on it. A black box gives you none of that. It gives you a recommendation and a confidence score, and a confidence score only tells you how sure the model is. It does not tell you why, which is the entire reason the examiner is there.

Look at why the teams that tried black-box AML tools walked away. Most of those tools tested fine. What never got resolved was the governance question: how do we own this, explain it, and stand behind it in an exam? Performance was never the blocker. Defensibility was.

What "configurable" actually means

Configurable does not mean a slider from "conservative" to "aggressive." It means your compliance team, not the vendor and not your engineers, decides what the AI investigates, how it goes about it, and what it produces.

In practice, there are four parts.

1. You define what counts as a risk factor for your institution. A bank, a fintech, a gaming platform, and a crypto exchange do not share customers, transaction patterns, or typologies. A signal that means risk at one is noise at another. Generic AI trained on industry-wide data applies generic logic to your specific environment, which is a polite way of saying it applies someone else's logic to your risk. Configurable AI lets you say what you actually mean: here is the data we hold on our customers, here is what we consider suspicious for our business, here is how we want that analysis structured.

2. You control which investigation tasks the agent runs, and in what order. You do not review a watchlist screening hit the way you review a money mule alert. An enhanced due diligence review is a different exercise from a structuring flag. One generic investigation should not run against all of them. Configurable agents let you define a task set for each alert type: which prior activity to pull, which behavioral patterns to check, what online research to run, what fields to include. You turn tasks on and off depending on what the alert requires. This is the part that matters most. The agent is not running a generic investigation; it is running your investigation, the way you would run it.

3. You set the automation level for each queue. No program has the same confidence in AI across every alert type, and none should pretend to. You might let AI auto-close a category of low-risk false positives with a documented narrative, and require human review for everything else. Configurable platforms let you set this at the queue level: auto-review, where the agent does the legwork before the analyst opens the case; auto-close, where the agent closes clear false positives with its reasoning written down; or human-first, where the agent assists but does not act. You do not switch AI on the way you flip a light switch. You bring it up like a dimmer, one queue at a time, as it earns your trust.

4. You specify the narrative output. The SAR narrative, the disposition note, and the case summary all have to meet your QA standards and your regulator's documentation expectations, and none of that is generic. Configurable AI lets you define what the narrative includes, how it is structured, and what separates a true positive from a false positive in your program. The agent writes to your template, not to a vendor's default.
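
The sketch below is a minimal Python illustration of what a per-institution configuration could hold. It is not Unit21's actual schema. Every name in it is hypothetical, including `ProgramConfig`, `AlertTypeConfig`, `AutomationLevel`, the task identifiers, the example risk-factor wording, and the narrative sections. The point it makes is narrow: the four parts above can be separate settings that the compliance team owns. Those settings are risk factors, a task set per alert type, an automation level per queue, and a narrative structure.

```python
from dataclasses import dataclass, field
from enum import Enum


class AutomationLevel(Enum):
    """Queue-level automation settings from part 3 (names are hypothetical)."""
    HUMAN_FIRST = "human_first"  # agent assists but takes no action
    AUTO_REVIEW = "auto_review"  # agent does the legwork before the analyst opens the case
    AUTO_CLOSE = "auto_close"    # agent may close clear false positives, with written reasoning


@dataclass
class AlertTypeConfig:
    tasks: list[str]               # ordered task set for this alert type (part 2)
    automation: AutomationLevel    # automation level for this queue (part 3)
    narrative_sections: list[str]  # required narrative structure (part 4)


@dataclass
class ProgramConfig:
    risk_factors: dict[str, str]   # institution-specific risk definitions (part 1)
    alert_types: dict[str, AlertTypeConfig] = field(default_factory=dict)

    def enabled_tasks(self, alert_type: str) -> list[str]:
        """Return a copy of the tasks the agent should run, in order, for an alert type."""
        return list(self.alert_types[alert_type].tasks)


# Illustrative values only; a real program defines its own.
config = ProgramConfig(
    risk_factors={
        "rapid_pass_through": "Funds received and sent out within 24 hours on a new account",
        "many_unrelated_senders": "More than 10 unrelated inbound counterparties in 7 days",
    },
    alert_types={
        "money_mule": AlertTypeConfig(
            tasks=["prior_alerts_cases_sars", "behavioral_analysis",
                   "adverse_media_search", "counterparty_analysis", "draft_narrative"],
            automation=AutomationLevel.AUTO_REVIEW,
            narrative_sections=["Subject", "Activity", "Red flags", "Disposition rationale"],
        ),
        "watchlist_screening": AlertTypeConfig(
            tasks=["name_match_review", "prior_alerts_cases_sars", "draft_narrative"],
            automation=AutomationLevel.HUMAN_FIRST,
            narrative_sections=["Match details", "Disposition rationale"],
        ),
    },
)
```

Each alert type keeps its own task list and automation level. That separation lets a team move one queue from human-first to auto-review without touching any other queue.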

See how Unit21's configurable AI agents work in practice. Explore Unit21 for AML →

The engineering dependency problem

Before you sign anything with an AI vendor, ask one question: if my team wants the agent to investigate a new kind of alert tomorrow, in the way we would actually work it, how do we get that built?

With most agentic tools there are only two answers, and neither is good. You wait for engineering to script the new behavior, or you wait for the vendor to ship a templated agent that happens to cover your case. The first drops you back into the sprint queue. The second leaves you running someone else's idea of how the investigation should go. Either way, a typology you have spotted but cannot yet teach the agent to work is a hole in your coverage, and "we knew about it but were waiting on the vendor" is not a sentence you want to say in an exam.

Configurable agents give a third answer: you build the task yourself.

Consider what that actually looks like, because it is more concrete than it sounds. You open the task builder and describe, in plain language, what you want the agent to do with a given alert type. You pick a handful of real alerts that represent the data you care about, and you let the agent do the thinking. It explores the flagged entities, their transactions, and their linked instruments. It reads the actual columns on your entities and instruments tables. It checks instrument status to find blocked accounts. It summarizes transaction totals by type. Then it writes the query, and shows you the query, so you are never taking its word for anything.

From there you backtest the task against the alerts you selected, and you read the summaries it produces. If a snapshot comes back empty, it says so rather than inventing a result. If the data it needs is missing, it flags the gap instead of papering over it. When the summaries hold up, you deploy. Start to finish, there was no engineering ticket, and no vendor deciding for you how the investigation should run.
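
The backtest step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Unit21's task builder. The agent described above writes and shows a query against your tables; this sketch skips that query layer and works on in-memory records. The alert fields `transactions`, `instruments`, `type`, `amount`, and `status` are hypothetical. The task does three things. It totals transactions by type and lists blocked instruments. When the transaction snapshot is empty, it reports that instead of inventing a result. When a field is missing, it records a gap instead of papering over it.

```python
from collections import defaultdict


def run_task(alert: dict) -> dict:
    """Hypothetical task: summarize totals by type and flag blocked instruments."""
    summary = {"alert_id": alert.get("id"), "gaps": [], "notes": []}

    # Flag missing inputs rather than guessing at them.
    for required in ("transactions", "instruments"):
        if required not in alert:
            summary["gaps"].append(f"missing field: {required}")

    transactions = alert.get("transactions", [])
    if "transactions" in alert and not transactions:
        summary["notes"].append("transaction snapshot is empty")

    # Transaction totals by type.
    totals = defaultdict(float)
    for txn in transactions:
        totals[txn["type"]] += txn["amount"]
    summary["totals_by_type"] = dict(totals)

    # Instrument status check to surface blocked accounts.
    summary["blocked_instruments"] = [
        inst["id"] for inst in alert.get("instruments", [])
        if inst.get("status") == "blocked"
    ]
    return summary


def backtest(task, alerts: list) -> list:
    """Run a task over a hand-picked set of historical alerts so a human can read the output."""
    return [task(alert) for alert in alerts]


# Made-up sample alerts covering a normal case, an empty snapshot, and missing data.
sample_alerts = [
    {"id": "A1",
     "transactions": [{"type": "ach_in", "amount": 900.0},
                      {"type": "ach_in", "amount": 450.0},
                      {"type": "wire_out", "amount": 1300.0}],
     "instruments": [{"id": "acct-1", "status": "blocked"},
                     {"id": "acct-2", "status": "active"}]},
    {"id": "A2", "transactions": [], "instruments": []},
    {"id": "A3", "instruments": []},
]

for result in backtest(run_task, sample_alerts):
    print(result)
```

The test data deliberately includes an empty snapshot and a missing field. Those two cases are exactly the ones a reviewer needs to see handled honestly before deploying.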

The work does not stop arriving. FinCEN issues new advisories, the national AML/CFT priorities shift, a new alert type lands on a Friday afternoon, and your program is expected to handle it by Monday. A typology you cannot teach your agent to investigate is the same hole in your coverage that a missing rule would be. In that environment, the ability to stand up a new agent task yourself, in the same week, is not a luxury. The regulatory cadence requires it.

There is a tell worth listening for when you evaluate a system. Ask a vendor how its agent will handle a new typology, and pay attention to the answer. If it is some version of "do not worry about it, our agent already covers that," you are being handed a template and asked to trust it. The honest answer is "build the task, and backtest it on your own alerts." An agent you cannot teach, on the day your risk changes, is an agent that decides, on your behalf, what it is willing to look for. The old guard of transaction monitoring sold that pre-packaged version, the one-size box you take on faith. That was tolerable when the work changed once a year. It is not tolerable now, and an agent you cannot bend to a new typology in the same week is reason enough to go and find one you can.

Auditability is not optional

Explainability and auditability tend to get treated as things you add once the AI is working. In regulated financial services, they are the price of entry.

Every action an agent takes, every alert it reviews, every case it summarizes, every narrative it drafts, every alert it closes, has to be logged so that a human can review it and a regulator can evaluate it. Not a summary, and not an aggregate score, but a step-by-step record of what the agent did, what data it touched, what it found, and why it reached the conclusion it did. That is what "show your work" means here, and it is the difference between AI you can stand behind and AI that becomes your liability.

At Unit21, every agent action is logged: which tasks ran, what data was analyzed, what sources were consulted, what the output was, and what the human did with it. If an examiner pulls a case from eighteen months ago and asks how the SAR determination was made, the trail is there, timestamped, complete, and tied to the exact agent configuration that was running at the time. That last detail is the one examiners press on, because the question is never only "what did the AI decide." It is "what was the AI you were running back then, and can you prove it."
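
This section does not describe the log format, so the following is an assumption: a minimal Python sketch of an audit trail that ties each logged step to the configuration that produced it. The key idea is that every entry stores a SHA-256 fingerprint of the exact agent configuration in force. The log also keeps a copy of that configuration, so a case from months ago can be matched to what was running then. `AuditLog`, `AuditEntry`, and `config_fingerprint` are hypothetical names. A production log would need durable, tamper-evident storage rather than a Python list.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


def config_fingerprint(config: dict) -> str:
    """Stable SHA-256 hash of a JSON-serializable agent configuration."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class AuditEntry:
    case_id: str
    timestamp: str               # UTC, ISO 8601
    config_hash: str             # which configuration produced this action
    task: str                    # which task ran
    data_touched: tuple          # what data it read
    output: str                  # what it produced
    human_action: Optional[str]  # what the analyst did with it, if anything


class AuditLog:
    """Append-only, in-memory log (illustration only)."""

    def __init__(self) -> None:
        self._entries: list = []
        self._configs: dict = {}

    def record(self, case_id: str, config: dict, task: str, data_touched, output: str,
               human_action: Optional[str] = None) -> AuditEntry:
        digest = config_fingerprint(config)
        # Keep a deep copy of the configuration as it was at this moment.
        self._configs.setdefault(digest, json.loads(json.dumps(config)))
        entry = AuditEntry(case_id, datetime.now(timezone.utc).isoformat(), digest,
                           task, tuple(data_touched), output, human_action)
        self._entries.append(entry)
        return entry

    def trail(self, case_id: str) -> list:
        """Every logged step for one case, in the order it happened."""
        return [e for e in self._entries if e.case_id == case_id]

    def config_at(self, entry: AuditEntry) -> dict:
        """The exact configuration that was running when this entry was written."""
        return self._configs[entry.config_hash]


log = AuditLog()
v1 = {"alert_type": "money_mule", "tasks": ["prior_alerts_cases_sars", "behavioral_analysis"]}
log.record("case-42", v1, "behavioral_analysis", ["transactions:acct-1"],
           "possible structuring", human_action="approved")

# Later the team adds a task; the older case still points at v1.
v2 = {**v1, "tasks": v1["tasks"] + ["adverse_media_search"]}
log.record("case-97", v2, "adverse_media_search", ["web:news"], "no adverse media found")

first = log.trail("case-42")[0]
print(log.config_at(first)["tasks"])
```

Hashing a canonical serialization (sorted keys, fixed separators) means the same configuration always yields the same fingerprint. Any change to the task set yields a different one.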

It is also why your customer data never trains our models: each case produces a record for your compliance file, not training data for someone else's system.

"All glass box. No black box." Read how Unit21 approaches AI governance →

What it looks like on a single alert

Let us make it concrete. An AML alert fires: a money mule flag. Before the analyst has opened anything, the agent has already run:

A review of the flagged entity's prior alerts, cases, and SARs
A behavioral analysis of its transaction history, looking for structuring and other anomalies
An online search for adverse media, business validation, and other relevant public information
A counterparty analysis surfacing linked entities and connections that do not fit
A narrative draft, written to the format the program specified

When the analyst opens the case, they are not staring at a blank screen. They find a structured summary: here are the red flags the agent found, here is its recommended disposition, here is the evidence with its sources. The analyst reads it, applies judgment, and acts, whether that means approving the recommendation, changing it, or overriding it outright.

What the analyst does not have to do is start from zero, pulling transaction history, hunting adverse media, mapping counterparties, drafting a narrative. That hour is already spent. Their time goes to the judgment instead of the data gathering. This is the line the best AML people insist on, and they are right to: the AI gathers, the human investigates. The agent shortens the time to investigate. It does not get to make the decision.
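
The line "the AI gathers, the human investigates" can be expressed as a rule in code. The Python sketch below is a hypothetical illustration for a human-first or auto-review queue; the names `AgentSummary`, `Evidence`, `Disposition`, and `finalize` are not a real product API. The agent produces red flags, a recommended disposition, and sourced evidence. Nothing becomes final until a named analyst signs off. An override is allowed, but only with a written rationale, so the audit trail shows where human judgment departed from the agent.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Evidence:
    finding: str
    source: str  # where the agent found it, so the analyst can check


@dataclass
class AgentSummary:
    alert_id: str
    red_flags: list
    recommended_disposition: str  # e.g. "escalate" or "close"
    evidence: list = field(default_factory=list)


@dataclass
class Disposition:
    alert_id: str
    decision: str
    decided_by: str
    agent_recommendation: str
    overridden: bool
    rationale: str


def finalize(summary: AgentSummary, analyst: str,
             decision: Optional[str] = None, rationale: str = "") -> Disposition:
    """Record the analyst's call on an agent-prepared case.

    The agent only recommends; nothing is final until a named analyst signs off.
    Passing no decision means the analyst accepts the recommendation.
    """
    if not analyst:
        raise ValueError("a named analyst must sign off on every disposition")
    final = decision or summary.recommended_disposition
    overridden = final != summary.recommended_disposition
    if overridden and not rationale:
        raise ValueError("overriding the agent's recommendation requires a rationale")
    return Disposition(summary.alert_id, final, analyst,
                       summary.recommended_disposition, overridden, rationale)


# Made-up example case.
summary = AgentSummary(
    alert_id="MM-1001",
    red_flags=["rapid pass-through of funds", "many unrelated inbound senders"],
    recommended_disposition="escalate",
    evidence=[Evidence("14 inbound transfers from unrelated senders in 5 days",
                       "transaction history")],
)
print(finalize(summary, analyst="j.rivera"))
```

In this sketch an override is stored alongside the agent's original recommendation rather than replacing it. That keeps the disagreement itself visible to QA and to an examiner.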

The task set the agent ran was yours. It is the task you built and backtested earlier, now running on every alert in this queue. The EDD agent runs a different set from the watchlist screening agent. The money mule agent runs a different set from the BEC agent. Each is tuned to what that investigation actually needs. Every step of it sits in the audit log.

The trust that builds over time

There is a pattern to how compliance teams adopt AI when it is done honestly. It starts narrow. One alert type, one queue, auto-review only, with the agent doing the legwork while the human still makes every call. The team runs it in parallel with manual review for a while, compares the outcomes, watches the false positive rate, and checks the AI against their own benchmarks, their own historical decisions and their own best analysts, not generic industry metrics.

A word on measuring this honestly, because it is where good programs tune themselves into a ditch. Do not score the AI only on the alerts that ended in a SAR. An alert that sent a qualified person to look at something genuinely worth looking at did its job, even if the case closed clean after review. If you count only SAR-filed cases as wins, you will undercount what your agents are doing, and you will tune in the wrong direction. The test is simpler than the metric: did this cause the right person to look at the right thing?
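
A small worked example shows how far the two measures can diverge. The Python sketch below assumes each closed alert carries two reviewer-recorded booleans: `sar_filed`, and a hypothetical `worth_reviewing` flag. The second flag answers the question "did this cause the right person to look at the right thing?" How a program captures that judgment is its own design choice. The sample data is invented.

```python
def score_alerts(outcomes: list) -> dict:
    """Compare a SAR-only hit rate with a 'worth a qualified look' rate.

    Each outcome is a dict with two reviewer-judged booleans:
    sar_filed, and worth_reviewing (hypothetical field). An alert that led
    to a SAR also counts as worth reviewing.
    """
    total = len(outcomes)
    if total == 0:
        return {"sar_rate": 0.0, "worth_reviewing_rate": 0.0}
    sars = sum(o["sar_filed"] for o in outcomes)
    worth = sum(o["sar_filed"] or o["worth_reviewing"] for o in outcomes)
    return {"sar_rate": sars / total, "worth_reviewing_rate": worth / total}


# Ten invented alerts: one SAR, three that closed clean but were worth a look,
# and six that were not worth reviewing.
outcomes = (
    [{"sar_filed": True, "worth_reviewing": True}]
    + [{"sar_filed": False, "worth_reviewing": True}] * 3
    + [{"sar_filed": False, "worth_reviewing": False}] * 6
)
print(score_alerts(outcomes))  # {'sar_rate': 0.1, 'worth_reviewing_rate': 0.4}
```

Scored on SARs alone, these alerts look 10% useful; scored on whether they earned a qualified look, they are 40% useful. Tuning against the first number would suppress the three clean closures that were doing their job.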

As confidence grows, coverage widens. More alert types. Auto-close switched on for the categories where the false positive rate is low and the logic is well understood. Fewer human touches at L1, and more analyst attention on L2 and the SAR decisions where judgment is the whole point. That trust is earned rather than assumed, through measurement and documentation and the ability to course-correct at any moment, because the configuration is yours and you can change it. What sustains configurable AI in a compliance environment is not the demo, but the governance model underneath it.
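
The parallel-run comparison can feed a simple gate for switching on auto-close in one queue. The Python sketch below is a hypothetical illustration, not a Unit21 feature. Its thresholds (at most 2% disagreement, at least 200 agent-recommended closes) are made-up placeholders that a program would set for itself. The logic is narrow on purpose. The only alerts that matter are the ones the agent would have closed, and the question is how often analysts decided otherwise on those same alerts.

```python
def ready_for_auto_close(parallel_run: list,
                         max_disagreement: float = 0.02,
                         min_sample: int = 200) -> bool:
    """Decide whether a queue's parallel-run record supports enabling auto-close.

    parallel_run holds (agent_recommendation, analyst_decision) pairs.
    Thresholds are illustrative placeholders, not recommendations.
    """
    # Analyst decisions on alerts the agent would have closed.
    analyst_calls = [analyst for agent, analyst in parallel_run if agent == "close"]
    if len(analyst_calls) < min_sample:
        return False  # not enough evidence yet to trust the agent here
    disagreements = sum(1 for call in analyst_calls if call != "close")
    return disagreements / len(analyst_calls) <= max_disagreement


# Invented parallel run: 300 agent closes, analysts escalated 3 of them.
run = ([("close", "close")] * 297 + [("close", "escalate")] * 3
       + [("escalate", "escalate")] * 50)
print(ready_for_auto_close(run))
```

Because the configuration stays with the compliance team, the same check can be rerun at any time. If disagreement on a queue creeps up, that queue can be dialed back to auto-review.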

Why this is a moral question, not a buying decision

Take human trafficking. It does not look the same in any two places. The financial footprint of a trafficking operation in one jurisdiction looks nothing like the one across the border. It shows up one way at a money transmitter and another way at a digital bank, and different again on a marketplace that moves money between strangers, on a card program, on a crypto rail, on a payroll product that someone has bent to their own purpose. The red flags move with the geography, the customer base, even the one feature being abused. There is no single fingerprint.

Now consider what an out-of-the-box agent actually is. It is a template built on someone else's average, the patterns the vendor has seen most often across the customers it happens to serve. Point it at your institution and it will catch the version of the crime that looks like everyone else's. The version that looks like yours, shaped by your products and your customers and your corner of the map, it was never taught to see. It will not tell you what it missed. It hands you a clean queue and a healthy-looking number, and the trafficking that did not match the template sits in the cleared pile, looking handled.

A blind spot you cannot see is not efficiency. It is a blind eye. To let a vendor's template decide, on your behalf, what counts as suspicious is to agree, in advance, not to look for whatever they did not think to look for.

Why does this keep coming back to explainability and control? Consider the incentives honestly for a moment. Catching this activity is not something the market rewards on its own. Every alert is friction. Every investigation is a cost. Every account you close is revenue you turned away, and a great deal of nefarious money is, from a pure growth standpoint, just a paying customer. Left to itself, capitalism does not reliably pay you to find the trafficker or the sanctions evader. At the margin, it often pays you to look the other way and keep growing.

That gap is the whole reason regulation exists. It is why the examiner's job is to be a skeptic, to walk in and ask why this flagged and why that one did not, to assume nothing, and to make you show your work, so that the financial system does not become the plumbing for human trafficking, firearms trafficking, and the rest of the harm that moves through it. The examiner is not there to admire your false positive rate. They are there on behalf of the people the money was used against.

Seen this way, configurability is not a line on a comparison grid. If you cannot define what your institution treats as suspicious, cannot shape the investigation to the crime as it actually appears in your data, and cannot explain to a skeptic why a particular call was made, then you cannot honestly say you are looking for the thing the law asks you to look for. You are trusting a vendor to have looked for you, and that vendor was never the one accountable to the victim. This is a moral question before it is a buying decision, and it is the real reason control matters more than anything else on the page.

The bottom line

If you have been burned by black-box AI, or watched a vendor oversell generic AI and underdeliver on the part where you have to defend it, your skepticism is well earned. The question was never whether AI belongs in AML operations. It plainly does. The question is whether the system gives you the control, the transparency, and the audit trail that a regulated institution requires.

Configurable AI is the version that does. You define what it investigates. You set the automation level. You own the output. You hold the audit trail. When a new typology appears, your compliance team, not engineering and not the vendor, can build the agent task to handle it within the day. That is the version teams come to trust over time, and the version that lets a small team stop skimming and start actually looking, which is the whole reason the rules exist. It also happens to be the only version that survives an exam.

Ready to see how Unit21's configurable AI agents work for your compliance program? Get a demo →

Kunal Datta

Chief Product Officer, Unit21

Kunal Datta is the Chief Product Officer at Unit21. Prior to Unit21, he led the Product team for Checkout at Fast, and prior to that, led the Product teams responsible for automating aerial wildfire safety inspections at Pacific Gas & Electric.

He has a background leading Product teams that use AI to automate processes at regulated entities, and in building financial products, machine learning products, web applications, mobile applications, hardware products, and data products. Kunal is a Fulbright Scholar and studied Civil and Environmental Engineering and Music, Science, and Technology at Stanford University.

Learn more about Unit21

Unit21 is the leader in AI Risk Infrastructure, trusted by over 200 customers across 90 countries, including Sallie Mae, Chime, Intuit, and Green Dot. Our platform unifies fraud and AML with agentic AI that executes investigations end-to-end—gathering evidence, drafting narratives, and filing reports—so teams can scale safely without expanding headcount.