
ML, AI, GenAI, Agentic AI: A Field Guide for Buyers Who Are Done with Buzzwords

Published May 6, 2026 | 11 min read
Kunal Datta, Chief Product Officer, Unit21

Every vendor in financial crime prevention claims to be "AI-powered" in 2026. The term has become so diluted that it no longer tells you anything about what the technology actually does.

Legacy platforms running batch processing on 24-hour cycles? AI-powered. Screening tools doing string matching against watchlists? AI-powered. Startups wrapping a ChatGPT layer on top of a rules engine? Definitely AI-powered.

The people paying the price for this confusion are the buyers: compliance leaders, fraud executives, and CISOs trying to make a real technology decision with real regulatory consequences. Analyst firms aren't helping either, lumping vendors with fundamentally different architectures under the same "AI-native" banner, as if a risk scoring model and an autonomous investigation agent are the same technology because they both involve math.

They're not. And the difference matters.

This post is a field guide. Four categories, clear definitions, and concrete examples of what each one looks like in production, not in a pitch deck.

The Four Layers

1. Machine Learning (ML)

Machine learning is the workhorse of traditional fraud detection. It's been around for decades. It's real, it's proven, and it has clear strengths.

An ML model ingests labeled data (thousands of examples of "fraud" and "not fraud") and learns to predict which new transactions look like which. The output is typically a score: a number between 0 and 100 that represents how risky something looks.

ML excels at calibrated scoring. Given enough historical examples, it can weigh hundreds of features simultaneously (transaction amount, time of day, device type, merchant category, account age) and produce a stable, repeatable risk assessment in milliseconds. It's efficient. One model encodes thousands of behavioral signals into a single score.
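To make that concrete, here is a minimal sketch of the supervised scoring pattern described above, using scikit-learn on a hypothetical labeled transaction file. The feature names, file, and model choice are illustrative assumptions, not any vendor's actual implementation:

```python
# Minimal sketch of supervised risk scoring on a hypothetical labeled dataset.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

FEATURES = ["amount", "hour_of_day", "account_age_days",
            "merchant_category_code", "is_new_device"]

# Hypothetical history: is_fraud = 1 for confirmed fraud, 0 for legitimate.
history = pd.read_csv("labeled_transactions.csv")
X_train, X_test, y_train, y_test = train_test_split(
    history[FEATURES], history["is_fraud"], test_size=0.2, random_state=42
)

model = GradientBoostingClassifier().fit(X_train, y_train)

def risk_score(txn: dict) -> float:
    """Return a 0-100 risk score for a single transaction."""
    row = pd.DataFrame([txn])[FEATURES]
    return float(model.predict_proba(row)[0, 1] * 100)
```

The held-out test split is what you would use to validate the score's stability before trusting it in production.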

But ML models struggle when the world changes. A new fraud tactic that doesn't resemble anything in the training data? The model won't catch it until it's seen hundreds of examples, been retrained, validated, and redeployed. That cycle takes weeks to months. In fraud, that's an eternity. ML also struggles with explainability. A model can tell you that something is risky. It often can't tell you why in terms a compliance officer can explain to a regulator. "The model said 87" is not a defensible investigation narrative.

Where you actually see ML in production: real-time risk scoring at the transaction level, pre-decision fraud prevention scores, anomaly detection that flags deviations from established behavioral baselines. These are legitimate, valuable use cases, and they remain valuable.

If a vendor's "AI" is a supervised ML model producing risk scores, that's not a bad product. But calling it "AI" in 2026 is like calling a calculator a computer. Technically accurate. Practically misleading.

2. Classical AI

"AI" as a category is so broad it's almost useless. In the compliance technology market, when vendors say "AI" without further specification, they typically mean one of two things: the ML models described above, or a rules engine with some automated optimization layer on top.

Classical AI in this context often includes rule recommendation systems, threshold optimization algorithms, and pattern-matching engines that can adjust parameters based on historical outcomes. These systems improve over time, but they follow deterministic logic paths. They don't reason. They don't understand language. They don't synthesize unstructured information.
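As a rough sketch of this layer (the rule, thresholds, and data shapes are invented for illustration), a classical AI system is essentially deterministic rules whose parameters get tuned against historical outcomes:

```python
# Hypothetical illustration: a deterministic rule plus a threshold "optimizer"
# that sweeps candidate values against historical outcomes.
from dataclasses import dataclass

@dataclass
class StructuringRule:
    amount_threshold: float
    txn_count: int

    def fires(self, daily_amounts: list[float]) -> bool:
        # Deterministic logic path: many deposits just under the threshold.
        near_limit = [a for a in daily_amounts
                      if 0.8 * self.amount_threshold <= a < self.amount_threshold]
        return len(near_limit) >= self.txn_count

def tune_threshold(history, candidates=(5_000, 7_500, 10_000)):
    """history: list of (daily_amounts, was_truly_suspicious) pairs.
    Pick the candidate threshold with the best precision on past alerts."""
    def precision(threshold: float) -> float:
        rule = StructuringRule(amount_threshold=threshold, txn_count=3)
        flagged = [label for amounts, label in history if rule.fires(amounts)]
        return sum(flagged) / len(flagged) if flagged else 0.0
    return max(candidates, key=precision)
```

The optimizer makes the rule better calibrated over time, but the logic path never changes. That is the defining trait of this layer.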

Most "AI-powered compliance platforms" on the market today are operating at this layer. There's nothing wrong with that. Rules engines with intelligent optimization are genuinely useful. But buyers should understand that this is automation, not intelligence. The system is making the existing workflow faster. It is not doing the work.

3. Generative AI (GenAI)

This is where the landscape shifted, and where the confusion got worse.

Generative AI, powered by large language models and transformer architecture, introduced something fundamentally new: the ability to understand and generate human language. For the first time, software could read an investigation narrative, interpret unstructured text from a news article, or draft a SAR filing in a customer's specific format.

The real power of GenAI in compliance isn't chatbots or copilots. It's semantic understanding applied to investigation workflows. A GenAI-powered system can recognize that "Google" and "Alphabet" refer to the same entity, even though there is zero string similarity, because it understands corporate structures, aliases, and contextual relationships. It can evaluate whether a news article about a person with a similar name is actually about your investigation subject, rather than a coincidental match.
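Here is a hedged sketch of that kind of semantic check. The prompt, JSON shape, and call_llm helper are placeholders for whatever model interface you actually use; the point is that the question being asked is a reasoning question, not a string comparison:

```python
# Illustrative only: ask an LLM whether two names refer to the same entity,
# something string similarity cannot answer. call_llm is a placeholder.
import json

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your model provider")

def same_entity(name_a: str, name_b: str, context: str) -> dict:
    prompt = (
        "You are resolving entities for a financial crime investigation.\n"
        f"Name A: {name_a}\nName B: {name_b}\nContext: {context}\n"
        'Answer as JSON: {"same_entity": true|false, "reasoning": "..."}'
    )
    return json.loads(call_llm(prompt))

# same_entity("Google", "Alphabet Inc.", "parent/subsidiary corporate structure")
# should come back true despite zero string similarity.
```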

This is a fundamentally different capability than anything ML or classical AI can provide. It's not scoring. It's reasoning.

But here's where most vendors get it wrong. The most common deployment of GenAI in compliance is the "copilot": a chat interface layered on top of an existing product that can summarize data or answer questions about an alert. This is GenAI at its most superficial. It surfaces information, but it leaves all judgment and all workflow execution to the human. The system generates a summary. The analyst still has to do the investigation. The investigation still takes 45 minutes. You saved 3 minutes on the summary.

This is the distinction between AI-assisted and AI-driven. AI-assisted tools help users do the job faster. AI-driven systems do the job itself. Most GenAI deployments in compliance today are firmly in the "assisted" camp, and vendors are marketing them as if they're transformational. They're not. They're decorating the existing workflow.

The question buyers should ask: is the GenAI doing a task, or is it just making the existing task slightly more convenient?

4. Agentic AI

This is where the category confusion is worst, and where the stakes are highest.

Agentic AI is not a better chatbot. It's not GenAI with a loop. It's a system that can autonomously plan, execute, and complete multi-step workflows: gathering data, making decisions, producing outputs, and explaining its reasoning, without a human performing each step.

The key word is autonomy. Not full autonomy (humans remain in the loop for final decisions in regulated environments). But operational autonomy, where the system does the work, end to end, and presents a completed investigation for human review.

I think about this through a product management lens. Traditionally, PMs identify the jobs to be done, find the frictions, and solve them. You make the workflow 20% faster. You reduce clicks. You surface the right information at the right time. That's valuable work. But with AI, we now have the ability to play in a different dimension entirely: frequency. Instead of reducing the friction around the job, you reduce the frequency of the job itself. Instead of helping the analyst investigate faster, the system does the investigation. The analyst reviews a completed package.

This is the shift from reducing friction to eliminating frequency. And it's the defining characteristic of real agentic AI.

What does that look like in compliance? An agentic system receives an alert. It pulls transaction history. It screens against watchlists. It runs open-source research. It checks for prior investigations on related entities. It evaluates the combined evidence. It drafts a regulator-ready narrative. It recommends a disposition (close as false positive, escalate for human review, or file a SAR) with evidence-based rationale tied to specific findings. The analyst reviews a completed investigation package. Not a summary. Not a recommendation with no supporting work. A completed investigation with full reasoning, evidence citations, and a transparent work log of every step taken.
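A highly simplified sketch of what that flow can look like under the hood follows. The step names, tools object, and reasoner are hypothetical, and a production system adds verification loops, audit logging, and human review gates:

```python
# Hypothetical skeleton of an agentic investigation: each step gathers or
# evaluates evidence, and the path can branch based on what earlier steps found.
def investigate(alert, tools, reasoner):
    evidence = {"alert": alert}
    evidence["transactions"] = tools.pull_transaction_history(alert.entity_id)
    evidence["screening"] = tools.screen_watchlists(alert.entity_id)

    # Branch on intermediate findings, the way a human analyst would.
    if evidence["screening"].possible_match:
        evidence["adverse_media"] = tools.open_source_research(alert.entity_id)
    evidence["prior_cases"] = tools.related_investigations(alert.entity_id)

    # Orchestration layer: synthesize everything into a disposition + narrative.
    decision = reasoner.synthesize(evidence)            # close / escalate / file SAR
    narrative = reasoner.draft_narrative(evidence, decision)
    return {"disposition": decision, "narrative": narrative, "work_log": evidence}
```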

Three things separate real agentic AI from everything else.

  • First, multi-step execution. The system doesn't do one thing. It executes a workflow. Each step informs the next. The path through the investigation varies based on what the evidence shows, just like it would for a human analyst.

  • Second, within-task reasoning. Each individual step contains its own intelligence layer. A sanctions screening task doesn't just do string matching. It semantically evaluates whether the match is real (recognizing that "Mohammed al-Rahman" and "M. Alrahman" are the same person). An online search task doesn't just return results. It assesses relevance and credibility. This within-task intelligence is what separates real agentic AI from a scripted automation that calls an API and returns a result.

  • Third, orchestration. Above the individual tasks sits a reasoning layer that synthesizes everything: weighing conflicting signals, resolving ambiguity, and arriving at a judgment. This layer doesn't follow a deterministic path. It reasons across the full body of evidence, the way a senior analyst would synthesize a complex investigation.

Most vendors claiming "agentic AI" in 2026 are running a GenAI model in a loop with some tool-calling capability. That's a workflow automation with an LLM in the middle. Real agentic AI wraps the model in a multi-layered harness, not a single wrapper: multiple layers of reasoning (within each task, across tasks, and at the orchestration level), configurable autonomy, full auditability, and the ability to test against historical data before deployment.

What Most Vendors Won't Tell You

There's a concept that matters enormously for building reliable AI in compliance, and almost nobody talks about it publicly: context engineering.

The term has been circulating since late 2025, but practitioners have been figuring this out for longer. At Unit21, we started experimenting with LLMs in 2022, when ChatGPT first came out. We fed raw transaction data to models to see if they could find fraud. The results were terrible.

But through those experiments, a counterintuitive finding emerged: giving the model less data often produced better results. Models are like people. Give someone too much information and they get confused, conflate things, and don't know where to look. Every model has a context window, and even the best frontier models degrade as they approach the limit.

The goal of context engineering is to optimize what information you provide to the model for the exact decision you want it to make. Too little context and it lacks the information to decide correctly. Too much and performance degrades. But even with exactly the right information, there's always the possibility the model would have found an additional pattern with just a little more data. This is more art than science.
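A toy illustration of the idea, with an invented relevance score and a crude token budget standing in for the real thing:

```python
# Toy sketch of context engineering: select only the evidence relevant to the
# specific decision, and stop before the context window budget is exhausted.
def build_context(decision: str, evidence_items: list[dict], token_budget: int = 8_000) -> str:
    relevant = [e for e in evidence_items if decision in e.get("relevant_to", [])]
    # Most decision-relevant items first; everything else is noise context.
    relevant.sort(key=lambda e: e.get("relevance", 0), reverse=True)

    context, used = [], 0
    for item in relevant:
        cost = len(item["text"]) // 4          # crude token estimate
        if used + cost > token_budget:
            break
        context.append(item["text"])
        used += cost
    return "\n\n".join(context)
```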

The cooking analogy works here: if you want to teach someone to cook, you teach them about chopping vegetables, selecting fresh produce, choosing proteins. You don't hand them a book on playing the drums. That would be noise context.

Here's the thing most people miss: the LLM itself is fundamentally stateless. It doesn't remember what it did five seconds ago. Everything the system "knows" comes from layers engineered around it: state management, memory, context construction. The real question isn't "how good is your AI model?" but "how good is everything around your model?" The model is the engine, but the harness is the car.
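One minimal way to picture that, with a hypothetical harness class: the model sees nothing the harness does not explicitly pass back in on every call.

```python
# The model call itself carries no memory between invocations. Anything the
# system "remembers" lives in the harness and is re-supplied each time.
class InvestigationHarness:
    def __init__(self, llm):
        self.llm = llm                                       # callable: prompt -> text
        self.state = {"steps_completed": [], "findings": []} # the "memory"

    def run_step(self, step_name: str, instructions: str) -> str:
        context = self._render_state()                       # rebuild context every call
        output = self.llm(f"{context}\n\nTask: {instructions}")
        self.state["steps_completed"].append(step_name)
        self.state["findings"].append({"step": step_name, "output": output})
        return output

    def _render_state(self) -> str:
        done = ", ".join(self.state["steps_completed"]) or "none"
        return f"Steps already completed: {done}. Prior findings: {self.state['findings']}"
```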

This matters for buyers because the hardest problem in building agentic AI for compliance isn't writing good prompts. It's engineering the right context. If a vendor can't explain how they manage context, how they decide what information goes into each decision, they haven't solved the hard problem.

There are three techniques that matter for AI reliability in regulated environments:

Temperature control. Setting randomness to zero reduces variability. But fundamentally, all LLM output is generative. The term "hallucination" is misleading, because there is no separate failure mode: a wrong answer is produced by the same generative process as a right one. The real question is not "did the model hallucinate?" but "does the output match the expectation?"
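For example, with the OpenAI Python client (the model name and prompt are placeholders), pinning temperature to zero makes runs as repeatable as the model allows, without making the output any more "true":

```python
# Illustrative only: pin sampling randomness to zero for repeatability.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",                      # whichever model you have validated
    temperature=0,                       # remove sampling randomness
    messages=[
        {"role": "system", "content": "You evaluate sanctions screening matches."},
        {"role": "user", "content": "Are 'Mohammed al-Rahman' and 'M. Alrahman' "
                                    "likely the same person? Answer yes/no with reasoning."},
    ],
)
print(response.choices[0].message.content)
```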

Eval sets with a golden set. If you have historical data where you know the expected output, you can run the model against it and measure accuracy. The key insight: don't benchmark against average human performance. Benchmark against your best analysts, your golden set. Competitors without years of historical disposition data can't build comparable eval sets. This is a structural advantage.
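A minimal sketch of that kind of eval harness, assuming you have historical alerts with known dispositions from your strongest analysts; the function names are placeholders:

```python
# Sketch of a golden-set eval: replay historical alerts through the system
# and measure agreement with your best analysts' known dispositions.
def evaluate_against_golden_set(golden_set, investigate):
    """golden_set: list of (alert, expected_disposition) pairs.
    investigate: callable returning a disposition for an alert."""
    results = [(investigate(alert), expected) for alert, expected in golden_set]
    agreement = sum(pred == expected for pred, expected in results) / len(results)

    # Break out where the system disagrees, so a human can review the deltas.
    disagreements = [
        {"predicted": pred, "expected": expected}
        for pred, expected in results if pred != expected
    ]
    return {"agreement": agreement, "disagreements": disagreements}
```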

Structured outputs and code generation. Rather than relying on unstructured text, force responses into structured formats: JSON with specific fields, enum tags, defined schemas. If you can frame your problem as a code generation problem, your chances of success increase dramatically. Detection rules, investigation checklists, and filing templates are all structured problems.
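One common way to enforce this in Python is to validate the model's response against a typed schema, for example with Pydantic; the fields below are illustrative, not a standard:

```python
# Illustrative schema: force the model's disposition into a fixed structure
# instead of free text, and reject anything that doesn't validate.
from enum import Enum
from pydantic import BaseModel, ValidationError

class Disposition(str, Enum):
    CLOSE_FALSE_POSITIVE = "close_false_positive"
    ESCALATE = "escalate"
    FILE_SAR = "file_sar"

class InvestigationOutput(BaseModel):
    disposition: Disposition
    confidence: float
    evidence_citations: list[str]
    narrative: str

def parse_model_output(raw_json: str) -> InvestigationOutput | None:
    try:
        return InvestigationOutput.model_validate_json(raw_json)
    except ValidationError:
        return None   # malformed output gets retried or routed to a human
```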

The Missing Piece Nobody Talks About: Network Intelligence

There's a fourth dimension that the ML-vs-GenAI-vs-Agentic taxonomy doesn't capture, and it may be the most important one for buyers evaluating financial crime technology: what happens when intelligence compounds across institutions.

A standalone AI agent, no matter how sophisticated, only knows what it can see within your data. It doesn't know that the person your customer is sending money to was flagged as a scammer at three other institutions last week. It doesn't know that the account receiving funds has been linked to a mule network identified across the banking system. It operates in isolation.

The fraudster is always the recipient of funds. Always. That's the whole goal: they're going to receive the money, and they have to receive it somewhere. If you can see where money flows across a network of institutions (banks, credit unions, fintechs, crypto exchanges), you can flag bad actors before they do anything on your platform.

This is what consortium intelligence does. When a bad actor is identified anywhere in the network, that signal propagates. A fraud pattern seen at one customer trains better detection for all customers. Investigation outcomes at one institution inform rule recommendations across the network. The AI gets smarter with every decision made by every participant.

This creates a flywheel that individual-customer ML models can never match. More customers means better AI. Better AI attracts more customers. The compounding effect is the moat.

Buyers should ask every vendor: does your AI learn from a network, or just from my data? If the answer is "just your data," you're building on an island.

How to Tell What's Real

When a vendor tells you they have "AI" or "agentic AI," here are the questions that cut through the marketing:

Show me an alert that your AI investigated end-to-end. Not summarized. Not scored. Investigated, with data gathering, evidence evaluation, narrative drafting, and a disposition recommendation. If they can't show you a completed investigation package produced by their AI, they don't have agentic AI.

How many alerts has your AI processed autonomously in production? Not in a demo. Not in a sandbox. In production, at real financial institutions, with real regulatory oversight.

Can I test your AI against my historical investigation data before deploying it? In regulated environments, you cannot trust AI with live decisions based on a demo. Any vendor claiming agentic AI for compliance should be able to validate against your known outcomes before going live. This is where eval sets matter: if the vendor doesn't have a methodology for benchmarking against your golden set of historical decisions, they haven't built for regulated environments.

What happens when your AI encounters conflicting evidence? A scoring model returns a number. A workflow automation follows a script. An agentic system reasons through ambiguity and explains its reasoning.

Does your system verify its own work? Real agentic systems include verification loops. Did the sanctions screening actually resolve the match, or skip a step? Does the narrative cite evidence that actually exists in the case file? A system without self-verification is just generating outputs and hoping for the best.

Can I configure different autonomy levels for different queues? Real agentic AI in compliance doesn't ship as "fully autonomous" or "fully manual." It supports progressive autonomy: different queues, different risk appetites, different levels of human review. Start with AI-assisted (the agent recommends, the human decides everything). Move to AI-driven (the agent decides routine cases, the human reviews a sample). Eventually reach autonomous (the agent handles exceptions, the human handles the truly hard cases). If the vendor offers a single toggle, they haven't built for regulated environments.
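As a sketch of what that looks like in practice (queue names and settings invented for illustration), progressive autonomy is per-queue configuration rather than a global switch:

```python
# Hypothetical per-queue autonomy configuration: the same agent, different
# levels of human review depending on risk appetite for each alert queue.
AUTONOMY_CONFIG = {
    "low_value_card_fraud": {
        "mode": "ai_driven",             # agent decides, humans review a sample
        "human_review_sample_rate": 0.10,
        "allowed_dispositions": ["close_false_positive", "escalate"],
    },
    "sanctions_screening": {
        "mode": "ai_assisted",           # agent recommends, human decides everything
        "human_review_sample_rate": 1.0,
        "allowed_dispositions": [],      # no autonomous dispositions permitted
    },
    "aml_structuring": {
        "mode": "autonomous_with_exceptions",
        "human_review_sample_rate": 0.25,
        "allowed_dispositions": ["close_false_positive"],
    },
}
```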

Does your AI appear in the audit trail? In a real agentic system, the AI is treated like an analyst. It has an identity. It appears in audit logs. Its work is subject to governance controls. If the AI's work is invisible in the audit trail, it's not ready for compliance.

Why This Matters Now

The compliance technology market is at an inflection point. Alert volumes are growing faster than teams can hire. The typical response is to throw more analysts at the problem, but that doesn't scale.

The vendors who built for the last era (batch processing, black-box ML, manual everything) are being replaced. The question isn't whether AI will transform financial crime operations. It's which AI.

Buyers who can't distinguish between ML scoring and agentic investigation will overpay for the wrong technology. Analyst firms that lump all "AI-native" vendors into the same category will mislead the market. And vendors who use buzzwords without proof will eventually be exposed by the ones who ship.

The taxonomy isn't academic. It's operational. ML, classical AI, GenAI, and Agentic AI are different technologies that do different things. Understanding which is which, and demanding proof of each, is the single most important capability a buyer can develop in 2026.

Don't take anyone's word for it. Not ours, not our competitors'. Ask to see the work.

Kunal Datta is the Chief Product Officer at Unit21, where he leads the product team building AI risk infrastructure for fraud, AML, and compliance. Before Unit21, he led product at Fast and built AI-powered aerial wildfire safety inspection systems at PG&E.

He has a background leading Product teams using AI to automate processes at regulated entities, as well as financial products, machine learning products, web applications, mobile applications, hardware products, and data products. Kunal is a Fulbright Scholar and studied Civil and Environmental Engineering and Music Science Technology at Stanford University.

Learn more about Unit21
Unit21 is the leader in AI Risk Infrastructure, trusted by over 200 customers across 90 countries, including Sallie Mae, Chime, Intuit, and Green Dot. Our platform unifies fraud and AML with agentic AI that executes investigations end-to-end—gathering evidence, drafting narratives, and filing reports—so teams can scale safely without expanding headcount.