
In fraud and AML, "AI" and "machine learning" get used interchangeably, often in the same question. They are related but not the same thing. Conflating the two leads to the wrong evaluation criteria, the wrong compliance questions, and the wrong expectations about how a product improves over time.
This post clarifies what each term usually means today, how they differ technically, and how they work together in modern compliance and fraud systems.
When most people say AI in 2026, they mean generative AI:
Generative AI generates content. It does not, by default, return a single fixed numeric score from a trained function. If you hand it a date of birth and an address, it will not automatically output "risk score: 0.73" unless you have explicitly designed the workflow — via prompts, orchestration, and surrounding tools — to produce this kind of result.
When people say machine learning (ML), they are usually talking about a more traditional research and engineering paradigm:
Classic fraud/AML example: Given a customer's date of birth, address, and transaction history, an ML model returns a risk score or a fraud probability. The output space is narrow and well-defined.
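To make the "narrow, well-defined output" concrete, here is a minimal sketch of a score-based model. The feature names, weights, and bias are hypothetical placeholders chosen for illustration; a real model would learn its coefficients from labeled historical data.

```python
import math
from dataclasses import dataclass


@dataclass
class CustomerFeatures:
    age_years: float        # derived from date of birth
    address_risk: float     # 0..1, hypothetical address-based signal
    txn_count_30d: int      # derived from transaction history
    avg_txn_amount: float   # derived from transaction history


# Hypothetical coefficients; a trained model would learn these from data.
WEIGHTS = {
    "age_years": -0.02,
    "address_risk": 1.5,
    "txn_count_30d": 0.03,
    "avg_txn_amount": 0.0004,
}
BIAS = -2.0


def fraud_probability(features: CustomerFeatures) -> float:
    """Logistic model: a fixed function from features to a number in (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * getattr(features, name) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


customer = CustomerFeatures(
    age_years=34, address_risk=0.8, txn_count_30d=42, avg_txn_amount=950.0
)
score = fraud_probability(customer)
print(f"risk score: {score:.2f}")
```

The same inputs always produce the same score, which is the property that traditional model validation relies on.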
ML is about prediction from patterns in data. Generative AI is about producing language and reasoning from context and instructions.
Agentic AI is mentioned in the table below so folks are aware it exists. We will not cover agentic AI in this post. Generally speaking, though, agentic AI is built on top of generative AI and often also uses signals from machine learning.
When evaluating AI products today, folks often apply machine learning evaluation frameworks to generative AI systems. The vocabulary overlaps ("training," "learning," "model," "drift"), but the underlying mechanics differ.
This is not wrong so much as misaligned. Model validation, as practiced in regulated financial services, was built around deterministic, score-based models with documented training data, performance metrics, and retraining cycles. Generative AI products operate differently. Asking the same questions without reframing leads to frustration on both sides.
See “II. PURPOSE AND SCOPE” in the OCC Bulletin 2026-13A PDF, linked from the OCC Bulletin 2026-13 page.
For the purposes of this guidance, the term “model” refers to a complex quantitative method, system, or approach that applies statistical, economic, or financial theories to process input data into quantitative estimates. The term “model” in this guidance excludes simple arithmetic calculations, such as those found within spreadsheets, as well as deterministic rule based processes and software where there are no statistical, economic, or financial theories underpinning their design or use.³
³ Generative AI and agentic AI models are novel and rapidly evolving. As such, they are not within the scope of this guidance. Nonetheless, a banking organization’s risk management and governance practices should guide the determination of appropriate governance and controls for any tools, processes, or systems not covered in this document. However, the principles described in this guidance apply to traditional statistical and quantitative models and non-generative, non-agentic AI models.
The goal is not to abandon rigor. It is to ask the right questions for the right technology.
Namely, generative AI does not currently fall under the OCC’s guidance on “Model Risk Management”. The OCC recognizes that the technology is different.
The OCC guidance is still being determined for GenAI and agentic systems. The Spring 2026 Semiannual has an entire section on “Innovation” and “Artificial Intelligence”. It specifically calls out the need for different evaluation criteria for GenAI. The general message is that the technology is different and leads to better, higher-quality outcomes because of the innovation it affords. The section on AI closes by signaling a desire to allow generative AI and to ensure it is given proper evaluation criteria.
The OCC is also actively reviewing supervisory expectations, guidance, and regulations to ensure that innovative opportunities are available to all OCC-supervised banks, rather than only a few, that wish to take advantage of AI. In doing so, the OCC seeks to support community banks that leverage third-party technology and to right-size supervisory expectations.
In ML, training means running an algorithm over a dataset until the model learns a stable mapping from inputs to outputs.
In generative AI products, when someone asks "Does your AI train on my data?" they are often importing ML assumptions. Some notes:
That is a configuration and feedback loop, not classical training. It is also faster: you can change behavior on demand without a full retrain cycle. And for improvements, no customer data needs to be retained. No retention is necessary at the GenAI level because the technology combines the data, the prompts, and the LLM at inference time to produce the output (e.g., a narrative summary).
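One way to picture a configuration and feedback loop is as versioned instructions that are combined with case data only at inference time. The sketch below is an assumption about how such a loop could be structured, not a description of any particular product; `NarrativeConfig`, `apply_feedback`, and `build_prompt` are hypothetical names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class NarrativeConfig:
    version: int
    instructions: str


def build_prompt(config: NarrativeConfig, case_facts: str) -> str:
    """Combine instructions and case data at inference time; nothing is stored."""
    return f"{config.instructions}\n\nCase facts:\n{case_facts}"


def apply_feedback(config: NarrativeConfig, reviewer_note: str) -> NarrativeConfig:
    """Fold a general reviewer note into the instructions and bump the version."""
    return replace(
        config,
        version=config.version + 1,
        instructions=f"{config.instructions}\n- {reviewer_note}",
    )


v1 = NarrativeConfig(version=1, instructions="Summarize the alert for an investigator.")
v2 = apply_feedback(v1, "Always state the total amount and the date range.")

facts = "3 cash deposits totaling $27,500 between 2026-01-05 and 2026-01-07"
print(build_prompt(v2, facts))
```

Behavior changes between `v1` and `v2` because the instructions changed, not because any model weights were updated. The version number gives change control something concrete to track.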
When people ask about "training," it’s best to clarify what they mean. Often they are asking about model validation and change control: legitimate concerns, framed in ML language. Thankfully, multiple standards already exist to help validate GenAI and ensure it is used responsibly. Here are a few frameworks and regulations to help evaluate GenAI:
At the lowest level — one input, one output — a purpose-built ML model is almost always faster than an LLM call. ML models return a score from a narrow function. LLMs reason over context and generate text.
At the highest level — building and deploying a new capability — generative AI is dramatically faster:
In fraud and AML, you need both: fast, reliable scores where the output space is narrow, and rich analysis where the output is narrative, contextual, and investigative.
Modern compliance and fraud platforms rarely choose one or the other. A generative AI system often utilizes machine learning signals as inputs.
Examples:
Generative AI then orchestrates those signals: gathers context, runs parallel analyses, summarizes findings, and produces an investigation narrative a human can review.
The ML model returns the score. The AI agent explains why it matters and what to do next.
Both ML and generative AI systems use orchestration, but the harness — the layer that wires data, models, prompts, and logic together — is far more important in generative AI.
ML harness (simple):
Transaction + metadata → single model → fraud probability
Generative AI harness (rich):
Alert + transactions + ML scores + prior cases + policy context → parallel tasks (SQL aggregations, ML calls, document review) → LLM synthesis → structured narrative + recommended disposition
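The two pipelines above can be sketched side by side. This is an illustrative outline only: `toy_model` and `stub_synthesis` are hypothetical stand-ins for a trained model and an LLM call, so the example runs without any external service.

```python
from concurrent.futures import ThreadPoolExecutor


def ml_harness(transaction: dict, model) -> float:
    """Simple harness: one input, one model, one score."""
    return model(transaction)


def genai_harness(alert: dict, tasks: dict, synthesize) -> dict:
    """Rich harness: run independent tasks in parallel, then pass the
    collected findings to a synthesis step (an LLM call in a real system)."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(task, alert) for name, task in tasks.items()}
        findings = {name: future.result() for name, future in futures.items()}
    return synthesize(alert, findings)


# Hypothetical stand-ins so the sketch is self-contained.
def toy_model(txn: dict) -> float:
    return 0.9 if txn["amount"] > 9000 else 0.1


def total_amount(alert: dict) -> float:
    return sum(t["amount"] for t in alert["transactions"])


def max_ml_score(alert: dict) -> float:
    return max(ml_harness(t, toy_model) for t in alert["transactions"])


def stub_synthesis(alert: dict, findings: dict) -> dict:
    # A real system would send the findings plus policy context to an LLM here.
    disposition = "escalate" if findings["max_ml_score"] >= 0.5 else "close"
    narrative = (
        f"Alert {alert['id']}: total ${findings['total_amount']:,.2f}, "
        f"highest ML score {findings['max_ml_score']:.2f}."
    )
    return {"narrative": narrative, "recommended_disposition": disposition}


alert = {"id": "A-1", "transactions": [{"amount": 9500.0}, {"amount": 120.0}]}
result = genai_harness(
    alert,
    tasks={"total_amount": total_amount, "max_ml_score": max_ml_score},
    synthesize=stub_synthesis,
)
print(result["narrative"])
```

Note that the ML harness appears inside the generative one as just another task, which matches the idea that ML scores are inputs to the richer pipeline.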
A well-designed harness:
Structuring example: The harness gathers 100 transactions, groups and sums them (pure math). It passes those aggregates to the LLM along with context on what structuring means. The LLM does not do the arithmetic — the harness does. The LLM interprets the results and explains them in plain language.
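The division of labor in the structuring example might look like the following sketch. The field names, the sample deposits, and the 10,000 threshold are assumptions for illustration; the point is that the harness computes the totals and the LLM only receives finished numbers to interpret.

```python
from collections import defaultdict

CASH_REPORTING_THRESHOLD_USD = 10_000  # assumed threshold for illustration


def aggregate_deposits(transactions):
    """Group deposits by (account, date) and sum them: pure arithmetic."""
    totals = defaultdict(lambda: {"count": 0, "total_usd": 0})
    for txn in transactions:
        bucket = totals[(txn["account"], txn["date"])]
        bucket["count"] += 1
        bucket["total_usd"] += txn["amount_usd"]
    return dict(totals)


def build_structuring_prompt(aggregates):
    """Give the LLM pre-computed totals plus a definition to interpret them."""
    lines = [
        f"- account {account}, {day}: {agg['count']} deposits, "
        f"total ${agg['total_usd']:,}"
        for (account, day), agg in sorted(aggregates.items())
    ]
    return (
        "Structuring means splitting cash activity into smaller amounts to stay "
        f"under a reporting threshold (assumed here: ${CASH_REPORTING_THRESHOLD_USD:,}).\n"
        "Using only the pre-computed totals below, explain in plain language "
        "whether the pattern is consistent with structuring.\n"
        + "\n".join(lines)
    )


transactions = [
    {"account": "123", "date": "2026-01-05", "amount_usd": 4800},
    {"account": "123", "date": "2026-01-05", "amount_usd": 4900},
    {"account": "123", "date": "2026-01-05", "amount_usd": 4700},
    {"account": "123", "date": "2026-01-06", "amount_usd": 300},
]
aggregates = aggregate_deposits(transactions)
prompt = build_structuring_prompt(aggregates)
print(prompt)
```

Because the sums are computed in code, they can be tested and audited independently of whatever the LLM later writes about them.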
This is why "AI vs ML" is the wrong framing for evaluation, and also why it’s important to understand how the two differ. The more important question is: how does the harness work?
Three things tell you whether a given harness is the right GenAI for you: understanding the harness, being able to explain how it works, and knowing why it is better than the alternatives.
When someone asks an ML-shaped question about an AI product, consider reframing: