Machine Learning Alerts: How to Increase Operational Efficiency with Predictive Scoring
June 9, 2022
An overwhelming majority of today’s Risk & Fraud teams suffer from a growing backlog of alerts. Unfortunately, what they have yet to uncover is that the majority of those alerts are false positives.
With growing alert volumes, regulatory updates, and ever-changing markets, Risk & Fraud teams are busier than ever. However, being busy doesn’t necessarily equate to being productive.
On the contrary, teams that burn through valuable resources chasing phantom fraudsters instead of addressing real threats fall victim to not meeting regulatory requirements in a timely manner, which is bad for business.
To help solve this, Unit21 created Alert Scores - a way to focus investigator time on the alerts that matter. Here, we’ll cover how alert scoring works, the machine learning model we’ve deployed and why we chose it, and how alert scoring can make your risk and compliance program more effective.
Our machine learning model processes each alert generated by a customer’s rule in the Unit21 system to produce an Alert Score. The score, which ranges from 0 to 100, ranks how likely the alert is to result in a case requiring investigation; it does not represent a percentage.
The Alert Score can then be used to triage alerts through the Unit21 queueing system, ensuring severe alerts are handled promptly and by the correct investigator.
The score is also represented visually using red and blue colors:
Alert Scores enable agents to investigate riskier alerts first. In addition, agents can automatically dismiss alerts with low scores or send them to newer agents for educational purposes.
Machine learning (ML) allows software applications to become more accurate at predicting outcomes without being explicitly programmed to do so. Instead, ML uses various algorithms that iteratively learn from data to improve, describe, and predict outcomes.
A machine learning model is the output generated when you train a machine learning algorithm with data. As an algorithm ingests more training data, it produces more precise models based on that data.
After training, the algorithm yields a predictive classifier model: provide the model with an input, and it returns an output. Essentially, models are used to make a prediction or classification.
In the case of Unit21, the Alert Score model classifies alerts by severity, based on your organization’s typical alert outcomes (the likelihood of a SAR filing and/or case investigation).
How Did We Build the ML Model?
Unit21’s machine learning algorithm is a random forest classifier, built with scikit-learn and coded in Python.
While we initially considered other algorithms such as logistic regression, XGBoost, and recurrent neural networks (RNNs), we chose random forests because they have been applied successfully across a variety of industries.
How Do Random Forest Algorithms Work?
Random forest algorithms are known for their fast training time and performance. They consist of many individual decision trees that operate as an ensemble.
In a single decision tree, features of the data are split into nodes that try to separate the data into their correct classes. Each individual tree in a random forest is generated from a different subset of features and produces its own class prediction.
Because individual trees in the forest may produce different class predictions, the class with the most votes becomes the model’s prediction.
In the example below, the random forest model has been trained to find different types of fruits (apples, bananas, strawberries, pears, and pineapples). Here, it classifies that the input instance is an apple after majority voting occurs from the n decision trees:
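To make the ensemble idea concrete, here is a minimal scikit-learn sketch of a random forest classifier. The synthetic data stands in for real alert features, which are not public, so the dataset and parameters here are illustrative assumptions:

```python
# Minimal random forest sketch; synthetic data stands in for alert features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data (hypothetical stand-in for alerts).
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# An ensemble of 100 decision trees; each tree is fit on a random subset
# of samples and features, and the forest predicts by majority vote.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

print(forest.score(X_test, y_test))  # mean accuracy on held-out data
```

Each tree in `forest.estimators_` votes independently; `predict` returns the majority class, while `predict_proba` exposes the vote fractions.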
How We Train an Alert Score Model
To train an Alert Score model, we use data from your organization’s past alerts that have been known to produce a case or SAR. Because each organization is distinct, we generate a unique model for each customer.
Unit21 carefully gathers input data from past alerts and investigations, including customer and transactional information. There are thousands of data points to choose from; however, some are more important than others. For example, a person’s ‘first name’ is unlikely to be beneficial for training a model, but the ‘country of origin’ might be helpful. These data points are referred to as features.
We curate hundreds of features, including ‘credit card type,’ ‘number of transactions,’ ‘transaction velocity,’ the ‘age of the customer’s account,’ ‘email address,’ ‘IP address,’ ‘time between transactions’ and more to train models.
Features are frequently updated and evaluated based on their perceived importance to the classification and model performance.
Here is a sample set of the most important features for some organizations in the Unit21 system:
Over the lifetime of your account, your model’s performance is continually monitored. As features change importance and new features emerge, Unit21 re-trains your model using your latest data.
In summary, Unit21 pulls the relevant customer attributes and transaction data, applies it to the Alert Score model, and generates a score representing the likelihood that a SAR will be generated from the alert.
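As a hedged illustration of that pipeline, the forest’s predicted probability of the positive class (an alert leading to a case or SAR) can be rescaled into a 0-100 rank. The exact scaling Unit21 uses is not published, so this linear mapping is an assumption:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for an organization's historical alert data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
forest = RandomForestClassifier(random_state=0).fit(X, y)

# Probability of the positive class ("alert leads to a case/SAR").
prob = forest.predict_proba(X)[:, 1]

# Rescale to a 0-100 rank (assumed scaling); the score orders alerts by
# risk and is NOT a percentage.
alert_scores = np.rint(prob * 100).astype(int)
```

Scores near 100 would be surfaced first in the queue; scores near 0 could be auto-dismissed or routed to newer agents.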
The Alert Score model can also be optimized to detect the different types of undesirable behavior seen in previously reviewed alerts.
How Accurate is Our Model?
An inaccurate model is futile. In the ML space, a common way to measure the performance of models making a binary prediction is ROC-AUC.
A ROC curve, or Receiver Operating Characteristic curve, is used to measure the performance of a classifier model. The ROC curve plots the rate of true positives (TP) against the rate of false positives (FP), thereby highlighting the sensitivity of the classifier model.
The ROC curve is applied to the random forest’s predicted probability. The curve arises from sweeping all possible thresholds over the probability space and plotting the associated true positive rate (TPR) and false-positive rate (FPR) values for each threshold.
An ideal classifier will have a ROC where the graph would hit a TPR of 100% with a 0% FPR.
Area Under the Curve, or AUC, is one of the most popular metrics for model evaluation. AUC measures the two-dimensional area underneath the entire ROC curve, and the AUC of a classifier equals the probability that the classifier will rank a randomly chosen positive example higher than a randomly chosen negative one.
An excellent model has an AUC close to 1, which indicates a good measure of separability (i.e., how well the model can distinguish between classes). On the other hand, a poorly performing model will have an AUC close to 0.5, which indicates it is doing no better than randomly guessing.
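The sweep-and-plot procedure above is a few lines with scikit-learn. Again, the synthetic data here is an illustrative stand-in for real alert outcomes:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for labeled alert outcomes (case/SAR vs. dismissed).
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
forest = RandomForestClassifier(random_state=1).fit(X_tr, y_tr)

prob = forest.predict_proba(X_te)[:, 1]

# roc_curve sweeps every candidate threshold over the probability space
# and returns the FPR/TPR pair at each one.
fpr, tpr, thresholds = roc_curve(y_te, prob)

print(roc_auc_score(y_te, prob))  # 1.0 = perfect separation, 0.5 = chance
```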
Here is a sample set of ROC curves for some organizations in the Unit21 system:
A specific cutoff is chosen on the random forest’s probability to flag the highest risk alerts.
Predicted probabilities greater than this cutoff are judged high risk, which is how we determine which Alert Scores to color blue and red in the UI. High-risk alerts are colored red, while all other alerts are colored blue.
There is a trade-off in where to place the cutoff value. As the cutoff point decreases, we get more true positives (our sensitivity, or TPR, increases) but also more false positives (our specificity, or TNR, suffers).
Giving equal weight to sensitivity and specificity, Youden’s J statistic (J = sensitivity + specificity − 1) is a common way of choosing the optimal cutoff, as it maximizes the number of correctly classified cases.
Youden’s cutoff occurs when the vertical distance between the ROC curve and the diagonal chance line is maximized.
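Because J = TPR − FPR at each threshold, finding Youden’s cutoff takes only a few lines. The labels and probabilities below are toy values for illustration, not real alert data:

```python
import numpy as np
from sklearn.metrics import roc_curve

# Toy labels and predicted probabilities (hypothetical values).
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
prob = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.3, 0.7, 0.6, 0.5])

fpr, tpr, thresholds = roc_curve(y_true, prob)

# Youden's J at each candidate cutoff is the vertical distance between
# the ROC curve and the diagonal chance line (TPR - FPR).
j = tpr - fpr
cutoff = thresholds[np.argmax(j)]  # threshold that maximizes J
```

Alerts whose predicted probability exceeds `cutoff` would be the ones flagged high risk (red); everything below stays blue.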
For example, in the figure above, the first org has an AUC of 0.9146, and the midnight blue vertical bar shows that Youden’s J chooses a cutoff where the true positive rate is already approaching 90%.
Note in all figures that increasing the cutoff beyond Youden’s J has diminishing returns on improving the TPR while also significantly increasing the FPR.
Why Predictive Alert Scoring Matters: Final Thoughts
The goal of this feature is not to replace agents but to help surface alerts that are more likely fraudulent, increasing organizational efficiency. However, as Unit21 is a flag-and-review system, rules still need to be in place to generate alerts, and agents must investigate and resolve (disposition) the alerts.
These dispositions are also required to train the models and maintain their accuracy. Not to mention that typical age-based alert ordering for investigation triage leads to delays in filing SARs and reduces overall effectiveness.
By producing an Alert Score for each alert, agents can:
Make decisions quickly
Reduce overall risk exposure by more quickly prioritizing alerts that identify criminal behavior