The keys to understanding the metrics of Meelo anti-fraud tools
6 min read • 19.05.2025
Let's continue our look behind the scenes of the Meelo solution, this time focusing on the evaluation metrics used in our anti-fraud tools. While the Gini index remains a long-standing, recognized standard in the banking sector, our commitment to ever more effective and responsible detection pushes us to go further and adopt the most relevant metrics on the market. With this in mind, we have chosen to complement the Gini with a finer, probabilistic measure: the Brier Score. Our aim? To give you an accurate reading of your predictions and greater confidence in your decisions. Let us explain!
Confusion matrix, ROC curve and AUC: the foundations of machine learning evaluation
To assess the performance of our fraud detection tools, we rely on the confusion matrix, from which we build and analyze the ROC curve (Receiver Operating Characteristic) and its associated indicator, the AUC (Area Under the Curve).
The confusion matrix first lets us visualize the four possible scenarios in a binary classification task.
What is a binary classification task?
This amounts to building a model that sorts each situation into one of two possible categories: “yes” or “no”.
Take the case of COVID-19 testing during the pandemic, now thankfully over. A test could produce four possible outcomes:
- True positive: the person had COVID and the test was positive.
- True negative: the person did not have COVID and the test was negative.
- False positive: the person did not have COVID, but the test was positive.
- False negative: the person had COVID, but the test was negative.
Transposed to fraud detection, the principle is exactly the same. We seek to sort individuals into “fraudsters” and “non-fraudsters”, just as above with “sick” and “not sick”, while making as few classification errors as possible.
At Meelo, our aim is to:
- detect as many true positives as possible (proven fraudsters);
- minimize false positives (customers wrongly identified as fraudsters);
- and above all, not let false negatives through (fraudsters who are not identified as such).
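To make the four scenarios concrete, here is a minimal sketch that counts them for a fraud classifier. The labels and predictions are made-up toy data, and the function name `confusion_counts` is our own for illustration; it is not part of any Meelo tool.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count the four outcomes of a binary classifier.

    y_true: 1 = actual fraud, 0 = legitimate.
    y_pred: 1 = flagged as fraud, 0 = not flagged.
    """
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            counts["TP"] += 1  # fraudster correctly flagged
        elif actual == 0 and predicted == 0:
            counts["TN"] += 1  # legitimate customer correctly cleared
        elif actual == 0 and predicted == 1:
            counts["FP"] += 1  # legitimate customer wrongly flagged
        else:
            counts["FN"] += 1  # fraudster missed
    return {key: counts[key] for key in ("TP", "TN", "FP", "FN")}


# Hypothetical toy data: 3 fraud cases among 10 applications.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)  # {'TP': 2, 'TN': 6, 'FP': 1, 'FN': 1}
```

In this toy example, two fraudsters are caught, one slips through (false negative) and one legitimate customer is flagged by mistake (false positive).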
From the confusion matrix, we can derive key indicators such as the ROC curve (Receiver Operating Characteristic), which graphically illustrates the trade-off between fraud detection (the true positive rate, i.e. the share of fraud correctly detected) and false alerts (the false positive rate, i.e. the share of legitimate transactions wrongly flagged) for different decision thresholds.
By varying the decision thresholds or “alert thresholds” (as one could vary the sensitivity of the COVID test to detect the virus), we obtain different points on the curve, reflecting the performance of the model in various scenarios.
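As a rough illustration of how moving the alert threshold produces different points on the ROC curve, the sketch below computes the false positive rate and true positive rate at a few thresholds. The scores, labels and the helper name `roc_point` are assumptions made up for this example.

```python
def roc_point(y_true, scores, threshold):
    """Return (false positive rate, true positive rate) at one alert threshold.

    A case is flagged as fraud when its score is at or above the threshold.
    """
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return fp / negatives, tp / positives


# Hypothetical fraud scores: 3 fraud cases (label 1), 5 legitimate ones (label 0).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]

for threshold in (0.8, 0.5, 0.25):
    fpr, tpr = roc_point(y_true, scores, threshold)
    print(f"threshold={threshold}: FPR={fpr:.2f}, TPR={tpr:.2f}")
```

Lowering the threshold catches more fraud (higher true positive rate) at the price of more false alerts (higher false positive rate), which is exactly the trade-off the curve draws.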
The AUC (Area Under the Curve) measures the area under the ROC curve. For a useful model its value lies between 0.5 and 1, and it represents the overall ability of the model to distinguish fraud from normal transactions across all decision thresholds. The closer the AUC is to 1, the better the model performs. An AUC of 0.5 indicates that the model does no better than chance, like a simple coin toss.
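One way to build intuition for the AUC: it equals the probability that a randomly chosen fraud case receives a higher score than a randomly chosen legitimate one. The sketch below computes it by comparing every fraud/legitimate pair; the data and the name `auc_pairwise` are illustrative assumptions, and this brute-force approach is only meant for small examples.

```python
def auc_pairwise(y_true, scores):
    """AUC as the share of (fraud, legitimate) pairs ranked correctly.

    Ties count for half a pair. Requires at least one case of each class.
    """
    fraud = [s for y, s in zip(y_true, scores) if y == 1]
    legit = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for f in fraud:
        for l in legit:
            if f > l:
                wins += 1.0
            elif f == l:
                wins += 0.5
    return wins / (len(fraud) * len(legit))


# Hypothetical scores: 3 fraud cases, 5 legitimate ones.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]

auc = auc_pairwise(y_true, scores)
print(f"AUC = {auc:.3f}")  # 14 of the 15 pairs are ranked correctly
```

A model that gives every case the same score gets 0.5 with this computation, the coin-toss level; a model that ranks systematically backwards would fall below 0.5.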
From AUC to Gini, there is only one step...
The Gini coefficient is derived from the AUC. While the AUC of a useful model takes a value between 0.5 and 1, the Gini rescales this measure to vary between 0 and 1, which is often more intuitive. Mathematically, the Gini is calculated from the AUC with the formula:
Gini = 2 × AUC − 1
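A quick worked example of the formula, using a few illustrative AUC values (the function name `gini_from_auc` is ours):

```python
def gini_from_auc(auc):
    """Convert an AUC into a Gini coefficient: Gini = 2 x AUC - 1."""
    return 2 * auc - 1


for auc in (0.5, 0.75, 0.9, 1.0):
    print(f"AUC = {auc:.2f} -> Gini = {gini_from_auc(auc):.2f}")
# AUC = 0.50 -> Gini = 0.00
# AUC = 0.75 -> Gini = 0.50
# AUC = 0.90 -> Gini = 0.80
# AUC = 1.00 -> Gini = 1.00
```

The chance-level AUC of 0.5 maps to a Gini of 0, and a perfect AUC of 1 maps to a Gini of 1.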
The Gini therefore varies between 0 and 1 (or 0% and 100%). The higher it is, the better the discriminatory power of the model. A Gini coefficient of 0 (0%) corresponds to a random model (a coin toss), just like an AUC of 0.5, while a Gini of 1 (100%) represents a perfect model.
The Gini coefficient is a widely adopted metric, especially in the banking sector, to assess the overall effectiveness of a scoring model.
However, despite its popularity, it has a crucial limitation. It focuses primarily on the overall ability of the model to discriminate between “good” and “bad” profiles, without taking into account the imbalance between classes, the distribution of probability scores, or the confidence that can be placed in each individual prediction.
In cases where the classes are highly imbalanced (for example, 99% negative and 1% positive), the Gini may overestimate the performance of the model. This is precisely the situation in most of the problems our customers face since, fortunately, fraud rates are low: often around 1 to 2% of transactions.
Gini therefore gives a good overall indication of the effectiveness of a model, but it offers only a coarse view of performance and does not take into account the explainability of the predictions.
This is why Meelo has chosen to integrate another metric into its tools in addition to Gini: the Brier Score.
To Gini and beyond... thanks to the Brier Score!
Unlike Gini, which measures the model's ability to separate good and bad profiles, the Brier Score evaluates whether our probability scores are close to reality. For example, if our model predicts an 80% chance of fraud, the Brier Score checks whether, on average, this type of case is actually fraudulent 8 times out of 10.
Let's imagine two models with an equivalent Gini coefficient. The first assigns very clear-cut scores (close to 0 or 100%), while the second concentrates most of its predictions in a gray area (for example, between 40 and 60%). Although their overall discriminatory power may be comparable, the first model inspires more confidence because it seems more reliable in its decisions.
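The “8 out of 10” intuition can be checked directly on a small, made-up sample: take all the cases that received a given score and look at how many turned out to be fraudulent. The helper `observed_rate_in_band` and the data are assumptions for illustration; the Brier Score condenses this kind of agreement into a single number rather than checking it band by band.

```python
def observed_rate_in_band(y_true, probs, low, high):
    """Observed fraud rate among cases whose predicted probability is in [low, high).

    Returns None when no case falls in the band.
    """
    outcomes = [y for y, p in zip(y_true, probs) if low <= p < high]
    if not outcomes:
        return None
    return sum(outcomes) / len(outcomes)


# Hypothetical sample: 10 cases scored at 80% and 10 cases scored at 10%.
probs = [0.8] * 10 + [0.1] * 10
y_true = [1] * 8 + [0] * 2 + [1] * 1 + [0] * 9

rate_high = observed_rate_in_band(y_true, probs, 0.75, 0.85)
rate_low = observed_rate_in_band(y_true, probs, 0.05, 0.15)
print(rate_high, rate_low)  # 0.8 0.1
```

Here the predicted probabilities match what actually happened: 8 of the 10 cases scored at 80% were fraud, and 1 of the 10 cases scored at 10%.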
It is precisely this notion of reliability that the Brier Score measures by evaluating the difference between the probability predicted by the model (80% risk for example) and the observed result (0 or 1). The smaller this gap, the more consistent and closer the predictions are to reality.
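In its standard form, the Brier Score is the average of the squared gaps between each predicted probability and the observed outcome (0 or 1), so lower is better. The sketch below applies it to two hypothetical models that rank the cases identically, and therefore share the same Gini, but differ in how clear-cut their scores are. All data and names are illustrative assumptions.

```python
def brier_score(y_true, probs):
    """Mean squared gap between predicted probability and observed outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


# Hypothetical outcomes: two legitimate cases, then two fraud cases.
y_true = [0, 0, 1, 1]

# Both models score every fraud case above every legitimate one,
# so their ranking (and hence their Gini) is identical.
clear_cut = [0.05, 0.10, 0.90, 0.95]
gray_area = [0.40, 0.45, 0.55, 0.60]

print(f"clear-cut model: {brier_score(y_true, clear_cut):.5f}")  # 0.00625
print(f"gray-area model: {brier_score(y_true, gray_area):.5f}")  # 0.18125
```

The clear-cut model gets a much lower (better) Brier Score because its confident predictions are also correct. The reverse holds too: a confident but wrong prediction is penalized heavily, since scoring a legitimate case at 0.95 contributes 0.9025 to the average on its own.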
What are the benefits of the Brier Score?
At Meelo, we are convinced of the relevance of the Brier Score. Thanks to this powerful metric:
- our decisions are better informed, because we don't just separate good profiles from bad ones, we also assess the certainty of each prediction;
- we generate fewer “gray areas”: optimizing the Brier Score pushes our models to generate more extreme scores, reducing risks and additional checks, which are synonymous with costs and friction in the customer journey;
- our approach is even more ethical: overly conservative policies are avoided as far as possible thanks to sharper decisions based on a well-founded assessment of the risk;
- our users are more satisfied: by offering a complementary perspective to Gini, the Brier Score allows our customers to better understand the reliability and distribution of fraud scores.
And what about the Brier Skill Score?
To make the Brier Score easier to interpret, we often use the Brier Skill Score (BSS). The BSS measures whether our prediction model is better than a simple reference model. The higher the score, the more accurate our model's predictions are relative to that reference. This is our way of validating that our model brings real added value to our customers.
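The usual definition is BSS = 1 − (Brier Score of the model / Brier Score of the reference). In the sketch below we assume the reference model simply predicts the overall fraud rate for every case, which is a common choice but not necessarily the reference used in Meelo's tools. Data and function names are illustrative.

```python
def brier_score(y_true, probs):
    """Mean squared gap between predicted probability and observed outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def brier_skill_score(y_true, probs):
    """BSS = 1 - BS_model / BS_reference.

    Assumption: the reference model predicts the observed fraud rate for
    every case. Undefined when all outcomes are identical (BS_reference = 0).
    """
    base_rate = sum(y_true) / len(y_true)
    reference = [base_rate] * len(y_true)
    return 1 - brier_score(y_true, probs) / brier_score(y_true, reference)


# Hypothetical sample: 2 fraud cases out of 10 (fraud rate of 20%).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.3, 0.7, 0.8]

bss = brier_skill_score(y_true, probs)
print(f"BSS = {bss:.5f}")  # 1 - 0.029 / 0.16 = 0.81875
```

A BSS of 0 means the model does no better than the reference, a positive value means it adds something, and 1 would correspond to perfect predictions.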
Optimizing evaluation metrics: the results speak for themselves
When retraining our models, integrating the Brier Score significantly improved our performance. On the same data, the score went from 35 to just over 60 points, i.e. a gain of more than 25 points.
As a result, we identify more fraud, and with greater certainty. This approach guarantees the reliability of our predictions and reduces the risk of the model behaving in an unstable manner in the face of atypical profiles.
At Meelo, we are convinced that the performance of a fraud detection model is not limited to its ability to discriminate globally. By going beyond standard indicators, we build solid trust through truly informed fraud detection and we provide our customers with a solution that is both efficient and responsible.
