No. 689 - Stacking machine-learning models for anomaly detection: comparing AnaCredit to other banking datasets


by Pasquale Maddaloni, Davide Nicola Continanza, Andrea del Monaco, Daniele Figoli, Marco di Lucido, Filippo Quarta and Giuseppe Turturiello

April 2022

This study addresses anomaly detection in the bank loan data collected in AnaCredit. The core idea is to compare this granular information with the more aggregated credit data available in two other datasets, 'Balance Sheet Items' and 'Financial Reporting'. The final output is obtained by stacking the predictions of several machine-learning algorithms.

This methodology identifies potential outliers in the AnaCredit dataset more accurately than any of the individual algorithms alone. It also makes it easy for reporting agents to analyse the reason for an anomaly, since the check is ultimately a reconciliation exercise between AnaCredit and a benchmark source.
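
The reconcile-then-combine idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the bank names and figures are invented, the three detectors are simple statistical scores rather than the machine-learning algorithms used in the study, and the final step is a plain average of rank-normalised scores standing in for the stacking step, whose details the abstract does not give.

```python
from statistics import mean, median, pstdev


def relative_gaps(granular_totals, benchmark_totals):
    """Relative gap between aggregated loan-level totals and a benchmark, per bank."""
    return {
        bank: (granular_totals[bank] - benchmark_totals[bank]) / benchmark_totals[bank]
        for bank in granular_totals
    }


def zscore_detector(values):
    """Distance from the mean, in units of the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [abs(v - mu) / sigma if sigma else 0.0 for v in values]


def mad_detector(values):
    """Distance from the median, in units of the median absolute deviation."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return [abs(v - med) / mad if mad else 0.0 for v in values]


def abs_gap_detector(values):
    """Size of the reconciliation gap itself."""
    return [abs(v) for v in values]


def rank_normalise(scores):
    """Map scores to (0, 1] by rank so that detectors are comparable.

    Ties are broken by input order.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    for position, i in enumerate(order, start=1):
        ranks[i] = position / len(scores)
    return ranks


def combined_scores(gaps):
    """Average the rank-normalised scores of all detectors for each bank."""
    banks = list(gaps)
    values = [gaps[b] for b in banks]
    detectors = (zscore_detector, mad_detector, abs_gap_detector)
    per_detector = [rank_normalise(d(values)) for d in detectors]
    return {b: mean(col) for b, col in zip(banks, zip(*per_detector))}


# Hypothetical totals: loan-level data summed per bank vs. an aggregated benchmark.
granular = {"bank_a": 100.0, "bank_b": 205.0, "bank_c": 90.0,
            "bank_d": 400.0, "bank_e": 51.0}
benchmark = {"bank_a": 101.0, "bank_b": 200.0, "bank_c": 150.0,
             "bank_d": 398.0, "bank_e": 50.0}

scores = combined_scores(relative_gaps(granular, benchmark))
most_suspicious = max(scores, key=scores.get)
```

Because every score is derived from the gap between the two sources, a flagged bank can be examined by looking at that gap directly, which is the sense in which the check remains a reconciliation exercise.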