No. 547 - Quality checks on granular banking data: an experimental approach based on machine learning?


by Fabio Zambuto, Maria Rosaria Buzzi, Giuseppe Costanzo, Marco Di Lucido, Barbara La Ganga, Pasquale Maddaloni, Fabio Papale and Emiliano Svezia

March 2020

The study proposes a new methodology, based on the supervised learning algorithm known as Quantile Regression Forests, for the automatic detection of potential outliers in data reported to the Bank of Italy by banking and financial intermediaries. The empirical analysis focuses on granular data on debit cards that are gathered within the statistical data collection on payment services; for such information, the data quality management process is challenging and its maintenance particularly burdensome.

The approach automatically selects acceptance regions for the data reported by the intermediaries by estimating thresholds suited to the characteristics of each reporting agent; these thresholds are updated as new data are collected. The empirical analysis shows that the proposed procedure detects additional anomalies compared with the current system of quality checks. A cross-check of the anomalies with the reporting agents indicates that a high share of the flagged outliers correspond to true reporting errors, which confirms the reliability of the algorithm.
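The idea of reporter-specific acceptance regions can be illustrated with a minimal Python sketch. It is not the procedure used in the study: plain empirical quantiles of each reporter's own history stand in for the conditional quantiles that a Quantile Regression Forest would estimate from reporter characteristics. The function names, the 5th and 95th percentile bounds, and the rule of adding only accepted values to the history are all assumptions made for illustration.

```python
from statistics import quantiles


def acceptance_region(history, lower_pct=5, upper_pct=95):
    """Return (low, high) bounds computed from one reporter's past values.

    Stand-in for a Quantile Regression Forest: empirical percentiles of the
    reporter's own history are used instead of model-based conditional
    quantiles. The percentile levels are illustrative assumptions.
    """
    # quantiles(..., n=100) returns the 99 cut points between percentiles,
    # so the p-th percentile sits at index p - 1.
    cuts = quantiles(history, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def flag_outliers(history_by_reporter, new_reports, lower_pct=5, upper_pct=95):
    """Check new (reporter, value) pairs against reporter-specific bounds.

    Values inside the acceptance region are appended to the reporter's
    history, so the bounds are re-estimated as new data arrive. Flagged
    values are left out of the history (a simplifying assumption).
    """
    flagged = []
    for reporter, value in new_reports:
        history = history_by_reporter[reporter]
        low, high = acceptance_region(history, lower_pct, upper_pct)
        if low <= value <= high:
            history.append(value)
        else:
            flagged.append((reporter, value, low, high))
    return flagged
```

Because the bounds are computed separately for each reporter, the same reported value can be acceptable for one intermediary and anomalous for another, which is the property the paragraph above describes.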