No. 666 - A decision-making rule to detect insufficient data quality: an application of statistical learning techniques to the non-performing loans banking data

by Barbara La Ganga, Paolo Cimbali, Marco De Leonardis, Alessio Fiume, Luciana Meoli and Marco Orlandi
February 2022

The study presents a methodology for assessing the overall quality of the revisions applied to the non-performing loans data that banks send to the Bank of Italy. The approach is based on a synthetic data quality indicator, computed with a machine learning technique applied to past evidence from data quality management activity on the non-performing loans dataset. The indicator makes it possible to distinguish cases where the corrections applied to the original dataset improve its overall quality from those where the revisions (unexpectedly) make it worse.

The proposed methodology considers several metrics that affect the overall quality of the dataset: the number of potential outliers, their degree of severity, and the probability that the underlying data are correct, estimated with a supervised statistical learning technique. Compared with the approach currently used at the Bank of Italy, the new methodology identifies more precisely the cases in which the overall quality of the data worsens between two consecutive data submissions.
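
The abstract does not give the formula for the synthetic indicator or name the supervised technique, so the following is only a minimal sketch under stated assumptions. It assumes one plausible aggregation: each potential outlier contributes its severity weighted by the estimated probability that the value is wrong, so the score grows with the number of outliers, their severity, and their likelihood of being errors. A plain logistic regression stands in for the paper's supervised learner, trained on past outliers labelled 1 if the bank confirmed the value and 0 if it was corrected. All names (FlaggedValue, fit_logistic, quality_score, assess_revision), features, and toy values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class FlaggedValue:
    # Hypothetical record for one potential outlier in a submission.
    severity: float        # assumed non-negative degree of severity
    features: list[float]  # assumed features used to predict correctness


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    # Stand-in supervised learner: logistic regression by batch gradient descent.
    # y[i] = 1 if the flagged value was confirmed correct, 0 if it was corrected.
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(n):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


def prob_correct(model, features):
    # Estimated probability that a flagged value is actually correct.
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, features)))


def quality_score(flags, model):
    # Hypothetical synthetic indicator (higher = worse quality): sum over
    # potential outliers of severity times the probability of being an error.
    return sum(f.severity * (1.0 - prob_correct(model, f.features)) for f in flags)


def assess_revision(original, revised, model, tolerance=0.0):
    # Decision rule: compare the indicator before and after the revision.
    before = quality_score(original, model)
    after = quality_score(revised, model)
    if after > before + tolerance:
        return "worsened"
    if after < before - tolerance:
        return "improved"
    return "unchanged"


if __name__ == "__main__":
    # Toy history, invented for illustration: one feature (e.g. a relative change).
    X = [[0.1], [0.2], [0.3], [2.0], [2.5], [3.0]]
    y = [1, 1, 1, 0, 0, 0]
    model = fit_logistic(X, y)
    original = [FlaggedValue(1.0, [0.2]), FlaggedValue(3.0, [2.8])]
    revised = [FlaggedValue(1.0, [0.2])]  # the severe suspect value was fixed
    print(assess_revision(original, revised, model))  # prints "improved"
```

Weighting severity by the error probability means a severe but probably correct value counts for little, while many moderate but likely wrong values add up; the `tolerance` parameter is an assumed knob for ignoring negligible changes between submissions.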
