As part of its data quality management of supervisory banking data, the Bank of Italy receives a large number of data reports from Italian banks at various intervals and, where necessary, responds with remarks on the quality of the data. In response, a bank may either confirm or revise the data it previously transmitted. This paper proposes an innovative methodology, based on text mining and machine learning techniques, for the automatic processing of banks' data confirmations.
A classification model is used to predict whether a data confirmation received from a bank should be accepted or rejected, based on the reasons given by the reporting bank, the characteristics of the validation quality checks, and reporting behaviour across the banking system. The empirical findings show that the methodology correctly predicts the decisions on recurrent data confirmations, and that the performance of the proposed model is comparable to that of the data managers currently engaged in data analysis.
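To make the idea of classifying confirmations concrete, the following is a minimal illustrative sketch, not the model used in the paper. It assumes a multinomial Naive Bayes classifier with Laplace smoothing as a stand-in technique, and it uses only two of the inputs mentioned above: the free-text reason and a check-type label. The training records, the check-type names (`balance_check`, `trend_check`), and the helper names (`tokenize`, `NaiveBayes`) are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(reason, check_type):
    """Lowercase word tokens from the free-text reason, plus one token
    encoding the (hypothetical) type of validation check."""
    words = re.findall(r"[a-z]+", reason.lower())
    return words + [f"check={check_type}"]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(samples, labels):
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior of the class.
            score = math.log(count / total)
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                # Tokens never seen in training are ignored.
                if tok in self.vocab:
                    score += math.log((self.token_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy records: (reason text, check type, past decision).
training = [
    ("value confirmed after merger with another bank", "balance_check", "accept"),
    ("confirmed, extraordinary operation in the quarter", "balance_check", "accept"),
    ("data are correct", "trend_check", "reject"),
    ("confirmed", "trend_check", "reject"),
]

model = NaiveBayes().fit(
    [tokenize(reason, check) for reason, check, _ in training],
    [decision for _, _, decision in training],
)

decision = model.predict(tokenize("increase due to merger operation", "balance_check"))
print(decision)
```

A realistic system would need far more training data and would also have to encode reporting behaviour across the banking system, which this sketch leaves out.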