This work compares the forecasting accuracy of statistical models commonly employed to predict corporate defaults, such as logistic regression, with that of machine learning models, such as random forests and gradient-boosted trees.
Machine learning models provide substantial gains in forecasting accuracy relative to statistical models. The advantage is especially pronounced when only a limited information set is available, such as financial ratios or geo-sectoral information.
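Such a horse race can be sketched in a few lines. The snippet below is a minimal illustration assuming scikit-learn, with synthetic data standing in for the firm-level dataset; the features, class imbalance, and hyperparameters are purely hypothetical and do not reflect the paper's specification.

```python
# Minimal sketch of the model comparison (illustrative only):
# logistic regression vs. random forest vs. gradient-boosted trees,
# scored by out-of-sample AUC on synthetic, imbalanced default data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for financial ratios; roughly a 5% default rate.
X, y = make_classification(n_samples=20_000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "logit": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=500, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    pd_hat = model.predict_proba(X_test)[:, 1]  # estimated default probability
    print(f"{name:>17}: AUC = {roc_auc_score(y_test, pd_hat):.3f}")
```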
Through a comparative statics exercise, we evaluate how employing the estimated probabilities of default would affect the allocation of credit. The results show that machine-learning-based credit ratings would imply lower credit losses for lenders and an increase in the overall supply of credit.
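The flavor of such an exercise can be conveyed with a stylized simulation. The allocation rule below (lend to every firm whose estimated probability of default falls under a fixed cutoff), the noise levels, and all quantities are hypothetical assumptions, not the paper's actual design; they only illustrate how more accurate PD estimates can simultaneously raise credit supply and reduce losses.

```python
# Stylized illustration (not the paper's exercise): compare credit
# supply and realized losses under a noisy "statistical" PD estimate
# and a less noisy "ML" PD estimate, using a fixed lending cutoff.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Synthetic "true" default probabilities (mean ~5%) and realized defaults.
true_pd = rng.beta(1, 19, size=n)
default = rng.random(n) < true_pd

def noisy_estimate(pd, sigma):
    """Perturb true PDs with Gaussian noise on the logit scale."""
    logit = np.log(pd / (1 - pd)) + rng.normal(0, sigma, size=pd.shape)
    return 1 / (1 + np.exp(-logit))

pd_stat = noisy_estimate(true_pd, sigma=1.0)  # noisier estimate
pd_ml = noisy_estimate(true_pd, sigma=0.4)    # assumed more accurate

# Lend to every firm whose estimated PD is below a fixed cutoff.
cutoff = 0.05
for name, pd_hat in [("statistical", pd_stat), ("ml", pd_ml)]:
    lend = pd_hat < cutoff
    supply = lend.mean()          # share of firms receiving credit
    loss = default[lend].mean()   # default rate among financed firms
    print(f"{name:>11}: credit supply {supply:.1%}, loss rate {loss:.2%}")
```

Under these assumptions the less noisy estimate admits more of the genuinely safe firms and fewer of the risky ones, so credit supply rises while the loss rate falls, mirroring the direction of the paper's findings.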