Affiliations: [a] LRIA/Computer Science Department, University of Sciences and Technology Houari Boumediene, USTHB - BP 32 El-Alia, Bab-Ezzouar, Algiers, Algeria E-mails: [email protected], [email protected] | [b] Department of Accounting, Faculty of Economics and Administrative Sciences, The Hashemite University, Zarqa, Jordan E-mail: [email protected]
Abstract: Machine learning techniques have been used successfully in several areas, including banking and finance. These techniques are used mainly for prediction, classification, and partitioning data into groups that share a common characteristic. In this work, we study machine learning techniques for credit scoring and bankruptcy prediction in finance and banking. We evaluate and compare a range of machine learning techniques on several datasets obtained from banks and financial institutions, with the aim of selecting the most appropriate methods for each dataset. We use several metrics to evaluate the performance of the resulting models. The empirical studies are conducted on the German, Australian, Japanese, Polish, Indian Qualitative Bankruptcy, and Taiwan datasets. We also consider the large “Give Me Some Credit” dataset. The machine learning methods produce scores for applicants and companies that support decision making; in other words, they allow us to distinguish between good and bad applicants or companies. The numerical study shows that no method consistently outperforms the others on all datasets, and that there are significant differences between the studied methods on some datasets. On the German and Give Me Some Credit datasets, the Bayes net method produces good scores compared to the other studied methods. The LogitBoost method is competitive on both the Polish and Australian datasets, while the AdaBoost method is the most appropriate for the Japanese dataset. On the Taiwan dataset, the Random Forest method gives the best results among the considered techniques. However, on the Indian Qualitative Bankruptcy dataset, almost all the methods are comparable due to the nature of this dataset.