Abstract: Many models and methods for credit risk assessment have been developed, but most of them rely on financial data, stock prices, or investigation data provided by credit inquiry agencies. Because most small businesses do not disclose their financial information, it is almost impossible to assess their credit with the models and methods published so far. We have proposed a new approach that assesses customers’ credit based only on daily transaction data such as sales, payments by customers, and the amount of overdue payments. This paper proposes a credit assessment system using the bagging method. It addresses the problem that unsound customers are far fewer than healthy ones, and it aims to improve the ability to identify unsound customers. The performance and effectiveness of the proposed system are confirmed by applying it to real problems.
Key words: Ensembles Bagging; Customer Evaluation; Credit Risk; Credit Scoring
doi:10.3969/j.issn.1673-0194.2009.15.035
CLC number: TP224; F275
Article character: A
Article ID: 1673-0194(2009)15-0115-05
1 Introduction
In today’s increasingly competitive business environment, risk management has become an important topic. Credit risk is one of the most frequently encountered management risks and is most simply defined as the potential that a counterparty will fail to meet its obligations in accordance with agreed terms. Because there are many types of counterparties, from individuals to sovereign governments, and many types of obligations, from auto loans to derivatives transactions, credit risk takes many forms. There has been significant prior research on credit analysis and credit evaluation. The models and methodologies published so far for credit risk assessment generally fall into two categories: default models and credit scoring models. Default models assess the likelihood of default by an obligor. Credit scoring models assess the credit quality of a counterparty.
The application of statistical techniques to credit analysis started in the 1960s with the development of computers. Beaver[1] was one of the first researchers to introduce discriminant analysis (DA) into the study of bankruptcy prediction. From the 1980s, the DA method was replaced by other statistical techniques such as logit analysis, probit analysis and multidimensional scaling[2][3]. Artificial Intelligence (AI) techniques, particularly rule-based expert systems, case-based reasoning (CBR) systems and machine learning techniques such as neural networks, have been applied to credit rating and bankruptcy analysis. Recently, researchers have proposed hybrid data mining approaches for designing effective credit scoring models[4-6].
Credit scoring models vary in the type and quantity of data needed for decision making. Most prior studies apply mainly to companies in the financial community, such as commercial banks, whose customers are usually required to submit financial and other data. However, many companies outside the financial community cannot require such data from their customers. It is particularly difficult to obtain customer data when the customers are small businesses that do not disclose financial information. Therefore, most models and methods for credit assessment, which are suitable and effective for commercial banks or large businesses, cannot always be applied to small businesses.
To deal with the customers’ credit assessment problem in a small company, we have proposed new approaches that assess the customers’ credit based only on daily transaction data such as sales, payments by customers, and the amount of overdue payments[7][8]. Because the data can be extracted almost automatically from the database of the management information system, these approaches suit the many organizations whose customers do not disclose their financial data, and they collect data at a lower cost than the approaches published so far.
However, some new issues have to be addressed. One is that daily transaction data usually contain a large number of transaction records, so the data size must be reduced to improve learning efficiency. Another is how to improve the ability to identify unsound customers. Because unsound customers are far fewer than healthy ones, identifying unsound customers accurately is more important than improving the overall accuracy. To deal with these issues, this paper proposes a credit assessment system using the bagging method. The architecture of the system is described by giving the procedure for generating multiple versions of a customer’s credit assessment through multivariate discriminant analysis (MDA) and aggregating them into a final assessment through a plurality vote. The performance and effectiveness of the proposed system are confirmed by applying it to the real problems of the company.
Table 1 Credit scores
Score    Customers
1        A healthy customer for which all orders are accepted.
2        A customer for which orders are accepted but limited to a given amount.
3        A customer for which orders are accepted only as cash sales.
4        An unsound customer for which all orders are rejected.
2 Credit Assessment Problem in a Small Company
This paper considers the credit assessment problem in a small company whose main business is the wholesale of school uniforms and accessories. The company has 20 employees, and its annual sales are about 600 million Japanese yen. Orders come from about 800 customers, which are classified into three types: retailer, school and other. The customers’ credit is assessed through a four-grade credit score, as shown in Table 1.
Because most of the customers are small businesses that do not disclose financial information, it is almost impossible to obtain their financial data. The limited budget also makes it difficult to ask an agency frequently to evaluate customers’ credit. For these reasons, it is preferable to develop a system that can assess the customers’ credit based only on daily transaction data.
Figure 1 Scheme of credit assessment system using bagging
3 Features Data Extraction
Considering the data available from the company, we collected eight basic features and represent customer Ci (i=1,2,…,n) by the following data structure:
Ci: (xi1, xi2, xi3, …, xi8) (1)
where n is the number of customers and xij (j=1,2,…,8) are the features of customer Ci, defined as follows:
xi1, xi2: 0-1 variables representing the type of customer Ci as shown in Table 2.
Table 2 0-1 Variables xi1 and xi2
Type of customer    xi1    xi2
retailer            0      0
school              0      1
other               1      0
xi3: average amount of overdue payment in the year considered.
xi4: maximum number of overdue days among all overdue payments in the year considered.
xi5: number of times an overdue payment occurred in the year considered.
xi6: total sales in the year considered.
xi7: ratio of the average amount of overdue payment to the total sales, i.e. xi7 = xi3 / xi6.
xi8: number of months in the year considered in which at least one order from the customer was fulfilled.
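The feature definitions above can be sketched in code. This is a minimal illustration only: the `Transaction` record layout, its field names, and the function `extract_features` are hypothetical and do not come from the company's system. It also assumes that the average overdue amount (xi3) is taken over the overdue occurrences only, which the definitions do not state explicitly.

```python
from dataclasses import dataclass

# Table 2 encoding of the customer type as (x_i1, x_i2).
TYPE_CODES = {"retailer": (0, 0), "school": (0, 1), "other": (1, 0)}


@dataclass
class Transaction:
    """One fulfilled order (hypothetical record layout)."""
    month: int             # calendar month of the order, 1..12
    sales: float           # invoiced amount
    overdue_amount: float  # amount paid late, 0 if paid on time
    overdue_days: int      # days past the due date, 0 if paid on time


def extract_features(customer_type, transactions):
    """Return (x_i1, ..., x_i8) for one customer and one year."""
    x1, x2 = TYPE_CODES[customer_type]
    overdue = [t for t in transactions if t.overdue_amount > 0]
    # Assumption: the average is taken over overdue occurrences only.
    x3 = sum(t.overdue_amount for t in overdue) / len(overdue) if overdue else 0.0
    x4 = max((t.overdue_days for t in overdue), default=0)
    x5 = len(overdue)
    x6 = sum(t.sales for t in transactions)
    x7 = x3 / x6 if x6 > 0 else 0.0   # guard against customers with no sales
    x8 = len({t.month for t in transactions})
    return (x1, x2, x3, x4, x5, x6, x7, x8)


example = [
    Transaction(month=4, sales=100_000, overdue_amount=0, overdue_days=0),
    Transaction(month=4, sales=50_000, overdue_amount=50_000, overdue_days=12),
    Transaction(month=9, sales=250_000, overdue_amount=30_000, overdue_days=40),
]
features = extract_features("school", example)
print(features)
```

Because every feature is a simple aggregate over one customer's records, the same computation could also be expressed as a grouped query against the transaction database.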
4 Credit Assessment System Using Bagging
4.1 System scheme
Bagging, which stands for bootstrap aggregating, is one of the earliest and perhaps the simplest of the ensemble-based learning algorithms, and it performs surprisingly well[9]. Diversity among the classifiers in bagging is obtained by using bootstrapped replicas of the training data. That is, different training data subsets are randomly drawn with replacement from the entire training dataset. Each training data subset is used to train a different classifier of the same type. The individual classifiers are then combined by taking a simple majority vote of their decisions.
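As a short worked illustration of where this diversity comes from, assume (as in standard bagging, not as a statement about the proposed system) that each replica consists of n draws with replacement from a training set of n items. The probability that a given item appears at least once in a replica is then:

```latex
P(\text{item } i \in \text{replica})
  = 1 - \left(1 - \frac{1}{n}\right)^{n}
  \;\longrightarrow\; 1 - e^{-1} \approx 0.632 \qquad (n \to \infty)
```

Under that assumption each replica omits roughly a third of the training items, so the classifiers trained on different replicas see noticeably different data.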
The proposed credit assessment system using bagging is shown in Figure 1.
4.2 Procedure for credit assessment
When assessing customers’ credit, the number of healthy customers is usually much larger than that of unsound ones. If the entire dataset of all customers is used as the training dataset, the features of healthy customers are often over-learned while those of unsound customers are under-learned. In practice, identifying unsound customers is more important than identifying healthy ones in order to keep credit risk as low as possible. For this reason, the sampling method of the standard bagging algorithm is revised, and the bagging procedure for credit assessment is proposed as follows.
[Step 1] Data preparation
The entire dataset is S={ (Ci, csi) | i=1,2,…,n }, where Ci is the features data of customer Ci defined in equation (1) and csi (csi∈{1,2,3,4}) is its credit score. Divide S into two subsets S1 and S2 as:
S1={ (Ci, csi) | csi =1, i=1,2,…,n } (2)
S2={ (Ci, csi) | csi >1, i=1,2,…,n } (3)
S1 is the subset of healthy customers and S2 is the subset of unsound customers. The sizes of S1 and S2 are n1 and n2 respectively.
[Step 2] Sampling with replacement (bootstrap)
Let N be the number of bootstrapped replicas. Build each replica Bk (k=1,2,…,N) by randomly drawing, with replacement, p customers from the dataset S1 and q customers from the dataset S2.
[Step 3] Weak learning
Using replica Bk (k=1,2,…,N) as the training set, with Ci as the input variables and csi as the class label, the classifier Hk is constructed by multivariate discriminant analysis and consists of four discriminant functions. The ensemble E is obtained as E={ Hk | k=1,2,…,N }.
[Step 4] Majority voting
Given a target customer NC, its features data can be collected and denoted as:
NC: (z1, z2, …, zm) (4)
where z1, z2,…, zm have the same definitions as xi1, xi2,…,xim. Then its credit score cf can be decided by the following steps:
[Step 4.1] Let Gj (j=1,2,3,4) be the class label corresponding to the credit score j. Applying the classifier Hk to the target customer NC gives the class label of NC as cfk (cfk∈{G1,G2,G3,G4}).
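[Step 4.2] Let
$$\nu_{kj} = \begin{cases} 1, & \text{if } cf_k = G_j \\ 0, & \text{otherwise} \end{cases} \qquad \text{and} \qquad \nu_j = \sum_{k=1}^{N} \nu_{kj}.$$
The credit score cf is chosen by taking a simple majority vote as:
$$cf = \arg\max_{j \in \{1,2,3,4\}} \nu_j \qquad (5)$$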
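Steps 1 to 4 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. The base learner `fit_nearest_centroid` is a simple stand-in and is not multivariate discriminant analysis; any function that takes a replica and returns a classifier can be passed as `fit`. The tie-breaking rule (toward the higher score) is also an assumption, since the procedure does not specify one. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def split_by_score(dataset):
    """Step 1: S1 = customers with score 1, S2 = customers with score > 1."""
    s1 = [(x, cs) for x, cs in dataset if cs == 1]
    s2 = [(x, cs) for x, cs in dataset if cs > 1]
    return s1, s2


def draw_replica(s1, s2, p, q, rng):
    """Step 2: draw p items from S1 and q items from S2, with replacement."""
    return rng.choices(s1, k=p) + rng.choices(s2, k=q)


def fit_nearest_centroid(replica):
    """Step 3 stand-in (assumption): nearest class centroid, NOT the paper's MDA."""
    by_score = {}
    for x, cs in replica:
        by_score.setdefault(cs, []).append(x)
    centroids = {
        cs: [sum(col) / len(xs) for col in zip(*xs)]
        for cs, xs in by_score.items()
    }

    def classify(z):
        def sq_dist(cs):
            return sum((a - b) ** 2 for a, b in zip(z, centroids[cs]))
        return min(centroids, key=sq_dist)

    return classify


def train_ensemble(dataset, p, q, n_replicas, fit=fit_nearest_centroid, seed=0):
    """Steps 1 to 3: build the ensemble E = {H_1, ..., H_N}."""
    rng = random.Random(seed)
    s1, s2 = split_by_score(dataset)
    return [fit(draw_replica(s1, s2, p, q, rng)) for _ in range(n_replicas)]


def assess(ensemble, z):
    """Step 4: plurality vote over the N classifiers.

    Assumption: ties are broken toward the higher (more cautious) score;
    the procedure itself does not specify a tie rule.
    """
    votes = Counter(h(z) for h in ensemble)
    top = max(votes.values())
    return max(cs for cs, count in votes.items() if count == top)


# Toy data with two made-up features; real inputs would be (x_i1, ..., x_i8).
toy = [((0.0, 0.0), 1), ((1.0, 0.0), 1), ((0.0, 1.0), 1), ((1.0, 1.0), 1),
       ((10.0, 10.0), 4), ((9.0, 10.0), 4)]
ensemble = train_ensemble(toy, p=3, q=3, n_replicas=5)
print(assess(ensemble, (9.5, 9.0)), assess(ensemble, (0.5, 0.2)))
```

Drawing from S1 and S2 separately guarantees that every replica contains q unsound customers, whatever their share of the full dataset. The real features differ widely in scale (yen amounts versus counts), so a distance-based stand-in like this one would need rescaled inputs before being used on them.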
5 Experiment and Discussion
To investigate the performance and effectiveness of the proposed system, we applied it to the real credit assessment problem of the company. The features data of the customers in the 2001 financial year and their credit scores given by the financial managers of the company were collected, as summarized in Table 3.
Table 3 Features data of 498 customers
Features    Mean         Standard deviation
xi3         196,171      762,532
xi4         56.37        73.30
xi5         2.92         3.06
xi6         1,201,390    3,706,057
xi7         0.22         0.37
xi8         5.56         4.14

Number of customers
Credit score    Number
1               474
2               2
3               2
4               20
Total           498
Setting p=24, q=24 and N=5, 10 and 24, we applied the proposed system to classify the customers of Table 3 and compared the classification results with the scores given by the financial managers of the company. The comparison results are shown in Table 4. For the customers with scores of 2, 3 and 4, the credit scores provided by the system agree with the judgments of the financial managers in 100%, 50% and 90% of cases respectively, and the hit rate is not sensitive to N (the number of classifiers).
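The effect of these settings can be checked with a little arithmetic on the class counts of Table 3. The sketch below rests on two assumptions: that the reported agreement rates are computed over all customers of each score in Table 3, and that a uniform bootstrap of the same size (p+q draws from the whole dataset) is a fair point of comparison. Neither is stated in the paper, and the variable names are illustrative.

```python
# Class counts from Table 3 (credit score -> number of customers).
counts = {1: 474, 2: 2, 3: 2, 4: 20}
n = sum(counts.values())   # all customers
n1 = counts[1]             # healthy customers, size of S1
n2 = n - n1                # unsound customers, size of S2

p, q = 24, 24              # draws per replica from S1 and S2
replica_size = p + q

# Expected number of unsound customers if the same number of draws were
# taken uniformly from the whole dataset instead of from S1 and S2 separately.
expected_unsound_uniform = replica_size * n2 / n

# Agreement rates reported for scores 2, 3 and 4, converted to counts
# (assumption: each rate refers to all customers of that score).
reported_rate = {2: 1.00, 3: 0.50, 4: 0.90}
agreed = {cs: round(rate * counts[cs]) for cs, rate in reported_rate.items()}
overall_unsound_agreement = sum(agreed.values()) / n2

print(n2, round(expected_unsound_uniform, 2), agreed, overall_unsound_agreement)
```

With q equal to n2, each replica holds as many unsound draws as there are unsound customers, so half of every replica is unsound, whereas uniform sampling of the same size would be expected to include only about two.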
The classification result obtained by ordinary discriminant analysis on the data of all customers is shown in Table 5. Comparing the hit rates in Table 4 and Table 5, the proposed system classifies unsound customers more effectively than ordinary discriminant analysis. However, the system gives a lower hit rate for healthy customers.
6 Concluding Remarks
This paper dealt with the customers’ credit scoring problem in a small company and aimed to assess customers’ credit based only on daily transaction data such as sales, payments by customers, and the amount of overdue payments. A credit assessment system using the bagging method was proposed, and its performance and effectiveness were confirmed by applying it to the real problems of the company. The experimental results showed that the system can classify unsound customers more effectively than ordinary discriminant analysis.
Breiman[10] reported that linear discriminant analysis is a stable classifier with low variance but possibly high bias, so bagging cannot be expected to work well with it. The performance of the system can therefore be expected to improve further if appropriate unstable classifiers are introduced.
Acknowledgement
This work was partly supported by a Grant-in-Aid for Scientific Research (C) from the Japan Society for the Promotion of Science under grant no. 19530324.
References
[1] W Beaver. Financial Ratios as Predictors of Failure[J]. Journal of Accounting Research (Supplement), 1966, 4: 71-111.
[2] L H Ederington. Classification Models and Bond Ratings[J]. The Financial Review, 1985, 20(4): 237-262.
[3] M C Mar, C G Apellaniz, C S Cinca. A Multivariate Study of Spanish Bond Ratings[J]. Omega, 1996, 24(4): 451-462.
[4] D West. Neural Network Credit Scoring Models[J]. Computers and Operations Research, 2000, 27(11/12): 1131-1152.
[5] B Baesens, et al. Benchmarking State-of-the-art Classification Algorithms for Credit Scoring[J]. Journal of the Operational Research Society, 2003, 54(6): 627-635.
[6] N C Hsieh. Hybrid Mining Approach in the Design of Credit Scoring Models[J]. Expert Systems with Applications, 2005, 28(4): 655-665.
[7] Y Dong. A Case Based Reasoning System for Evaluating Customer Credit[J]. Journal of Japan Industrial Management Association, 2006, 57(2): 144-152 (in Japanese).
[8] Y Dong. Development of a Customer Credit Evaluation System via Case-based Reasoning Approach[J]. Asia-Pacific Journal of Industrial Management, 2008, 1(1): 1-7.
[9] L Breiman. Bagging Predictors[J]. Machine Learning, 1996, 24(2): 123-140.
[10] L Breiman. Bias, Variance, and Arcing Classifiers[R]. Technical Report 460, Statistics Department, University of California, 1996: 1-25.