Code Walkthrough: Impressive Accuracy! Building a Credit Risk Scoring Model with creditR

By 昵稱16619343, 2019-03-25

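Machine learning is reshaping many industries, and finance is among the most affected.

Fraud detection, customer segmentation, and employee or customer retention are major targets of machine learning; this article focuses on credit risk scoring.

Credit scoring is a statistical analysis performed by lenders and financial institutions to assess a person's creditworthiness. Lenders use credit scoring, among other criteria, to decide whether to extend or deny credit.
(Source: Investopedia)

How can this work be made easier for professionals in the field? That is the motivation behind the creditR package. It lets you easily build basic credit risk scoring models before moving on to machine learning applications, and it includes functions for validating the scoring process.

The package is meant to facilitate variable analysis, variable selection, model development, model calibration, rating scale development, and model validation. Through its predefined functions, these methods can be applied quickly to an entire modeling dataset or to a specific variable.

In this article, we first go over the details of the creditR package, then work through a complete example that uses it.

The package is intended for credit risk professionals. Using it requires basic knowledge of credit risk scoring methods.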

1. Why use creditR?

2. Getting started with creditR

3. List of functions in creditR

4. Applying the creditR package

1. Why use creditR?

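As demand for machine learning models in the credit domain grows, the perception of credit risk models is changing quickly. However, many managers remain very cautious about moving to machine learning techniques. A plausible expectation is that, during this transition, machine learning algorithms will run in parallel with traditional methods.

Once it is established that machine learning algorithms, while challenging the field's traditional methods, also produce more effective results than those methods, they can earn the trust of regulators. In addition, new methods for interpreting machine learning algorithms may help make the learning process more transparent.

The creditR package makes it possible both to automate the use of traditional methods and to validate traditional and machine learning models.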

2. Getting started with creditR

To install the creditR package, you first need the devtools package. You can install devtools by running the following code:

install.packages('devtools', dependencies = TRUE)

The creditR package can then be installed with the install_github function from devtools:

library(devtools)

devtools::install_github('ayhandis/creditR')

library(creditR)

3. List of functions in creditR

The following call lists the functions available in the package.

ls('package:creditR')

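4. Applying the creditR package

That is enough theory; let's dig into the package itself.

The example below shows how some common steps in credit risk scoring can be carried out with the functions the package provides.

Real-life practice was taken into account when preparing this example.

The application is organized under two main headings, modeling and model validation; the comment lines give details on the corresponding code.

Only the important outputs are shown in this article.

This R script is meant to make the creditR package easier to understand. Obtaining a high-accuracy model is outside the scope of this study.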

# Attaching the library

library(creditR)

#Model data and data structure

data('germancredit')

str(germancredit)

#Preparing a sample data set

sample_data <- germancredit[,c('duration.in.month','credit.amount','installment.rate.in.percentage.of.disposable.income', 'age.in.years','creditability')]

#Converting the ‘Creditability’ (default flag) variable into numeric type

sample_data$creditability <- ifelse(sample_data$creditability == 'bad',1,0)

#Calculating the missing ratios

missing_ratio(sample_data)
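
For readers without the package at hand, a missing ratio can be sketched in base R as the share of NA values per column. This is only an illustration on a made-up data frame (`toy` and its columns are hypothetical), and it assumes `missing_ratio` reports something equivalent; the package's exact output format may differ.

```r
# Hypothetical toy data with a few missing values
toy <- data.frame(
  duration = c(6, 12, NA, 24),
  amount   = c(1000, NA, NA, 4000),
  age      = c(25, 40, 33, 51)
)

# is.na() gives a logical matrix; column means are the NA shares
missing_share <- colMeans(is.na(toy))
print(missing_share)
```

Columns with a high share of missing values are usual candidates for imputation or removal before binning.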

#Splitting the data into train and test sets

traintest <- train_test_split(sample_data,123,0.70)

train <- traintest$train

test <- traintest$test
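
The call above appears to take a seed (123) and a training share (0.70) as its second and third arguments. A comparable split can be sketched in base R as follows; this is an assumption about what the function does, shown on hypothetical data, not the package's implementation.

```r
# Fix the random seed so the split is reproducible
set.seed(123)

# Hypothetical toy data set with 10 rows
toy <- data.frame(id = 1:10, x = rnorm(10))

# Draw 70% of the row indices for training; the rest become the test set
n_train    <- round(0.70 * nrow(toy))
train_rows <- sample(nrow(toy), size = n_train)
toy_train  <- toy[train_rows, ]
toy_test   <- toy[-train_rows, ]
```

Using the same seed on the same data returns the same partition, which matters when results have to be reproduced for validation.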

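The WOE (weight of evidence) transformation converts variables into categorical variables based on their relationship with the target variable. The "woerules" object below holds the WOE rules.

The woe.binning.deploy function applies the WOE rules to a dataset. The variables we need are then assigned to the "train_woe" object with the woe.get.clear.data function.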

#Applying WOE transformation on the variables

woerules <- woe.binning(df = train,target.var = 'creditability',pred.var = train,event.class = 1)

train_woe <- woe.binning.deploy(train, woerules, add.woe.or.dum.var='woe')

#Creating a dataset with the transformed variables and default flag

train_woe <- woe.get.clear.data(train_woe,default_flag = 'creditability',prefix = 'woe')

#Applying the WOE rules used on the train data to the test data

test_woe <- woe.binning.deploy(test, woerules, add.woe.or.dum.var='woe')

test_woe <- woe.get.clear.data(test_woe,default_flag = 'creditability',prefix = 'woe')
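
To make the idea of a WOE value concrete, here is a minimal hand calculation in base R. It assumes the variable is already binned and uses one common sign convention, ln(share of goods / share of bads); the bins and flags are invented, and the binning functions used above may use a different convention or scaling.

```r
# Hypothetical binned predictor and default flag (1 = bad, 0 = good)
bin <- c("short", "short", "short", "short", "long", "long", "long", "long")
bad <- c(0, 0, 0, 1, 0, 1, 1, 1)

# Cross-tabulate bins against the default flag
tab   <- table(bin, bad)
goods <- tab[, "0"]
bads  <- tab[, "1"]

# Share of all goods and of all bads that falls into each bin
dist_good <- goods / sum(goods)
dist_bad  <- bads / sum(bads)

# Weight of evidence per bin
woe <- log(dist_good / dist_bad)
print(woe)
```

Under this convention a positive WOE marks a bin with relatively more good customers, and a negative WOE marks a riskier bin.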

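Information value (IV) and the univariate Gini coefficient can both be used for variable selection. As a rule of thumb, the threshold is 0.30 for IV and 0.10 for univariate Gini.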

#Performing the IV and Gini calculations for the whole data set

IV.calc.data(train_woe,'creditability')

Gini.univariate.data(train_woe,'creditability')
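
The information value of a single binned variable can also be worked out by hand. The sketch below uses the textbook definition, the sum over bins of (share of goods minus share of bads) times WOE, on invented counts; it is assumed, not confirmed, that `IV.calc.data` follows the same definition.

```r
# Hypothetical per-bin counts for one binned predictor
bins <- data.frame(
  bin   = c("low", "mid", "high"),
  goods = c(50, 30, 20),
  bads  = c(10, 15, 25)
)

# Distribution of goods and bads across the bins
dist_good <- bins$goods / sum(bins$goods)
dist_bad  <- bins$bads / sum(bins$bads)

# WOE per bin, then IV as the gap-weighted sum of WOE values
woe <- log(dist_good / dist_bad)
iv  <- sum((dist_good - dist_bad) * woe)
print(iv)

# Compare against the 0.30 rule of thumb quoted in the text
iv > 0.30
```

A bin whose shares of goods and bads are equal, like "mid" here, has a WOE of zero and contributes nothing to the IV.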

#Creating a new dataset by Gini elimination. IV elimination is also possible

eliminated_data <- Gini_elimination(train_woe,'creditability',0.10)

str(eliminated_data)

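In real projects there are often too many variables to manage with a correlation matrix, so they are clustered to identify variables with similar characteristics. Because the number of variables here is small, this particular clustering example is not meaningful, but the method is generally very useful on datasets with many variables.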

#A demonstration of the functions useful in performing Clustering

clustering_data <- variable.clustering(eliminated_data,'creditability', 2)

clustering_data

# Returns the data for variables that have the maximum gini value in the dataset

selected_data <- variable.clustering.gini(eliminated_data,'creditability', 2)

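In some cases the average correlation within each cluster matters, because the number of clusters may have been set incorrectly. A cluster with a high average correlation should therefore be examined in detail. The correlation value for cluster 1, which contains only one variable, is NaN.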

correlation.cluster(eliminated_data,clustering_data,variables = 'variable',clusters = 'Group')
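
The NaN for a single-variable cluster follows naturally if the average is taken over the pairwise correlations inside the cluster, since a lone variable has no pairs. The sketch below illustrates this in base R; the helper `avg_within_cluster_cor` and the toy data are hypothetical, and whether `correlation.cluster` averages raw or absolute correlations is not something the article states.

```r
# Average of the pairwise correlations among the variables of one cluster
avg_within_cluster_cor <- function(data, vars) {
  cm <- cor(data[, vars, drop = FALSE])
  # Upper triangle holds each distinct pair exactly once
  mean(cm[upper.tri(cm)])
}

# Hypothetical toy data: x1 and x2 move together, x3 stands alone
toy <- data.frame(
  x1 = 1:5,
  x2 = c(2, 4, 6, 8, 11),
  x3 = c(5, 3, 4, 1, 2)
)

pair_cor   <- avg_within_cluster_cor(toy, c("x1", "x2"))
single_cor <- avg_within_cluster_cor(toy, "x3")
print(c(pair = pair_cor, single = single_cor))
```

For the one-variable cluster the upper triangle is empty, and the mean of an empty vector in R is NaN.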

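A model is built from the variables in the dataset. Judging by the model summary, the variables appear to be significant. The weights of the variables are then calculated with the "woe.glm.feature.importance" function. In effect, each weight is computed from the impact that a one-unit change in the variable has on the probability.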

#Creating a logistic regression model of the data

model <- glm(formula = creditability ~ ., family = binomial(link = 'logit'), data = eliminated_data)

summary(model)

#Calculating variable weights

woe.glm.feature.importance(eliminated_data,model,'creditability')

#Generating the PD values for the train and test data

ms_train_data <- cbind(eliminated_data,model$fitted.values)

ms_test_data <- cbind(test_woe[,colnames(eliminated_data)], predict(model,type = 'response', newdata = test_woe))

colnames(ms_train_data) <- c('woe.duration.in.month.binned','woe.age.in.years.binned','woe.installment.rate.in.percentage.of.disposable.income.binned','creditability','PD')

colnames(ms_test_data) <- c('woe.duration.in.month.binned','woe.age.in.years.binned','woe.installment.rate.in.percentage.of.disposable.income.binned','creditability','PD')

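In practice, institutions use rating scales rather than continuous PD values. Because of regulatory issues, or to adapt to changing market or portfolio conditions, models are calibrated to different central tendencies.

The package includes regression and Bayesian calibration methods. The numerical function that performs the calibration, which can be embedded in enterprise systems, is available as output through the code "calibration object$calibration_formula" (regression_calibration$calibration_formula in the example that follows).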

#An example application of the Regression calibration method. The model is calibrated to the test_woe data

regression_calibration <- regression.calibration(model,test_woe,'creditability')

regression_calibration$calibration_data

regression_calibration$calibration_model

regression_calibration$calibration_formula

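The Bayesian calibration method is applied on the rating scale. The "master.scale" function makes it easy to create a rating scale. In practice, however, a rating scale is only settled on after many attempts.

Summaries are added to the output. The details can be seen by running the R script. Also, because this example is only meant to introduce the functions within the scope of this study, the PD values do not increase monotonically.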

#Creating a master scale

master_scale <- master.scale(ms_train_data,'creditability','PD')

master_scale

To apply the Bayesian calibration method, we create a Score variable (the log-odds of the PD) in the datasets, then calibrate the rating scale to a central tendency of 5%.

#Calibrating the master scale and the modeling data to the default rate of 5% using the bayesian calibration method

ms_train_data$Score = log(ms_train_data$PD/(1-ms_train_data$PD))

ms_test_data$Score = log(ms_test_data$PD/(1-ms_test_data$PD))

bayesian_method <- bayesian.calibration(data = master_scale,average_score ='Score',total_observations = 'Total.Observations',PD = 'PD',central_tendency = 0.05,calibration_data = ms_train_data,calibration_data_score ='Score')

#After calibration, the information and data related to the calibration process can be obtained as follows

bayesian_method$Calibration.model

bayesian_method$Calibration.formula
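
As background on what calibrating to a central tendency means, the sketch below applies a widely used Bayes-rule adjustment that rescales each PD from the observed default rate to a target rate. It is a generic illustration, not the fitted formula that `bayesian.calibration` returns; the helper `rescale_pd`, the PD vector, and the 30% starting rate are assumptions.

```r
# Bayes-rule adjustment of PDs from one prior default rate to another
rescale_pd <- function(pd, old_rate, new_rate) {
  num <- pd * (new_rate / old_rate)
  den <- num + (1 - pd) * ((1 - new_rate) / (1 - old_rate))
  num / den
}

# Hypothetical model PDs from a sample with a 30% default rate
pd <- c(0.02, 0.10, 0.30, 0.60)

# Re-anchor them to a 5% central tendency
adjusted <- rescale_pd(pd, old_rate = 0.30, new_rate = 0.05)
print(round(adjusted, 4))
```

The adjustment preserves the rank order of the PDs, and a PD equal to the old default rate maps exactly onto the new central tendency.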

In practice, employees who are unfamiliar with risk management find the concept of probability hard to grasp, so a scaled score is needed. It can be created with the "scaled.score" function.

#The Scaled score can be created using the following function

scaled.score(bayesian_method$calibration_data, 'calibrated_pd', 3000, 15)
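
The article does not spell out the formula behind `scaled.score`, so the following is a sketch of the conventional scorecard scaling instead: a fixed number of points doubles the good-to-bad odds. The helper `pd_to_score` and its parameters (600 points at 50:1 odds, 20 points to double the odds) are hypothetical and unrelated to the 3000 and 15 passed above.

```r
# Map a PD to a score where every `pdo` points double the good:bad odds
pd_to_score <- function(pd, base_score = 600, base_odds = 50, pdo = 20) {
  factor <- pdo / log(2)
  offset <- base_score - factor * log(base_odds)
  offset + factor * log((1 - pd) / pd)
}

# PDs corresponding to odds of 50:1, 100:1 and 25:1
pd     <- c(1 / 51, 1 / 101, 1 / 26)
scores <- pd_to_score(pd)
print(round(scores, 1))
```

A lower PD gives a higher score, which is the direction most business users expect from a credit score.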

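After the modeling phase, model validation is carried out to verify different expectations, such as the accuracy and stability of the model. In practice a qualitative validation process is applied as well.

Note: the model calibration above is for illustration only. The validation tests below proceed on the original (uncalibrated) master scale.

In models built with logistic regression, multicollinearity should be considered. Although different thresholds are in use, a VIF value greater than 5 indicates this problem.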

#Calculating the Vif values of the variables.

vif.calc(model)
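
For reference, the variance inflation factor of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. A base R sketch on simulated data follows (the variables are invented, and `vif.calc` may differ in how it presents results).

```r
# Simulated predictors: x2 is nearly a copy of x1, x3 is independent
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
X  <- data.frame(x1, x2, x3)

# VIF of each predictor from an auxiliary regression on the others
vif <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  fit    <- lm(reformulate(others, response = v), data = X)
  1 / (1 - summary(fit)$r.squared)
})
print(round(vif, 2))
```

Here x1 and x2 exceed the threshold of 5 while x3 stays close to 1, which is the pattern that signals a collinear pair.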

As a rule of thumb, the acceptable lower limit for the Gini coefficient is 0.40, although this can vary by model type.

#Calculating the Gini for the model

Gini(model$fitted.values,ms_train_data$creditability)

Output:

0.3577422
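
The Gini coefficient of a scoring model is commonly defined as 2 × AUC - 1. A rank-based sketch in base R is shown below; `gini_coef` and the six observations are hypothetical, and the package's `Gini` function is assumed, not verified, to follow the same definition.

```r
# Gini = 2 * AUC - 1, with the AUC from the rank-sum (Mann-Whitney) formula
gini_coef <- function(score, bad) {
  n_bad  <- sum(bad == 1)
  n_good <- sum(bad == 0)
  r      <- rank(score)  # ties receive average ranks
  auc    <- (sum(r[bad == 1]) - n_bad * (n_bad + 1) / 2) / (n_bad * n_good)
  2 * auc - 1
}

# Hypothetical PDs and observed default flags (1 = bad)
pd  <- c(0.05, 0.10, 0.20, 0.40, 0.60, 0.80)
bad <- c(0, 0, 1, 0, 1, 1)

gini_coef(pd, bad)
```

In this toy set 8 of the 9 bad/good pairs are ordered correctly, so the AUC is 8/9 and the Gini is 7/9.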

#Performing the 5 Fold cross validation

k.fold.cross.validation.glm(ms_train_data,'creditability',5,1)

#The KS test is performed on the distributions of the estimates for good and bad observations

Kolmogorov.Smirnov(ms_train_data,'creditability','PD')

Kolmogorov.Smirnov(ms_test_data,'creditability','PD')
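
The same kind of comparison can be run with `ks.test` from base R's stats package, which measures the largest gap between the empirical PD distributions of bad and good observations. The data below are invented, and the package function may report its result in a different form.

```r
# Hypothetical PDs and observed default flags (1 = bad)
pd  <- c(0.05, 0.10, 0.20, 0.40, 0.60, 0.80)
bad <- c(0, 0, 1, 0, 1, 1)

# Two-sample KS test: PDs of bad cases versus PDs of good cases
ks <- ks.test(pd[bad == 1], pd[bad == 0])
ks_stat <- unname(ks$statistic)
print(ks_stat)
```

A larger statistic means the model separates good and bad cases more clearly; comparing the train and test values gives a quick check on stability.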

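Scorecards are usually revised only after long periods, because the revision process carries a large operational cost. A stable model therefore reduces the need for revision. Institutions also want stable models because they feed into many calculations, such as impairment, capital, and risk-weighted assets.

The System Stability Index (SSI) is a test that measures the stability of a model and its variables. An SSI value greater than 0.25 indicates that variable stability is impaired.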

#Variable stabilities are measured

SSI.calc.data(train_woe,test_woe,'creditability')
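
Stability indices of this kind are conventionally computed as the sum over bins of (actual share minus expected share) times ln(actual share / expected share). The sketch below assumes `SSI.calc.data` uses this conventional form, which the article does not confirm; the helper `stability_index` and the shares are hypothetical.

```r
# Conventional stability index between two binned distributions
stability_index <- function(expected_share, actual_share) {
  sum((actual_share - expected_share) * log(actual_share / expected_share))
}

# Hypothetical share of observations per bin in train and test data
train_share <- c(0.20, 0.50, 0.30)
test_share  <- c(0.25, 0.45, 0.30)

ssi <- stability_index(train_share, test_share)
print(ssi)

# Compare against the 0.25 threshold quoted in the text
ssi > 0.25
```

Identical distributions give an index of exactly zero, and the value grows as the two distributions drift apart.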

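Because the main purpose of the master scale is to differentiate risk, the HHI test is used to measure how concentrated the master scale is. An HHI value above 0.30 indicates high concentration, which may stem from the modeling phase or from a master scale that was built incorrectly.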

#The HHI test is performed to measure the concentration of the master scale

Herfindahl.Hirschman.Index(master_scale,'Total.Observations')

Output:

0.1463665
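
The Herfindahl-Hirschman Index is the sum of the squared shares of observations in each rating grade. A short base R sketch on invented grade counts:

```r
# Hypothetical number of observations in each rating grade
grade_counts <- c(100, 300, 400, 150, 50)

# Share of the portfolio in each grade, then the sum of squared shares
shares <- grade_counts / sum(grade_counts)
hhi    <- sum(shares^2)
print(hhi)
```

With five grades the index can range from 0.20 (perfectly even) to 1 (everything in one grade), so 0.285 here sits just under the 0.30 threshold.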

With the "Anchor.point" function, we test whether the default rate is compatible with the average PD at the expected level.

#Performing the Anchor point test

Anchor.point(master_scale,'PD','Total.Observations',0.30)

The chi-square test can also be used as a calibration test. The "chisquare.test" function performs the test at a specified confidence level.

#The Chi-square test is applied on the master scale

chisquare.test(master_scale,'PD','Bad.Count','Total.Observations',0.90)
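
A calibration chi-square test of this kind is often computed in the Hosmer-Lemeshow style: the squared gap between observed and expected defaults in each grade, divided by the binomial variance, summed over grades. The sketch below uses invented grade data and takes the number of grades as the degrees of freedom, which is one convention among several; the package's choice is not stated in the article.

```r
# Hypothetical master scale: observations, assigned PD and observed defaults
n   <- c(200, 300, 250, 150, 100)
pd  <- c(0.01, 0.03, 0.07, 0.15, 0.30)
bad <- c(3, 8, 20, 21, 33)

# Test statistic: squared deviations scaled by the binomial variance
stat <- sum((bad - n * pd)^2 / (n * pd * (1 - pd)))

# Critical value at the 90% confidence level
critical <- qchisq(0.90, df = length(n))

print(c(statistic = stat, critical = critical))
stat > critical  # TRUE would mean the calibration is rejected
```

In this toy example the statistic is far below the critical value, so the assigned PDs are consistent with the observed defaults.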

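The binomial test can also be used as a calibration test. A one-tailed binomial test is usually applied to IRB models, while a two-tailed test is applied to IFRS 9 models. Outside of IRB, the two-tailed test is the more convenient choice for general use.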

#The Binomial test is applied on the master scale

master_scale$DR <- master_scale$Bad.Count/master_scale$Total.Observations

Binomial.test(master_scale,'Total.Observations','PD','DR',0.90,'one')
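
A one-tailed binomial check per grade can be sketched with `qbinom` from base R: given the grade's PD and size, find the largest default count that is still plausible at the chosen confidence level and flag grades that exceed it. The grade data are invented, and this illustrates the idea without reproducing the package function.

```r
# Hypothetical grades: observations, assigned PD and observed defaults
n   <- c(200, 300, 100)
pd  <- c(0.01, 0.03, 0.30)
bad <- c(0, 8, 60)

# Largest default count consistent with the PD at 90% confidence
max_defaults <- qbinom(0.90, size = n, prob = pd)

# A grade is flagged when observed defaults exceed that bound
rejected <- bad > max_defaults
print(data.frame(n, pd, bad, max_defaults, rejected))
```

Only the third grade is flagged: 60 defaults out of 100 is far more than a PD of 30% can explain, which suggests the PD for that grade is underestimated.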

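To ensure continuity, modeling and model validation need to be managed. With a properly managed R environment, an institution can readily provide such a manageable modeling and validation setup.

Institutions are designing more efficient business processes using open-source environments such as R or Python together with big data technologies. From this perspective, creditR brings organizational convenience to the application of modeling and validation methods.

Closing remarks

The creditR package offers a number of methods for traditional credit risk scoring, along with methods for testing model validity that can also be applied to ML algorithms. Because the package automates the application of traditional methods, it can also reduce the operational cost of traditional processes.

In addition, these models can be compared with machine learning models to demonstrate that the ML models also meet regulatory requirements, which is a precondition for applying them.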
