# ShichenXie/scorecard

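* For installation, see the project description; it is straightforward, so it is not repeated here.
* Before working through the steps below, using RStudio is strongly recommended (don't worry, it is free as well).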

The goal of the scorecard package is to make developing traditional credit risk scorecard models easier and more efficient by providing functions for the common tasks summarized below. The package can also be used when developing machine learning models for binary classification.

• Data preparation (split_df, one_hot)
• Variable selection (var_filter, iv, vif)
• Weight of evidence (WOE) binning and WOE conversion (woebin, woebin_plot, woebin_adj, woebin_ply)
• Performance evaluation (perf_eva, perf_psi)
• Scorecard scaling (scorecard, scorecard_ply)
• Scorecard report (gains_table, report)

• Load the scorecard package
``````
# Traditional Credit Scoring Using Logistic Regression
library(scorecard)
``````
• Data preparation
Note that the target variable (Y) must be coded as 0|1, where 1 marks a bad sample.
The package handles the example below as is: the value 'bad' is treated as the bad sample.
``````
# data preparing ------
# load the German Credit data
data("germancredit")
# filter variable via missing rate, iv, identical value rate
# by default, keeps variables with IV above 0.02, missing rate below 95%,
# and identical value rate below 0.95
dt_f = var_filter(germancredit, y="creditability")
# breaking dt into train and test
# split the data into train (60%) and test (40%); the seed is arbitrary, 30 here
dt_list = split_df(dt_f, y="creditability", ratio = 0.6, seed = 30)
label_list = lapply(dt_list, function(x) x$creditability)
``````
``````
# woe binning ------
# bin the variables
bins = woebin(dt_f, y="creditability")
# plot the binning results
woebin_plot(bins)
``````
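To make the binning output easier to read, here is a minimal base R sketch of how WOE and IV can be derived from per-bin good/bad counts. The counts are made up for illustration (they are not taken from `germancredit`), and the sign convention used here, the log of bad share over good share, is an assumption; some references flip the ratio, which changes the sign of WOE but not the IV.

```r
# Toy per-bin counts (hypothetical numbers, for illustration only)
bins_toy <- data.frame(
  bin  = c("[-Inf,26)", "[26,35)", "[35, Inf)"),
  good = c(100, 300, 300),
  bad  = c(80, 130, 90)
)

# Share of all goods / all bads that falls into each bin
good_distr <- bins_toy$good / sum(bins_toy$good)
bad_distr  <- bins_toy$bad  / sum(bins_toy$bad)

# WOE per bin: log of (bad share / good share); assumed sign convention
bins_toy$woe <- log(bad_distr / good_distr)

# Each bin's contribution to the information value, and the total IV
bins_toy$bin_iv <- (bad_distr - good_distr) * bins_toy$woe
total_iv <- sum(bins_toy$bin_iv)

print(bins_toy)
print(total_iv)
```

A bin with a positive WOE under this convention holds a larger share of the bad samples than of the good ones; the total IV is what a threshold such as 0.02 in variable filtering is compared against.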

``````
# binning adjustment
## adjust breaks interactively (manual bin adjustment)
# breaks_adj = woebin_adj(dt_f, "creditability", bins)
## or specify breaks manually
breaks_adj = list(
  age.in.years=c(26, 35, 40),
  other.debtors.or.guarantors=c("none", "co-applicant%,%guarantor"))
bins_adj = woebin(dt_f, y="creditability", breaks_list=breaks_adj)
``````

• Convert the original variables to WOE values
``````
# converting train and test into woe values
dt_woe_list = lapply(dt_list, function(x) woebin_ply(x, bins_adj))
``````
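Conceptually, the WOE conversion replaces each raw value with the WOE of the bin it falls into. The base R sketch below shows that lookup for a single numeric variable. The break points and WOE values are hypothetical, and the left-closed intervals such as `[26,35)` are an assumption about how the bins are defined; this is not the package's implementation.

```r
# Hypothetical break points and per-bin WOE values for one numeric variable
breaks     <- c(-Inf, 26, 35, Inf)
woe_by_bin <- c(0.62, 0.01, -0.36)  # illustrative numbers only

age <- c(22, 26, 30, 47)

# right = FALSE gives left-closed intervals: [-Inf,26), [26,35), [35,Inf)
# labels = FALSE returns the integer index of the bin instead of a factor
bin_index <- cut(age, breaks = breaks, right = FALSE, labels = FALSE)

# Look up the WOE of each observation's bin
age_woe <- woe_by_bin[bin_index]

print(data.frame(age, bin_index, age_woe))
```

After this step the model sees only the WOE values, so every variable enters the logistic regression on the same log-odds scale.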
• Fit the model on the training data
``````
# glm ------
m1 = glm( creditability ~ ., family = binomial(), data = dt_woe_list$train)
# vif(m1, merge_coef = TRUE) # summary(m1)
# Select a formula-based model by AIC (or by LASSO for large dataset)
m_step = step(m1, direction="both", trace = FALSE)
m2 = eval(m_step$call)
# check the variance inflation factors (VIF)
vif(m2, merge_coef = TRUE) # summary(m2)
``````
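As background for the VIF check, the sketch below computes the textbook variance inflation factor, 1 / (1 - R²), where R² comes from regressing one predictor on all the others. It uses simulated data and base R only; the package's `vif()` may compute the value differently for a `glm` object, so treat this as an illustration of the idea rather than a reproduction of that function.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.5)  # deliberately correlated with x1
x3 <- rnorm(n)                 # independent of the others
X  <- data.frame(x1, x2, x3)

# For each predictor, regress it on the remaining ones and turn R^2 into a VIF
vif_manual <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  fit    <- lm(reformulate(others, response = v), data = X)
  1 / (1 - summary(fit)$r.squared)
})

print(round(vif_manual, 2))
```

Here `x1` and `x2` get clearly larger values than `x3`, which is the pattern that signals collinearity among predictors.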
• Check model performance
``````
# performance ks & roc ------
## predicted probability
pred_list = lapply(dt_woe_list, function(x) predict(m2, x, type='response'))
## performance: the statement below shows all of the model's performance plots
perf = perf_eva(pred = pred_list, label = label_list, show_plot = c('ks', 'lift', 'gain', 'roc', 'lz', 'pr', 'f1', 'density'))
``````
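To show what the KS figure measures, here is a small base R sketch that computes it by hand as the largest gap between the cumulative share of bad and of good samples when observations are sorted by predicted probability. The predictions and labels are made up, labels are assumed to be coded 0/1 with 1 = bad, and ties in the predictions are not handled.

```r
# Made-up predicted probabilities and 0/1 labels (1 = bad)
pred  <- c(0.05, 0.10, 0.20, 0.30, 0.45, 0.55, 0.70, 0.80, 0.90, 0.95)
label <- c(0,    0,    0,    1,    0,    0,    1,    1,    0,    1)

# Sort from the riskiest prediction to the safest
ord <- order(pred, decreasing = TRUE)

# Cumulative share of bads and of goods captured down the sorted list
cum_bad  <- cumsum(label[ord] == 1) / sum(label == 1)
cum_good <- cumsum(label[ord] == 0) / sum(label == 0)

# KS is the maximum vertical distance between the two cumulative curves
ks <- max(abs(cum_bad - cum_good))
print(ks)
```

A model that separates bad from good samples well pushes the bads to the top of the sorted list, so the two cumulative curves diverge and KS grows.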

• What do the terms mean?