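Abstract (translated from Chinese):
Binary classification is widely used in credit scoring to estimate the probability of default. Such predictive models are validated both on rank ability and on calibration (i.e. how accurately the probabilities output by the model map to the observed probabilities). This study covers current best practices for calibrating binary classifiers and explores how different approaches yield different results on real-world credit scoring data. It also explores the limitations of evaluating credit scoring models using rank ability metrics alone. A benchmark was run on 18 real-world datasets and the results were compared. The calibration techniques used are Platt scaling and isotonic regression. Different machine learning models are also used: logistic regression, random forest classifiers, and gradient boosting classifiers. The results show that, when the dataset is treated as a time series, re-calibration with isotonic regression improves long-term calibration more than the alternative methods. With re-calibration, the non-parametric models are able to outperform logistic regression on Brier score loss.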
---
English title:
Calibration of Machine Learning Classifiers for Probability of Default Modelling
---
Authors:
Pedro G. Fonseca and Hugo D. Lopes
---
Year of latest submission:
2017
---
Classification:
Top-level category: Economics
Subcategory: Econometrics
Category description: Econometric Theory, Micro-Econometrics, Macro-Econometrics, Empirical Content of Economic Relations discovered via New Methods, Methodological Aspects of the Application of Statistical Inference to Economic Data.
--
Top-level category: Statistics
Subcategory: Machine Learning
Category description: Covers machine learning papers (supervised, unsupervised, semi-supervised learning, graphical models, reinforcement learning, bandits, high dimensional inference, etc.) with a statistical or theoretical grounding.
--
---
English abstract:
Binary classification is highly used in credit scoring in the estimation of probability of default. The validation of such predictive models is based both on rank ability, and also on calibration (i.e. how accurately the probabilities output by the model map to the observed probabilities). In this study we cover the current best practices regarding calibration for binary classification, and explore how different approaches yield different results on real world credit scoring data. The limitations of evaluating credit scoring models using only rank ability metrics are explored. A benchmark is run on 18 real world datasets, and results compared. The calibration techniques used are Platt Scaling and Isotonic Regression. Also, different machine learning models are used: Logistic Regression, Random Forest Classifiers, and Gradient Boosting Classifiers. Results show that when the dataset is treated as a time series, the use of re-calibration with Isotonic Regression is able to improve the long term calibration better than the alternative methods. Using re-calibration, the non-parametric models are able to outperform the Logistic Regression on Brier Score Loss.
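As a rough illustration of two ideas the abstract names, the sketch below recalibrates raw scores with a simple isotonic (pool adjacent violators) fit and compares Brier scores before and after. It is not the authors' code: the function names, the step-function lookup, and the eight-point toy dataset are all assumptions made up for this example, and the fit is evaluated in-sample, whereas a real study would fit the calibrator on held-out data. Platt scaling (fitting a sigmoid to the scores) is not shown.

```python
def brier_score(probs, labels):
    """Mean squared gap between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def fit_isotonic(scores, labels):
    """Fit a non-decreasing step function (pool adjacent violators).

    Returns a list of (upper_score, calibrated_rate) pairs in ascending order.
    """
    # Pool observations that share the same score before running PAV.
    pooled = {}
    for s, y in zip(scores, labels):
        total, count = pooled.get(s, (0.0, 0))
        pooled[s] = (total + y, count + 1)

    blocks = []  # each block: [sum of labels, number of points, highest score]
    for s in sorted(pooled):
        total, count = pooled[s]
        blocks.append([total, count, s])
        # Merge backwards while the previous block has a higher mean.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [(upper, total / count) for total, count, upper in blocks]


def apply_isotonic(steps, score):
    """Look up the calibrated rate for one raw score."""
    for upper, rate in steps:
        if score <= upper:
            return rate
    return steps[-1][1]  # scores above the training range get the last rate


# Made-up toy data: raw model scores and observed defaults (1 = default).
raw_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
defaults = [0, 0, 1, 0, 0, 1, 1, 1]

steps = fit_isotonic(raw_scores, defaults)
calibrated = [apply_isotonic(steps, s) for s in raw_scores]

raw_brier = brier_score(raw_scores, defaults)
calibrated_brier = brier_score(calibrated, defaults)
print(f"Brier score before: {raw_brier:.4f}, after: {calibrated_brier:.4f}")
```

On this toy data the Brier score falls from 0.15 to about 0.083. Because the mapping is monotone, the ordering of the scores (and hence rank ability) is preserved up to ties while calibration changes, which is why the two properties have to be assessed separately.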
---
PDF link:
https://arxiv.org/pdf/1710.08901