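Abstract (translated):
For regulatory and interpretability reasons, logistic regression remains widely used. To improve predictive accuracy and interpretability, continuous and categorical data usually need to be quantized: continuous features are discretized, and the levels of a categorical feature are grouped when there are many of them. Embedding this quantization estimation step directly into the predictive estimation step itself can yield even better predictive accuracy. Doing so, however, requires optimizing the predictive loss over a huge set. To overcome this difficulty, we introduce a specific two-step optimization strategy: first, the optimization problem is relaxed by approximating the discontinuous quantization functions with smooth functions; second, the resulting relaxed problem is solved via a particular neural network. The good performance of this approach, which we call glmdisc, is illustrated on simulated and real data from the UCI library and Crédit Agricole Consumer Finance (a major historic European player in the consumer credit market).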
---
Title:
Feature quantization for parsimonious and interpretable predictive models
---
Authors:
Adrien Ehrhardt, Christophe Biernacki, Vincent Vandewalle, Philippe Heinrich
---
Latest submission year:
2019
---
Classification:
Top-level category: Statistics
Subcategory: Methodology
Category description: Design, Surveys, Model Selection, Multiple Testing, Multivariate Methods, Signal and Image Processing, Time Series, Smoothing, Spatial Statistics, Survival Analysis, Nonparametric and Semiparametric Methods
--
Top-level category: Economics
Subcategory: Econometrics
Category description: Econometric Theory, Micro-Econometrics, Macro-Econometrics, Empirical Content of Economic Relations discovered via New Methods, Methodological Aspects of the Application of Statistical Inference to Economic Data.
--
---
Abstract:
For regulatory and interpretability reasons, logistic regression is still widely used. To improve prediction accuracy and interpretability, a preprocessing step quantizing both continuous and categorical data is usually performed: continuous features are discretized and, if numerous, levels of categorical features are grouped. An even better predictive accuracy can be reached by embedding this quantization estimation step directly into the predictive estimation step itself. But doing so, the predictive loss has to be optimized on a huge set. To overcome this difficulty, we introduce a specific two-step optimization strategy: first, the optimization problem is relaxed by approximating discontinuous quantization functions by smooth functions; second, the resulting relaxed optimization problem is solved via a particular neural network. The good performances of this approach, which we call glmdisc, are illustrated on simulated and real data from the UCI library and Crédit Agricole Consumer Finance (a major European historic player in the consumer credit market).
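
The relaxation step in the abstract can be pictured with a small sketch. It is not the authors' glmdisc code. It assumes, for illustration only, that the smooth stand-in for a one-hot bin indicator is a softmax over affine scores of the feature. The names `soft_quantize`, `hard_quantize`, `SLOPES`, `INTERCEPTS`, and `sharpness` are hypothetical.

```python
import math

def soft_quantize(x, slopes, intercepts, sharpness=1.0):
    """Smooth bin memberships: softmax over affine scores a_j * x + b_j (assumed form)."""
    scores = [sharpness * (a * x + b) for a, b in zip(slopes, intercepts)]
    top = max(scores)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def hard_quantize(x, slopes, intercepts):
    """Discontinuous counterpart: one-hot vector for the highest-scoring bin."""
    scores = [a * x + b for a, b in zip(slopes, intercepts)]
    best = scores.index(max(scores))
    return [1.0 if j == best else 0.0 for j in range(len(scores))]

# Hypothetical parameters: three bins whose hard cut points fall at x = 1 and x = 3.
SLOPES = [0.0, 1.0, 2.0]
INTERCEPTS = [0.0, -1.0, -4.0]

if __name__ == "__main__":
    for x in (0.0, 2.0, 5.0):
        print(x, soft_quantize(x, SLOPES, INTERCEPTS), hard_quantize(x, SLOPES, INTERCEPTS))
```

Because the soft memberships are differentiable in the slopes and intercepts, they could be fitted by gradient descent together with the logistic regression coefficients. Raising `sharpness` pushes them toward the hard one-hot assignment.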
---
PDF link:
https://arxiv.org/pdf/1903.08920