To Bag is to Prune


Date: 2022-03-26

English title:
To Bag is to Prune
---
Author:
Philippe Goulet Coulombe
---
Latest submission year:
2021
---
Classification:

Category: Statistics
Subcategory: Machine Learning
Description: Covers machine learning papers (supervised, unsupervised, semi-supervised learning, graphical models, reinforcement learning, bandits, high dimensional inference, etc.) with a statistical or theoretical grounding.
--
Category: Computer Science
Subcategory: Machine Learning
Description: Papers on all aspects of machine learning research (supervised, unsupervised, reinforcement learning, bandit problems, and so on) including also robustness, explanation, fairness, and methodology. cs.LG is also an appropriate primary category for applications of machine learning methods.
--
Category: Economics
Subcategory: Econometrics
Description: Econometric Theory, Micro-Econometrics, Macro-Econometrics, Empirical Content of Economic Relations discovered via New Methods, Methodological Aspects of the Application of Statistical Inference to Economic Data.
--

---
Abstract:
It is notoriously difficult to build a bad Random Forest (RF). Concurrently, RF blatantly overfits in-sample without any apparent consequence out-of-sample. Standard arguments, like the classic bias-variance trade-off or double descent, cannot rationalize this paradox. I propose a new explanation: bootstrap aggregation and model perturbation as implemented by RF automatically prune a latent "true" tree. More generally, randomized ensembles of greedily optimized learners implicitly perform optimal early stopping out-of-sample. So there is no need to tune the stopping point. By construction, novel variants of Boosting and MARS are also eligible for automatic tuning. I empirically demonstrate the property, with simulated and real data, by reporting that these new completely overfitting ensembles perform similarly to their tuned counterparts -- or better.
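The abstract's starting observation is that averaging bootstrapped, fully grown (deliberately overfitting) trees does not hurt out-of-sample. As a minimal sketch, assuming a single numeric feature and plain bagging (no random feature subsampling, so this is bagged CART rather than a full Random Forest, and not the author's code), the Python below grows unpruned regression trees that interpolate their training data and averages them over bootstrap samples. The step-function target, the noise level, the sample sizes, and every function name (`grow_tree`, `bag`, `predict_bag`, `true_f`, ...) are illustrative assumptions.

```python
import random
import statistics


def grow_tree(xs, ys):
    """Grow an unpruned 1-D regression tree (greedy CART splits, no stopping rule)."""
    # Stop only when the node cannot be split: all targets equal or all inputs equal
    if len(set(ys)) == 1 or len(set(xs)) == 1:
        return ("leaf", statistics.fmean(ys))
    pairs = sorted(zip(xs, ys))
    best = None
    # Greedy step: try each midpoint between distinct neighbouring x values and
    # keep the split with the smallest within-child sum of squared errors
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        m_l, m_r = statistics.fmean(left), statistics.fmean(right)
        sse = sum((y - m_l) ** 2 for y in left) + sum((y - m_r) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, i)
    _, threshold, i = best
    left_x, left_y = zip(*pairs[:i])
    right_x, right_y = zip(*pairs[i:])
    return ("split", threshold,
            grow_tree(list(left_x), list(left_y)),
            grow_tree(list(right_x), list(right_y)))


def predict(tree, x):
    # Walk from the root to a leaf and return that leaf's mean
    while tree[0] == "split":
        _, threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree[1]


def bag(xs, ys, n_trees=25, seed=0):
    # Fit one unpruned tree per bootstrap sample (n draws with replacement)
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(grow_tree([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def predict_bag(trees, x):
    # Bootstrap aggregation: average the individual trees' predictions
    return statistics.fmean([predict(t, x) for t in trees])


def mse(model, xs, ys):
    return statistics.fmean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])


# Illustrative assumption: a one-split step function plays the latent "true" tree
def true_f(x):
    return 1.0 if x > 0.5 else 0.0


rng = random.Random(1)
x_train = [rng.random() for _ in range(60)]
y_train = [true_f(x) + rng.gauss(0, 0.3) for x in x_train]
x_test = [rng.random() for _ in range(200)]
y_test = [true_f(x) + rng.gauss(0, 0.3) for x in x_test]

tree = grow_tree(x_train, y_train)
forest = bag(x_train, y_train)


def single(x):
    return predict(tree, x)


def bagged(x):
    return predict_bag(forest, x)


print("single tree  train MSE:", mse(single, x_train, y_train))
print("single tree  test  MSE:", mse(single, x_test, y_test))
print("bagged trees train MSE:", mse(bagged, x_train, y_train))
print("bagged trees test  MSE:", mse(bagged, x_test, y_test))
```

By construction the single unpruned tree has (numerically) zero training error, while the bagged average no longer interpolates, because each training point is left out of roughly a third of the bootstrap samples. How the two test errors compare depends on the simulated draw, so the printed numbers are an illustration of the setting, not a replication of the paper's results.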
---
PDF link:
https://arxiv.org/pdf/2008.07063
