2012-08-03
Bounty: 10 forum coins (solved)
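Taking tree building as an example: one purpose shared by methods such as boosting and bagging is to split big data into chunks, giving smaller training sets on which trees are grown, and then let these trees vote to produce the final result. For bagging, the training data are selected at random. My question is how boosting selects training samples from the original (big) data. I have a feeling it has to do with the distribution, but I'm not clear on the details. Any help would be appreciated, thanks ☺
Surely it doesn't train directly on the full big data? That would seem to defeat the purpose.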

Best answer: ltx5151 (full reply below)

All replies
ltx5151, 2012-8-3 00:39:38
Hi,

Firstly, please note that AdaBoost is not the same concept as boosting. Boosting is a more general idea for building machine learning models.

I don't think AdaBoost uses only part of the training sample. It uses all of the samples to build the weak learners, then combines all the weak learners to produce a better classifier.
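
To make the "uses all of the sample" point concrete, here is a minimal sketch of discrete AdaBoost with decision stumps, written from scratch in Python (not any R package; the names fit_stump, adaboost and predict are hypothetical). Every round trains on all n rows; what changes between rounds is a weight vector over the rows. That weight vector is the "distribution" the question alludes to: rows the previous stump got wrong carry more weight in the next round.

```python
import math

def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best decision stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted(set(row[j] for row in X)):
            for polarity in (1, -1):
                # The stump predicts `polarity` when row[j] <= thr, else -polarity
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (polarity if row[j] <= thr else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def stump_predict(stump, row):
    _, j, thr, polarity = stump
    return polarity if row[j] <= thr else -polarity

def adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost for labels in {-1, +1}; returns a list of (alpha, stump)."""
    n = len(X)
    w = [1.0 / n] * n  # a weight distribution over ALL n training rows
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = max(stump[0], 1e-10)  # guard against log(0) for a perfect stump
        if err >= 0.5:  # no better than chance under the current weights
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified rows (yi * h(x) = -1) are up-weighted, correct ones down-weighted
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]  # renormalize so the weights stay a distribution
    return ensemble

def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

For example, with X = [[1], [2], [3], [4], [5], [6]] and y = [1, 1, -1, -1, 1, 1], no single stump fits the training data (the best one gets 4 of 6 right), but after three reweighted rounds the weighted vote reproduces y exactly.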

I guess what you are really referring to is stochastic boosting rather than AdaBoost. Stochastic boosting has exactly such a random sub-sampling step when training each base learner. It takes advantage of this randomness to help avoid overfitting and to speed up training. For details, see Friedman (1999).
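
The sub-sampling step can be sketched as follows. This is a minimal Python illustration, assuming squared-error loss (so the negative gradient is just the residual), 1-D inputs, and regression stumps in place of full trees; stochastic_gradient_boost and fit_regression_stump are hypothetical names. Before each stump is fit, a random fraction of the rows is drawn without replacement and only those rows are used for fitting, which is what makes each tree cheaper on very large data. In R's gbm package this fraction is set by the bag.fraction argument.

```python
import random

def fit_regression_stump(xs, rs):
    """Least-squares stump on 1-D inputs: one split, mean residual on each side."""
    best = None
    for thr in sorted(set(xs)):
        left = [r for x, r in zip(xs, rs) if x <= thr]
        right = [r for x, r in zip(xs, rs) if x > thr]
        lmean = sum(left) / len(left)  # left is never empty: thr is one of the xs
        rmean = sum(right) / len(right) if right else lmean
        sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lmean, rmean)
    _, thr, lmean, rmean = best
    return lambda x: lmean if x <= thr else rmean

def stochastic_gradient_boost(xs, ys, n_trees=100, learning_rate=0.1,
                              subsample=0.5, seed=0):
    """Squared-error gradient boosting where each stump sees a random subset of rows."""
    rng = random.Random(seed)
    n = len(xs)
    f0 = sum(ys) / n  # start from the constant model (the mean)
    preds = [f0] * n
    m = max(1, int(subsample * n))  # rows used per tree
    trees = []
    for _ in range(n_trees):
        # Random subsample drawn WITHOUT replacement (the stochastic step)
        idx = rng.sample(range(n), m)
        # For squared-error loss the negative gradient is simply the residual
        residuals = [ys[i] - preds[i] for i in idx]
        tree = fit_regression_stump([xs[i] for i in idx], residuals)
        trees.append(tree)
        # The new stump, shrunk by the learning rate, updates predictions on ALL rows
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + learning_rate * sum(t(x) for t in trees)
```

With subsample=1.0 this reduces to ordinary (non-stochastic) gradient boosting; smaller values trade a little extra noise per tree for proportionally less work per tree.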
Bagging and boosting are ensemble learning methods that have become popular over the past 15 years, and boosting can often be more powerful than bagging. Bagging was originally designed to reduce variance; you can view it as a specific application of the bootstrap. That perspective is closer to what you are really concerned about.
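
The bootstrap view of bagging can be sketched like this: draw several bootstrap resamples (n rows drawn with replacement), fit one learner on each independently, and take a majority vote. This is a minimal Python sketch assuming 1-D inputs, labels in {-1, +1}, and decision stumps standing in for full trees; bagging and fit_stump are hypothetical names. Note that in standard bagging each resample has the same size n as the original data, so bagging by itself does not train each tree on a smaller chunk; the variance reduction comes from voting over learners fit to different resamples.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Unweighted 1-D classification stump: predict polarity if x <= thr, else -polarity."""
    best = None
    for thr in sorted(set(xs)):
        for polarity in (1, -1):
            errors = sum(1 for x, y in zip(xs, ys)
                         if (polarity if x <= thr else -polarity) != y)
            if best is None or errors < best[0]:
                best = (errors, thr, polarity)
    _, thr, polarity = best
    return lambda x: polarity if x <= thr else -polarity

def bagging(xs, ys, n_bags=25, seed=0):
    """Fit one stump per bootstrap resample; predict by majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_bags):
        # Bootstrap: draw n row indices WITH replacement
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        # Majority vote across the independently trained stumps (odd n_bags avoids ties)
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]

    return predict
```

Because the stumps are trained independently, the bags can be fit in parallel, whereas each boosting round depends on the previous one.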

dabap, 2012-8-3 13:58:09
In reply to ltx5151 (2012-8-3 00:39):
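Thanks, your reply was very helpful. I'm currently working on classifying forest disturbance in remote-sensing images, mainly with binary (decision) trees. The data are very large: rows times columns gives several million samples, so applying AdaBoost to all of them isn't very practical, and I need a voting method like bagging. But bagging's accuracy is limited, so I'm looking for something better. Thanks for the suggestion. Is there an R package for stochastic boosting? Thanks!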

2012-8-6 08:21:59
In reply to dabap (2012-8-3 13:58):
Yes, it's on CRAN: gbm, written by one of Friedman's students.
