Question on imbalanced classification: what do perc.over and perc.under mean in the SMOTE function of the DMwR package?

2015-12-24

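I am working with imbalanced classification data. What do perc.over and perc.under mean in the SMOTE function of the DMwR package?
How should they be set, and how do they determine the ratio of majority-class to minority-class cases in the resulting data set?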

SMOTE(form, data, perc.over = 200, k = 5, perc.under = 200,
learner = NULL, ...)
Arguments

form
A formula describing the prediction problem
data
A data frame containing the original (unbalanced) data set
perc.over
A number that drives the decision of how many extra cases from the minority class are generated (known as over-sampling).
k
A number indicating the number of nearest neighbours that are used to generate the new examples of the minority class.
perc.under
A number that drives the decision of how many extra cases from the majority classes are selected for each case generated from the minority class (known as under-sampling)

All replies

吴追求

2017-5-4 20:55:56

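perc.over: controls over-sampling, i.e. how many new synthetic points are generated for each minority-class point. The default is 200, so 200/100 = 2 new points are generated per minority-class case.
perc.under: controls under-sampling, i.e. how many majority-class cases are selected relative to the number of newly generated cases. The default is 200, so the number of majority-class cases selected is 200/100 = 2 times the number of newly generated cases.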
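
To make the class ratio concrete, here is a minimal sketch of the arithmetic described above. The helper `smote_sizes` is hypothetical (it is not part of DMwR), and it assumes perc.over is a multiple of 100 and at least 100. The original minority cases are assumed to be kept alongside the synthetic ones. The last part calls the real `SMOTE` function only if DMwR happens to be installed, on made-up data.

```r
# Hypothetical helper (not part of DMwR): expected class sizes after SMOTE.
# Assumption: perc.over is a multiple of 100 and at least 100.
smote_sizes <- function(n_min, perc.over = 200, perc.under = 200) {
  stopifnot(perc.over >= 100, perc.over %% 100 == 0)
  n_new <- n_min * perc.over / 100          # synthetic minority cases generated
  n_maj <- floor(n_new * perc.under / 100)  # majority cases selected
  # Original minority cases are kept, so the minority total is n_min + n_new.
  c(minority = n_min + n_new, majority = n_maj)
}

# 50 minority cases with the default settings.
defaults <- smote_sizes(50)
print(defaults)

# perc.over = 100 with perc.under = 200 gives a 1:1 ratio.
balanced <- smote_sizes(50, perc.over = 100, perc.under = 200)
print(balanced)

# Optional check against the real function, only if DMwR is installed.
# The data frame below is made up purely for illustration.
if (requireNamespace("DMwR", quietly = TRUE)) {
  set.seed(1)
  d <- data.frame(x1 = rnorm(550), x2 = rnorm(550),
                  y = factor(c(rep("rare", 50), rep("common", 500))))
  print(table(DMwR::SMOTE(y ~ ., d, perc.over = 200, perc.under = 200)$y))
}
```

With 50 minority cases, the defaults give 150 minority and 200 majority cases, independent of how many majority cases the original data had. If a balanced result is wanted, one option under the same assumptions is perc.over = 100 with perc.under = 200, which gives 100 of each.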

彼岸没有灯塔

2017-5-5 09:30:21

吴追求 posted on 2017-5-4 20:55
perc.over: controls over-sampling, i.e. how many new synthetic points are generated for each minority-class point. The default is 200, i.e. ...

Thank you very much!
