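Question about imbalanced classification data: what do the perc.over and perc.under arguments of the SMOTE function in the DMwR package mean?
How should they be set, and how do they determine the ratio of majority-class to minority-class cases in the newly generated data set?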
SMOTE(form, data, perc.over = 200, k = 5, perc.under = 200,
learner = NULL, ...)
Arguments
form
A formula describing the prediction problem
data
A data frame containing the original (unbalanced) data set
perc.over
A number that drives the decision of how many extra cases from the minority class are generated (known as over-sampling).
k
A number indicating the number of nearest neighbours that are used to generate the new examples of the minority class.
perc.under
A number that drives the decision of how many extra cases from the majority classes are selected for each case generated from the minority class (known as under-sampling).
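My reading of the DMwR source is below. Please treat it as an assumption and check it against the version you have installed. Let n be the number of minority cases and k = floor(perc.over / 100). For perc.over >= 100, every minority case produces k synthetic cases, so k·n new cases are generated. For perc.over < 100, only a random perc.over% of the minority cases are SMOTEd, and each one produces a single synthetic case. perc.under then draws perc.under/100 majority cases, with replacement, for each synthetic case. The result keeps all original minority cases, the synthetic cases, and the sampled majority cases. The minority class therefore ends up with n(1 + k) cases and the majority class with (perc.under/100)·k·n. The majority count depends on the number of synthetic cases, not on the original majority size. The helpers smote_counts and perc_under_for_balance are hypothetical names, not DMwR functions. They compute only the resulting class sizes and do not call DMwR.

```r
# Sketch of the class sizes DMwR::SMOTE() should produce (ASSUMPTION: based on
# reading the DMwR source; verify against your installed version).
# n_min: number of rows in the minority class of the original data.
smote_counts <- function(n_min, perc.over = 200, perc.under = 200) {
  if (perc.over < 100) {
    # Only a random perc.over% of minority rows are SMOTEd, one synthetic row each
    n_new <- as.integer((perc.over / 100) * n_min)
  } else {
    # Each minority row yields floor(perc.over / 100) synthetic rows
    n_new <- as.integer(perc.over / 100) * n_min
  }
  # perc.under/100 majority rows are sampled (with replacement) per synthetic row
  n_maj <- as.integer((perc.under / 100) * n_new)
  # Output = original minority + synthetic minority, and the sampled majority rows
  c(minority = n_min + n_new, majority = n_maj)
}

# perc.under that gives a 1:1 class ratio for a given perc.over (>= 100):
# solve n * (1 + k) = (perc.under / 100) * k * n for perc.under
perc_under_for_balance <- function(perc.over) {
  k <- as.integer(perc.over / 100)
  100 * (1 + k) / k
}

print(smote_counts(50))                  # defaults 200/200
print(smote_counts(50, 600, 100))        # settings used in the SMOTE help example
print(perc_under_for_balance(200))
```

With the defaults (200/200) and 50 minority cases, this gives 150 minority against 200 majority cases, a 3:4 ratio. The iris-based example in the SMOTE help page has 50 "rare" cases and uses perc.over = 600, perc.under = 100, for which the sketch predicts 350 rare against 300 common. To get a roughly balanced result, choose perc.over first, then set perc.under = 100·(1 + k)/k. For example, perc.over = 200 pairs with perc.under = 150. Because the majority rows are resampled with replacement, a large perc.under can produce more majority rows than the original data contained, including duplicates.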