# Label the tree nodes; use.n = TRUE also prints the observation counts at each node
text(bank.tree, use.n = TRUE, col = "red", cex = 0.6)
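5. Initial variable selection, analysis, and transformation
Based on the results of the decision tree analysis, we select the five variables with the highest variable importance for further study. In order, they are: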
duration : last contact duration, in seconds (numeric)
month : last contact month of year (categorical: "jan", "feb", "mar", ..., "nov", "dec")
poutcome : outcome of the previous marketing campaign (categorical: "unknown","other","failure","success")
pdays : number of days that passed by after the client was last contacted from a previous campaign (numeric, -1 means client was not previously contacted)
previous : number of contacts performed before this campaign and for this client (numeric)
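As a rough illustration of how such a ranking can be read off a fitted tree, the sketch below assumes the tree was built with rpart (the source does not show the fitting call in this section). The bank data is not available here, so rpart's bundled kyphosis data set and the names demo.tree, imp, and top.vars are stand-ins, not part of the original analysis.

```r
# Sketch only: rpart's bundled `kyphosis` data stands in for the bank data,
# and we assume the original tree was fitted with rpart().
library(rpart)

# Fit a small classification tree (hypothetical stand-in for bank.tree)
demo.tree <- rpart(Kyphosis ~ Age + Number + Start,
                   data = kyphosis, method = "class")

# variable.importance is a named numeric vector of importance scores
imp <- demo.tree$variable.importance

# Sort explicitly and keep at most the five highest-ranked variables
top.vars <- head(names(sort(imp, decreasing = TRUE)), 5)
print(top.vars)
```

With the real data, the same two lines applied to bank.tree would list the candidate variables, provided bank.tree is an rpart object.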
a) duration
## geom_smooth: method="auto" and size of largest group is >=1000, so using gam with formula: y ~ s(x, bs = "cs"). Use 'method = x' to change the smoothing method.
Based on the shape of the fitted curve, we add a quadratic term for duration.
bank$duration.sq <- bank$duration * bank$duration
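One way to check whether a squared term like this is worth keeping is to compare models with and without it. The sketch below is only an illustration under stated assumptions: the data are simulated (the data frame demo and its coefficients are invented for the example), and a logistic regression is assumed as the downstream model, which this section of the source does not confirm.

```r
# Sketch only: simulated data stand in for the bank data set.
set.seed(1)
n <- 2000
demo <- data.frame(duration = round(runif(n, 0, 1500)))

# Simulate a binary response whose log-odds rise with duration and then flatten
log.odds <- -3 + 0.006 * demo$duration - 0.000002 * demo$duration^2
demo$y <- rbinom(n, 1, plogis(log.odds))

# Same transformation as in the text: add the squared term as a new column
demo$duration.sq <- demo$duration * demo$duration

# Logistic regressions without and with the squared term (assumed model type)
fit.lin <- glm(y ~ duration, data = demo, family = binomial)
fit.sq  <- glm(y ~ duration + duration.sq, data = demo, family = binomial)

# Likelihood-ratio test: does the squared term improve the fit?
print(anova(fit.lin, fit.sq, test = "Chisq"))
```

Storing the square as its own column, as the text does, keeps the variable visible in later summaries; writing I(duration^2) inside the model formula would be an equivalent alternative.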
b) month
summary(bank$month)
## apr aug dec feb jan jul jun mar may nov oct sep