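Asking everyone here for help: this classification tree is driving me crazy, and no matter what I try I can't get it to work.
First, here is a summary of my dataset: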
> summary(DataCom1)
BMI_30 C_CTE Current_smoking Diabetes EVER_CAD FIBR
0:1488 0:1612 0:1250 0:1525 Min. :0.000 Min. :1.700
1: 354 1: 230 1: 592 1: 317 1st Qu.:1.000 1st Qu.:3.200
Median :2.000 Median :3.600
Mean :1.915 Mean :3.657
3rd Qu.:3.000 3rd Qu.:4.000
Max. :3.000 Max. :8.400
KARB LPK MTHFD105average NAA_CAD NeoptaverageD0 pCreataverageD0
Min. : 1.900 Min. : 2.900 0:615 0:197 Min. : 3.274 Min. : 38.30
1st Qu.: 5.000 1st Qu.: 5.800 1:883 1:524 1st Qu.: 6.458 1st Qu.: 64.80
Median : 6.000 Median : 6.900 2:344 2:494 Median : 7.772 Median : 73.80
Mean : 6.223 Mean : 7.164 3:627 Mean : 8.639 Mean : 75.44
3rd Qu.: 7.100 3rd Qu.: 8.200 3rd Qu.: 9.732 3rd Qu.: 83.70
Max. :27.400 Max. :18.000 Max. :44.340 Max. :265.00
RI_HK RI_HT spFolateaverageD0 SY_ANDRE Gender PE_ALDER
0: 662 0:952 Min. : 2.420 0:1641 1:1455 Min. :28.0
1:1073 1:884 1st Qu.: 7.356 2: 93 2: 387 1st Qu.:56.0
9: 90 Mean :12.359 : 18 Mean :62.2
3rd Qu.:14.540 7: 17 3rd Qu.:70.0
Max. :91.520 3: 16 Max. :87.0
(Other) : 15
My code is as follows:
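# Columns to keep from the full data set
NameTable1<-c("Gender","PE_ALDER","MTHFD105average","BMI_30","RI_HT",
"Diabetes","Current_smoking","NAA_CAD","SY_ANDRE",
"EVER_CAD","RI_HK","FIBR","spFolateaverageD0","LPK","pCreataverageD0",
"NeoptaverageD0","KARB","C_CTE")
NewData1<-wenbit[,is.element(names(wenbit),NameTable1)]
# Drop every row that has a missing value
DataCom1<-na.omit(NewData1)
trait<-DataCom1$C_CTE
# Proportion of missing values per column (all 0 here, since na.omit() has already been applied)
round(colMeans(is.na(DataCom1)),3)
library(rpart)
# NOTE: the short names in the formula (gender, age, ...) are not columns of DataCom1
# as summarized above; they are presumably defined elsewhere in the session
ClassTree<-rpart(trait~gender+age+geno+bmi+hypertension+diabetes+cur_rok+nocad+ander+
evercad+hk+fibr+folate+lpk+creat+neop+karb,DataCom1,control=rpart.control(minsplit=1))
ClassTree
plot(ClassTree)
text(ClassTree)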
Unfortunately, this is the only result I get:
n= 1842
node), split, n, loss, yval, (yprob)
* denotes terminal node
1) root 1842 230 0 (0.8751357 0.1248643) *
> plot(ClassTree)
Error in plot.rpart(ClassTree) : fit is not a tree, just a root
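Is there something wrong with my code, or with the structure of my data? Why can't I get any further splits? This problem has honestly been bothering me for a month... Many thanks in advance!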
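
For anyone reproducing this: one thing that may be worth checking is the complexity parameter `cp`, which the call above leaves at its default of 0.01 even though `minsplit` is lowered. With an outcome as unbalanced as this one (about 12.5% in class 1), a split can improve node purity without changing the predicted class in either child, so it may not reduce the misclassification risk enough to be kept. The sketch below is only an illustration on synthetic data, not a diagnosis of the original data set: `demo`, `outcome`, `age` and `bmi` are made-up names, and the numbers used to simulate the data are arbitrary assumptions.

```r
library(rpart)

# Synthetic stand-in for DataCom1 (hypothetical data, NOT the original):
# a rare binary outcome (roughly 12%) weakly related to two predictors.
set.seed(1)
n   <- 1842
age <- round(runif(n, 28, 87))
bmi <- factor(rbinom(n, 1, 0.2))
p   <- plogis(-3.5 + 0.025 * age + 0.4 * (bmi == "1"))
demo <- data.frame(outcome = factor(rbinom(n, 1, p)), age = age, bmi = bmi)

# Default control: cp = 0.01, so splits that do not reduce the overall
# lack of fit by that factor are not kept.
fit_default <- rpart(outcome ~ age + bmi, data = demo, method = "class")

# Same model with a smaller cp, which tolerates splits with smaller gains.
fit_low_cp <- rpart(outcome ~ age + bmi, data = demo, method = "class",
                    control = rpart.control(cp = 0.001))

# Alternative: equal class priors, so the rare class carries as much
# weight as the common one when splits are evaluated.
fit_prior <- rpart(outcome ~ age + bmi, data = demo, method = "class",
                   parms = list(prior = c(0.5, 0.5)))

# Number of nodes in each fit; a value of 1 means "just a root".
n_nodes <- c(default = nrow(fit_default$frame),
             low_cp  = nrow(fit_low_cp$frame),
             prior   = nrow(fit_prior$frame))
print(n_nodes)
```

Since `plot()` stops with "fit is not a tree, just a root" on a root-only fit, checking `nrow(fit$frame) > 1` before plotting avoids that error. A tree grown with a very small `cp` tends to overfit, so `printcp()` and `prune()` would normally be used afterwards to cut it back.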