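Here is one approach, which needs two packages: first remove the observations with a high proportion of missing values, then impute the remaining missing values. The code is as follows: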
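library(foreign) # only needed for non-CSV formats (e.g. SPSS, Stata); read.csv() is in base R
library(DMwR) # provides manyNAs() and knnImputation()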
A <- read.csv("hepatitis.csv")
head(A)
idx <- manyNAs(A, 0.2) # rows in which more than 20% of the values are NA
if (length(idx) > 0) A <- A[-idx, ] # guard: A[-integer(0), ] would drop every row
clean.A <- knnImputation(A, k = 10) # impute the remaining NAs from the 10 most similar rows
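
To see what the row-filtering step does without installing anything, here is a minimal base-R sketch on a made-up data frame. The names `toy`, `na_frac`, `kept` and `remaining` and all the values are hypothetical, and the filter only approximates what `manyNAs()` does; it is not the DMwR implementation.

```r
# Toy data frame (hypothetical values) with 5 columns and a few NAs
toy <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(2, NA, 6, 8),
  c = c(NA, 1, 1, 1),
  d = c(5, 5, NA, 5),
  e = c(9, 9, 9, 9)
)

# Fraction of missing values in each row
na_frac <- rowMeans(is.na(toy))

# Keep only the rows with at most 20% missing values; logical indexing
# stays safe even when no row exceeds the threshold
kept <- toy[na_frac <= 0.2, ]

# Number of NAs still left for the imputation step to fill
remaining <- sum(is.na(kept))
```

Filtering with a logical vector avoids the empty-index problem of negative indexing, so no extra length check is needed in this variant.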