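A belated Happy New Year to everyone! I'd sincerely appreciate help writing the code for parts (c) and (d) of the problem below.
The dataset has 5 variables, including high.GPA, math.SAT, verb.SAT, and univ.GPA.
The question: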
(a) Plot histograms and KDEs for the four unique variables (high.GPA, math.SAT, verb.SAT, univ.GPA) using a Gaussian kernel with the default bandwidth selection rule (bw="nrd0").
(b) Plot a histogram of the GPA difference (university minus high school) and overlay the KDE. For the KDE use a Gaussian kernel with the default bandwidth selection rule (bw="nrd0").
(c) Use the ECDF to estimate P(U > 3, H < 3), where H and U are a student’s high school and university GPA, respectively.
(d) Use the ECDF to estimate P(U > 3.5, H < 3.5), where H and U are a student’s high school and university GPA, respectively.
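A sketch of the reasoning behind (c) and (d), assuming the standard definition of the bivariate ECDF of the n pairs (H_i, U_i): the ECDF puts mass 1/n on each student, so the ECDF estimate of this kind of rectangle probability reduces to a sample proportion. Here \hat F_n(h^-, u) is the left limit in h, i.e. the proportion of students with H_i < h and U_i \le u; it appears because the event uses the strict inequality H < h. Part (c) takes h = u = 3 and part (d) takes h = u = 3.5.

```latex
\hat F_n(h, u) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{H_i \le h,\ U_i \le u\}

\hat P(U > u,\ H < h) = \hat F_n(h^-, \infty) - \hat F_n(h^-, u)
                      = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{H_i < h,\ U_i > u\}
```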
The first two parts are easy. Here is my code:
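# assumes the data frame sat is already loaded
# (a) histogram + Gaussian KDE (bw = "nrd0") for each of the four variables
par(mfrow = c(2, 2))
for (v in c("high.GPA", "math.SAT", "verb.SAT", "univ.GPA")) {
  d <- density(sat[[v]], bw = "nrd0", kernel = "gaussian")
  h <- hist(sat[[v]], plot = FALSE)
  # choose ylim so neither the bars nor the KDE curve gets clipped
  plot(h, freq = FALSE, ylim = c(0, max(h$density, d$y)), main = v, xlab = v)
  lines(d, col = "red")
}
# (b) GPA difference (university minus high school) with the KDE overlaid
par(mfrow = c(1, 1))
x <- sat$univ.GPA - sat$high.GPA
d <- density(x, bw = "nrd0", kernel = "gaussian")
h <- hist(x, plot = FALSE)
plot(h, freq = FALSE, ylim = c(0, max(h$density, d$y)),
     main = "univ.GPA - high.GPA", xlab = "GPA difference")
lines(d, col = "red")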
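But parts (c) and (d) require estimating the probabilities with the ECDF. I expected to find a multivariate function along the lines of mecdf, but found nothing like it, and I have no idea where to start. Any pointers from more experienced users would be much appreciated.
Finally, here is a screenshot of the data.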
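Since base R's ecdf() (package stats) only handles a single variable, one possible approach is to build the joint ECDF of (H, U) by hand as a closure and read the probability off it. This is a minimal sketch: joint_ecdf and p_U_gt_H_lt are hypothetical helper names, and the five toy (H, U) pairs are made up purely to check the arithmetic; they are not the sat data.

```r
# Hypothetical helper: base R's ecdf() is univariate, so build the joint
# (bivariate) ECDF of (H, U) by hand as a closure over the data.
#   Fhat(h, u) = (1/n) * #{i : H_i <= h and U_i <= u}
# strict_h = TRUE returns the left limit Fhat(h-, u), i.e. uses H_i < h.
joint_ecdf <- function(H, U) {
  stopifnot(length(H) == length(U))
  function(h, u, strict_h = FALSE) {
    in_h <- if (strict_h) H < h else H <= h
    mean(in_h & U <= u)
  }
}

# Hypothetical helper: P(U > u, H < h) = Fhat(h-, Inf) - Fhat(h-, u)
p_U_gt_H_lt <- function(Fhat, h, u) {
  Fhat(h, Inf, strict_h = TRUE) - Fhat(h, u, strict_h = TRUE)
}

# Made-up toy data (NOT the sat data), only to check the arithmetic
H_toy <- c(2.5, 2.9, 3.0, 3.4, 3.8)
U_toy <- c(3.2, 2.8, 3.5, 3.6, 3.9)
F_toy <- joint_ecdf(H_toy, U_toy)

p_c <- p_U_gt_H_lt(F_toy, h = 3,   u = 3)    # only (2.5, 3.2) qualifies: 1/5
p_d <- p_U_gt_H_lt(F_toy, h = 3.5, u = 3.5)  # only (3.4, 3.6) qualifies: 1/5
print(c(p_c, p_d))

# With the real data (assuming the data frame is named sat):
# Fhat <- joint_ecdf(sat$high.GPA, sat$univ.GPA)
# p_U_gt_H_lt(Fhat, h = 3,   u = 3)     # part (c)
# p_U_gt_H_lt(Fhat, h = 3.5, u = 3.5)   # part (d)
```

Because the ECDF puts mass 1/n on each observed (H, U) pair, this should give the same number as the plain proportion mean(sat$univ.GPA > 3 & sat$high.GPA < 3); the closure just makes the ECDF step explicit, which seems to be what the exercise is asking for.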