Help! Need code or an approach for a nonparametric modeling problem

2017-02-03

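First, a belated Happy New Year to everyone! I would really appreciate help with writing the code for parts (c) and (d) of the problem below.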

The data set has 5 variables; the four used here are high.GPA, math.SAT, verb.SAT, univ.GPA
The questions are:
(a) Plot histograms and KDEs for the four unique variables (high.GPA, math.SAT, verb.SAT,
univ.GPA) using a Gaussian kernel with the default bandwidth selection rule (bw="nrd0").
(b) Plot a histogram of the GPA difference (university minus high school) and overlay the KDE.
For the KDE use a Gaussian kernel with the default bandwidth selection rule (bw="nrd0").
(c) Use the ECDF to estimate P(U > 3,H < 3) where H and U are a student’s high school and
university GPA, respectively.
(d) Use the ECDF to estimate P(U > 3.5,H < 3.5) where H and U are a student’s high school
and university GPA, respectively.

The first two parts are easy:
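attach(sat)
# (a) histogram and Gaussian KDE side by side (shown for high.GPA)
  par(mfrow=c(1,2))
  hist(high.GPA,freq = FALSE,xlim=c(2,4),ylim=c(0,1))
  kde = density(high.GPA,bw="nrd0",kernel = "gaussian")
  plot(kde,xlim=c(1,5),ylim=c(0,1))
  lines(kde,col="red")   # redraw the density curve in red
# (b) histogram of the GPA difference with the KDE overlaid
  par(mfrow=c(1,1))      # back to a single panel
  x=univ.GPA-high.GPA
  hist(x,xlim=c(-1,1),freq = FALSE)
  kde = density(x,bw="nrd0",kernel = "gaussian")
  lines(kde,col="red")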
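
Part (a) asks for all four variables, while the code above only shows high.GPA. A minimal sketch of looping over the four columns is below. The `sat` data frame built here is a simulated stand-in (an assumption, since the real data is not included in the post), and overlaying the KDE on each histogram is just one reading of "plot histograms and KDEs".

```r
# Hypothetical stand-in for `sat` (simulated values, NOT the real data).
set.seed(1)
n <- 100
sat <- data.frame(high.GPA = runif(n, 2, 4),
                  math.SAT = rnorm(n, 620, 60),
                  verb.SAT = rnorm(n, 600, 60),
                  univ.GPA = runif(n, 2, 4))

vars <- c("high.GPA", "math.SAT", "verb.SAT", "univ.GPA")
par(mfrow = c(2, 2))                      # one panel per variable
kdes <- lapply(vars, function(v) {
  kde <- density(sat[[v]], bw = "nrd0", kernel = "gaussian")
  h <- hist(sat[[v]], plot = FALSE)       # bin densities only, no drawing yet
  # Choose ylim so that neither the bars nor the curve get clipped.
  hist(sat[[v]], freq = FALSE, main = v, xlab = v,
       ylim = c(0, max(h$density, kde$y)))
  lines(kde, col = "red")                 # overlay the Gaussian KDE
  kde
})
par(mfrow = c(1, 1))                      # restore the single-panel layout
```

Letting `hist()` pick its own x-range avoids hard-coding `xlim` values that suit the GPA columns but not the SAT scores.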

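But the last two parts ask for the probabilities to be estimated with the ECDF. I assumed there would be a function along the lines of mecdf, but I could not find one, and I have no idea where to start. Any pointers would be much appreciated.
Finally, here is a screenshot of the data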
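
One possible way to approach (c) and (d), offered as a sketch rather than a definitive answer: base R's `ecdf()` is univariate, but a bivariate ECDF is just the proportion of observations with U <= u and H <= h, and the target probability follows from P(U > u, H < h) = P(H < h) - P(U <= u, H < h). The helper names `joint_ecdf` and `p_upper_lower` are made up for illustration, and the `sat` data frame below is simulated, not the real data.

```r
# Hypothetical stand-in for `sat` (simulated values, NOT the real data).
set.seed(1)
n <- 100
high <- round(runif(n, 2, 4), 2)
sat <- data.frame(high.GPA = high,
                  univ.GPA = round(pmin(4, pmax(2, high + rnorm(n, 0, 0.3))), 2))

# Bivariate ECDF: share of students with U <= u and H <= h.
joint_ecdf <- function(u, h, U, H) mean(U <= u & H <= h)

# P(U > u, H < h) = P(H < h) - P(U <= u, H < h).
# The ECDF uses "<=", so the strict "H < h" is obtained by subtracting
# the mass that sits exactly at H == h (GPAs can tie at 3 or 3.5).
p_upper_lower <- function(u, h, U, H) {
  p_H_less <- ecdf(H)(h) - mean(H == h)                      # P(H < h)
  p_joint  <- joint_ecdf(u, h, U, H) - mean(U <= u & H == h) # P(U <= u, H < h)
  p_H_less - p_joint
}

U <- sat$univ.GPA
H <- sat$high.GPA
p_c <- p_upper_lower(3,   3,   U, H)   # part (c)
p_d <- p_upper_lower(3.5, 3.5, U, H)   # part (d)

# Cross-check: both should match the direct proportion of students.
all.equal(p_c, mean(U > 3 & H < 3))
all.equal(p_d, mean(U > 3.5 & H < 3.5))
```

With the real data, replacing the simulated `sat` by the actual data frame should be enough. The direct proportion `mean(U > 3 & H < 3)` gives the same number and is the shortest route if the ECDF decomposition is not required explicitly.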