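I spent several hours on this assignment, but my skills are really too limited, so I am asking for help here. Anyone who knows some R should find these easy to solve. The required data is here: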
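Question 1:
Write a function named 'pollutantmean' that calculates the mean of a pollutant (sulfate or nitrate) across a specified list of monitors (specdata). The function 'pollutantmean' takes three arguments: 'directory', 'pollutant', and 'id'. Ignore any missing values coded as NA. The function prototype is as follows: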
pollutantmean <- function(directory, pollutant, id = 1:332) {
## 'directory' is a character vector of length 1 indicating
## the location of the CSV files
## 'pollutant' is a character vector of length 1 indicating
## the name of the pollutant for which we will calculate the
## mean; either "sulfate" or "nitrate".
## 'id' is an integer vector indicating the monitor ID numbers
## to be used
## Return the mean of the pollutant across all monitors list
## in the 'id' vector (ignoring NA values)
}
Reference answers:
pollutantmean("specdata", "nitrate", 70:72)
## [1] 1.706
pollutantmean("specdata", "nitrate", 23)
## [1] 1.281
Here is what I wrote:
pollutantmean <- function(directory,pollutant,id=1:332){
  files_list <- dir(directory, full.names=T)
  data <- data.frame()
  for (i in 1:332){
    data <- rbind(data,read.csv(files_list[i]))
  }
  data_subset <- subset(data, data$ID<=max(id)&data$ID<=max(id)&data$ID>=min(id))
  if(pollutant=="sulfate"){
    result<-mean(data_subset$sulfate, na.rm=T)
  }
  if(pollutant=="nitrate"){
    result<-mean(data_subset$nitrate, na.rm=T)
  }
  return (result)
}
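My results are all correct, but:
1. The output seems to have too many digits, and the computation may take too long, which may be why the Coursera system automatically rejected my answer. I would appreciate detailed suggestions for revising it.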
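One possible direction, as a minimal sketch rather than a definitive fix: read only the files named by 'id' instead of binding all 332 files on every call, and pick monitors by the requested IDs rather than by the min/max range (a range would also pull in monitors 2 to 4 for id = c(1, 5)). The sketch assumes the files are named 001.csv to 332.csv; make_toy_specdata and its values are hypothetical stand-ins so the example runs without the real folder. The extra digits are most likely only a display difference, since round(x, 3) or print(x, digits = 4) shortens what is shown; whether that is what the grader objected to is not something this sketch can confirm.

```r
# Hypothetical toy data: three small monitor files standing in for "specdata".
make_toy_specdata <- function(name) {
  dir_path <- file.path(tempdir(), name)
  dir.create(dir_path, showWarnings = FALSE)
  dates <- as.character(as.Date("2003-01-01") + 0:3)
  sulfate <- list(c(1, 2, 3, NA), c(1, 2, NA, NA), rep(NA_real_, 4))
  nitrate <- list(c(2, 4, 6, 1), c(3, 1, 5, NA), c(8, 10, NA, NA))
  for (i in 1:3) {
    monitor <- data.frame(Date = dates, sulfate = sulfate[[i]],
                          nitrate = nitrate[[i]], ID = i)
    write.csv(monitor, file.path(dir_path, sprintf("%03d.csv", i)),
              row.names = FALSE)
  }
  dir_path
}

pollutantmean <- function(directory, pollutant, id = 1:332) {
  # Build only the requested file names, e.g. "specdata/023.csv"
  # (assumes zero-padded three-digit names).
  files <- file.path(directory, sprintf("%03d.csv", id))
  # Read each requested file once and keep just the pollutant column.
  values <- unlist(lapply(files, function(f) read.csv(f)[[pollutant]]))
  # Pool all values across monitors and drop NAs before averaging.
  mean(values, na.rm = TRUE)
}

toy_dir <- make_toy_specdata("toy_specdata_mean")
pollutantmean(toy_dir, "sulfate", 1:2)
pollutantmean(toy_dir, "nitrate", 3)
```

Selecting the column with [[pollutant]] also removes the need for a separate if branch per pollutant.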
Question 2:
Write a function with the following prototype:
complete <- function(directory, id = 1:332) {
## 'directory' is a character vector of length 1 indicating
## the location of the CSV files
## 'id' is an integer vector indicating the monitor ID numbers
## to be used
## Return a data frame of the form:
## id nobs
## 1 117
## 2 1041
## ...
## where 'id' is the monitor ID number and 'nobs' is the
## number of complete cases
}
Example output:
complete("specdata", 30:25)
##   id nobs
## 1 30  932
## 2 29  711
## 3 28  475
## 4 27  338
## 5 26  586
## 6 25  463
What I wrote:
complete<- function(directory,id=1:332){
  files_list <- dir(directory, full.names=T)
  data <- data.frame()
  for (i in 1:332){
    data <- rbind(data,read.csv(files_list[i]))
  }
  filecom<-vector()
  for (i in id){
    data_subset<-subset(data,data$ID==i)
    data2<-data_subset[,2:3]
    cc<-sum(complete.cases(data2))
    filecom<-rbind(filecom,c(i,cc))
  }
  colnames(filecom)<-c("id","nobs")
  return (filecom)
}
1. The results are all correct, but:
> class(complete("specdata", 30:25))
[1] "matrix"
What I want to get is:
> class(complete("specdata", 30:25))
[1] "data.frame"
2. Again, the computation is very slow, and the output has quite a lot of digits. I would appreciate specific advice.
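The smallest change to the version above would probably be to return as.data.frame(filecom), which turns the matrix into a data frame with the same column names. A fuller sketch, under the assumption that files are named 001.csv to 332.csv, reads only the requested monitors and builds the data frame directly; make_toy_specdata and its values are hypothetical stand-ins for the real folder.

```r
# Hypothetical toy data: three small monitor files standing in for "specdata".
make_toy_specdata <- function(name) {
  dir_path <- file.path(tempdir(), name)
  dir.create(dir_path, showWarnings = FALSE)
  dates <- as.character(as.Date("2003-01-01") + 0:3)
  sulfate <- list(c(1, 2, 3, NA), c(1, 2, NA, NA), rep(NA_real_, 4))
  nitrate <- list(c(2, 4, 6, 1), c(3, 1, 5, NA), c(8, 10, NA, NA))
  for (i in 1:3) {
    monitor <- data.frame(Date = dates, sulfate = sulfate[[i]],
                          nitrate = nitrate[[i]], ID = i)
    write.csv(monitor, file.path(dir_path, sprintf("%03d.csv", i)),
              row.names = FALSE)
  }
  dir_path
}

complete <- function(directory, id = 1:332) {
  # One file per requested monitor, in the order given by 'id'.
  files <- file.path(directory, sprintf("%03d.csv", id))
  # Count rows where both sulfate and nitrate are present.
  nobs <- vapply(files, function(f) {
    monitor <- read.csv(f)
    sum(complete.cases(monitor[, c("sulfate", "nitrate")]))
  }, integer(1), USE.NAMES = FALSE)
  # data.frame() gives class "data.frame" rather than a matrix.
  data.frame(id = id, nobs = nobs)
}

toy_dir <- make_toy_specdata("toy_specdata_complete")
result <- complete(toy_dir, 3:1)
class(result)
result
```

Because the rows follow the order of 'id', a call such as complete("specdata", 30:25) keeps the descending order shown in the example output.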
Question 3:
Prototype:
corr <- function(directory, threshold = 0) {
## 'directory' is a character vector of length 1 indicating
## the location of the CSV files
## 'threshold' is a numeric vector of length 1 indicating the
## number of completely observed observations (on all
## variables) required to compute the correlation between
## nitrate and sulfate; the default is 0
## Return a numeric vector of correlations
}
For this one I adapted a friend's code:
corr <- function(directory,threshold=0){
  filenames <- list.files("specdata", full.names=TRUE)
  n <-length(filenames)
  cr <- numeric()
  for (i in 1:332) {
    dat <- data.frame(lapply (filenames, read.csv))
    datcomplete <- subset(dat, dat$sulfate != "NA" & dat$nitrate != "NA")
    check <- length(datcomplete$ID)
    if (check >= threshold & check>0) {
      cal <- cor(datcomplete$sulfate,datcomplete$nitrate)
      cr <- c(cr, cal)
    }
  }
  return(cr)
}
cr <- corr("specdata", 150)
head(cr)
## [1] -0.01896 -0.14051 -0.04390 -0.06816 -0.12351 -0.07589
summary(cr)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## -0.2110 -0.0500  0.0946  0.1250  0.2680  0.7630
cr <- corr("specdata", 400)
head(cr)
## [1] -0.01896 -0.04390 -0.06816 -0.07589 0.76313 -0.15783
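But the results are not quite right either: head matches, but the later values are slightly off... I would appreciate detailed suggestions.
Taking this course with no programming experience at all is painful, so I hope an expert can help! If there is a more concise way to write any of this, please just tell me directly; my approach may not have been a good one to begin with.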
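A minimal sketch of one way to restructure this, not a confirmed diagnosis: it uses the 'directory' argument instead of the hard-coded "specdata", reads each file once, and finds complete rows with complete.cases() rather than comparing against the string "NA". It also assumes the threshold is meant as strictly greater than, which the prototype comment does not settle; if that reading is right, the >= in the version above would admit monitors with exactly 'threshold' complete cases, which might account for small differences in the later values. make_toy_specdata and its values are hypothetical stand-ins for the real folder.

```r
# Hypothetical toy data: three small monitor files standing in for "specdata".
make_toy_specdata <- function(name) {
  dir_path <- file.path(tempdir(), name)
  dir.create(dir_path, showWarnings = FALSE)
  dates <- as.character(as.Date("2003-01-01") + 0:3)
  sulfate <- list(c(1, 2, 3, NA), c(1, 2, NA, NA), rep(NA_real_, 4))
  nitrate <- list(c(2, 4, 6, 1), c(3, 1, 5, NA), c(8, 10, NA, NA))
  for (i in 1:3) {
    monitor <- data.frame(Date = dates, sulfate = sulfate[[i]],
                          nitrate = nitrate[[i]], ID = i)
    write.csv(monitor, file.path(dir_path, sprintf("%03d.csv", i)),
              row.names = FALSE)
  }
  dir_path
}

corr <- function(directory, threshold = 0) {
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  cr <- numeric(0)
  for (f in files) {
    dat <- read.csv(f)
    # TRUE for rows where both pollutants were observed.
    ok <- complete.cases(dat[, c("sulfate", "nitrate")])
    # Assumption: a monitor qualifies only with MORE than 'threshold' complete rows.
    if (sum(ok) > threshold) {
      cr <- c(cr, cor(dat$sulfate[ok], dat$nitrate[ok]))
    }
  }
  cr
}

toy_dir <- make_toy_specdata("toy_specdata_corr")
cr_all <- corr(toy_dir)        # monitors 1 and 2 qualify
cr_strict <- corr(toy_dir, 2)  # only monitor 1 has more than 2 complete rows
cr_none <- corr(toy_dir, 3)    # no monitor qualifies
```

When no monitor qualifies, the function returns a numeric vector of length 0, matching the "Return a numeric vector" line in the prototype.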