之间我想制作一个脚本,用于对表格内的属性分类后进行t检验。如下表:TT,TC,CC是三种不同的基因型,要做T检验。于是我通过书找到了一位大佬写的代码。
| rs# | AX.108768500 | AX.110999783 | AX.108750670 | AX.109504524 | AX.110395687 | PHE | |
| alleles | T/C | C/G | A/C | T/C | A/G | NA | |
| S43 | TT | GG | AA | CC | AA | 29.3417 | |
| S151 | TC | CG | AC | CC | AA | 32.0381 | |
| S189 | CC | CC | AC | TT | NN | 37.3938 | |
| S172 | CC | CC | CC | TC | AA | 35.4436 | |
| S292 | CC | CC | AC | NN | GG | 38.068 | |
| S246 | CC | CC | CC | CC | AA | 31.0997 | |
| S251 | CC | CC | CC | TC | GG | 33.724 | |
| S262 | CC | CC | AC | TT | GG | 32.6727 | |
| S238 | CC | CC | AC | TT | GG | 33.0237 | |
| S249 | CC | CC | CC | CC | AA | 22.5027 | |
| S294 | NN | CC | AC | TC | AA | 29.6675 | |
| S164 | CC | CC | NN | TT | AA | 31.769 | |
| S201 | CC | CC | CC | CC | AA | 34.4939 | |
<# Nonparametric pairwise multiple comparisons using the Wilcoxon Signed Rank Test
# Probability values are adjusted using the p.adjust function
wmc <- function(formula, data, exact=FALSE, sort=TRUE, method="holm"){
# setup
df <- model.frame(formula, data)
y <- df[[1]]
x <- as.factor(df[[2]])
# reorder levels of x by median y
if(sort){
medians <- aggregate(y, by=list(x), FUN=median)[2]
index <- order(medians)
x <- factor(x, levels(x)[index])
}
groups <- levels(x)
k <- length(groups)
# summary statisticsa
stats <- function(z)(c(N = length(z), Median = median(z), MAD = mad(z)))
sumstats <- t(aggregate(y, by=list(x), FUN=stats)[2])
rownames(sumstats) <- c("n", "median", "mad")
colnames(sumstats) <- groups
cat("Descriptive Statistics\n\n")
print(sumstats)
# multiple comparisons
mc <- data.frame(Group.1=character(0),
Group.2=character(0),
W=numeric(0),
p.unadj=numeric(0),
p=numeric(0),
stars=character(0),
stringsAsFactors=FALSE)
# perform Wilcoxon test
row <- 0
for(i in 1:k){
for(j in 1:k){
if (j > i){
row <- row + 1
y1 <- y[x==groups
]
y2 <- y[x==groups[j]]
test <- wilcox.test(y1, y2, exact=exact)
mc[row,1] <- groups
mc[row,2] <- groups[j]
mc[row,3] <- test$statistic
mc[row,4] <- test$p.value
}
}
}
mc$p <- p.adjust(mc$p.unadj, method=method)
# add stars
mc$stars <- " "
mc$stars[mc$p < .1] <- "."
mc$stars[mc$p < .05] <- "*"
mc$stars[mc$p < .01] <- "**"
mc$stars[mc$p < .001] <- "***"
names(mc)[6] <- " "
cat("\nMultiple Comparisons (Wilcoxon Rank Sum Tests)\n")
cat(paste("Probability Adjustment = ", method, "\n\n", sep=""))
print(mc[-4], right=TRUE)
cat("---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n")
return(invisible(NULL))
}>
计算的方式和结果如下:
< wmc(PHE~AX.108768500,data = data, method="holm")
Descriptive Statistics
CC TC NN TT
n 81.00000 78.000000 56.000000 57.000000
median 28.32410 29.428200 29.531900 30.244000
mad 4.23134 3.461426 4.091531 4.067661
Multiple Comparisons (Wilcoxon Rank Sum Tests)
Probability Adjustment = holm
Group.1 Group.2 W p
1 CC TC 2596 0.26309005
2 CC NN 1972 0.58718842
3 CC TT 1647 0.02555667 *
4 TC NN 2284 0.65353668
5 TC TT 1978 0.58718842
6 NN TT 1330 0.50938174
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1>
但是问题是可以的到结果不能循环。比如我讲第一行作为一个list,想用wmc(PHE~list[1]...的方式构建循环发现,语句就会有错误。
在我找了一位很厉害的人询问后他给我出了一个办法。
<for(i in 2:(ncol(data)-1)){
f <- as.formula(paste0('PHE~',colnames(data)))
wmc(f,data,colnames(data))
}>
利用paste函数,先将formula准备好,然后再利用别人编写好的函数。这种方式可以很快利用到别人的函数或者包。
<escriptive Statistics (AX.108768500)
CC TC NN TT
n 81.00000 78.000000 56.000000 57.000000
median 28.32410 29.428200 29.531900 30.244000
mad 4.23134 3.461426 4.091531 4.067661
Multiple Comparisons (Wilcoxon Rank Sum Tests)
Probability Adjustment = holm
Group.1 Group.2 W p
1 CC TC 2596 0.26309005
2 CC NN 1972 0.58718842
3 CC TT 1647 0.02555667 *
4 TC NN 2284 0.65353668
5 TC TT 1978 0.58718842
6 NN TT 1330 0.50938174
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Descriptive Statistics (AX.110999783)
NN CG CC GG
n 2.000000 3.0000000 89.000000 178.000000
median 27.041850 27.3595000 28.610100 29.789700
mad 1.479116 0.4004503 4.352024 3.506127
Multiple Comparisons (Wilcoxon Rank Sum Tests)
Probability Adjustment = holm
Group.1 Group.2 W p
1 NN CG 2 1.00000000
2 NN CC 67 1.00000000
3 NN GG 78 0.87256823
4 CG CC 139 1.00000000
5 CG GG 200 1.00000000
6 CC GG 6306 0.03985078 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Descriptive Statistics (AX.108750670)
CC NN AC AA
n 48.000000 8.00000 36.000000 180.000000
median 27.886150 28.74170 28.867600 29.816850
mad 3.652089 3.31539 4.285826 3.606573
Multiple Comparisons (Wilcoxon Rank Sum Tests)
Probability Adjustment = holm
Group.1 Group.2 W p
1 CC NN 161 1.0000000
2 CC AC 787 1.0000000
3 CC AA 3064 0.0119307 *
4 NN AC 146 1.0000000
5 NN AA 596 1.0000000
6 AC AA 2561 0.2373465
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Descriptive Statistics (AX.109504524)
NN TC CC TT
n 5.000000 94.000000 99.000000 74.000000
median 26.566300 28.927850 29.574300 29.657450
mad 7.919604 3.936451 2.972465 3.913249
Multiple Comparisons (Wilcoxon Rank Sum Tests)
Probability Adjustment = holm
Group.1 Group.2 W p
1 NN TC 191 1.0000000
2 NN CC 185 1.0000000
3 NN TT 150 1.0000000
4 TC CC 3989 0.5229268
5 TC TT 3173 1.0000000
6 CC TT 3801 1.0000000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
谢谢大家观看,觉得有用麻烦点一个赞,谢啦~
有问题可以问我,尽管我还是小白,但是只要是我能回答的我都会尽力回答。