Ladies and gentlemen, long time no see! I'm back with another question. I have two data frames:
> head(a)
chr start end dhs
1 chr1 1911945 1912455 DHS1401
2 chr1 1912485 1912835 DHS1402
3 chr1 1914165 1914395 DHS1405
4 chr1 2200820 2201195 DHS1805
5 chr1 2903485 2903635 DHS2770
6 chr1 3788105 3788255 DHS4214
> head(b)
chr start end dhs category target_gene
1 chr1 1911945 1912455 DHS1401 local KIAA1751
2 chr1 1912485 1912835 DHS1402 local KIAA1751
3 chr1 1914165 1914395 DHS1405 local KIAA1751
4 chr1 2200820 2201195 DHS1805 local SKI
5 chr1 2903485 2903635 DHS2770 distal ACTRT2
6 chr1 3788105 3788255 DHS4214 local DFFB
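a contains b (every dhs in b also appears in a). What I want is this: for each row of a, if its dhs is in b, output that row of b; otherwise output the row of a with NA in the last two columns. The result should look like this: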
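What is described here is a left join of a with b on the dhs column: every row of a is kept, and category/target_gene are filled in where a match exists. A minimal base-R sketch, assuming dhs values are unique in b and the column names are the ones set in the code below; the function name left_join_dhs and the helper column row_id are hypothetical:

```r
# Left join sketch: keep every row of a, add category/target_gene from b by dhs
left_join_dhs <- function(a, b) {
  a$row_id <- seq_len(nrow(a))  # hypothetical helper column to remember a's row order
  res <- merge(a, b[, c("dhs", "category", "target_gene")],
               by = "dhs", all.x = TRUE)  # all.x = TRUE keeps unmatched rows of a (NA-filled)
  # merge() sorts by the key column, so restore a's order and the original column order
  res <- res[order(res$row_id),
             c("chr", "start", "end", "dhs", "category", "target_gene")]
  rownames(res) <- NULL
  res
}
```

Note that if a dhs occurred more than once in b, merge() would return one row per match, whereas match() (as used in the loop below) only takes the first one.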
> head(c,10)
chr start end dhs category target_gene
1 chr1 1911945 1912455 DHS1401 local KIAA1751
2 chr1 1912485 1912835 DHS1402 local KIAA1751
3 chr1 1914165 1914395 DHS1405 local KIAA1751
4 chr1 2200820 2201195 DHS1805 local SKI
5 chr1 2903485 2903635 DHS2770 distal ACTRT2
6 chr1 3788105 3788255 DHS4214 local DFFB
7 chr1 4815360 4816015 DHS5346 local AJAP1
8 chr1 5910045 5910490 DHS6382 <NA> <NA>
81 chr1 6332640 6332935 DHS7033 local ACOT7
9 chr1 7409425 7409895 DHS8350 local CAMTA1
Here is the code I wrote:
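# give both data frames consistent column names
colnames(a)<-c("chr","start","end","dhs")
colnames(b)<-c("chr","start","end","dhs","category","target_gene")
# handle the first row separately to initialise c
if(as.character(a[1,4]) %in% b[,4]){
c<-b[match(as.character(a[1,4]),b[,4]),]
}else{
# name the NA columns so they match b; otherwise the later rbind() fails on mismatched names
c<-cbind(a[1,],category=NA,target_gene=NA)
}
line<-NULL
# loop over the remaining rows of a, appending one row at a time
for (i in 2:nrow(a)){
if(as.character(a[i,4]) %in% b[,4]){
# dhs found in b: take b's row
line<-b[match(as.character(a[i,4]),b[,4]),]
}else{
# dhs not in b: take a's row and pad with NA
line<-cbind(a[i,],category=NA,target_gene=NA)
}
c<-rbind(c,line)
}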
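The row-by-row loop can be replaced by a single vectorised match() call, because match() already returns NA for every dhs that is not found in b, and indexing a vector with NA yields NA. A sketch, assuming (as stated above) that the rows of b are identical to the corresponding rows of a in the first four columns, so only the two extra columns need copying; add_b_annotation is a hypothetical name:

```r
# Vectorised replacement for the loop: one match() instead of a per-row if/else
add_b_annotation <- function(a, b) {
  # position of each a$dhs in b$dhs, NA where it is absent
  idx <- match(as.character(a$dhs), as.character(b$dhs))
  res <- a
  res$category <- b$category[idx]        # NA index -> NA value
  res$target_gene <- b$target_gene[idx]
  res
}
```

Usage would then be simply `c <- add_b_annotation(a, b)`. It also avoids growing c with rbind() inside the loop, which copies the whole data frame on every iteration.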
I feel this is clumsy and not really in the spirit of R. Is there a simpler way to write it? Sorry for taking up your time, and many thanks!