今天看到一篇不错的帖子,转载如下:
R中的表合并
一、merge函数外连接合并)>merge(x = df1, y = df2, by = "CustomerId", all = TRUE)
CustomerId Product State
1 1 Toaster <NA>
2 2 Toaster Alabama
3 3 Toaster <NA>
4 4 Radio Alabama
5 5 Radio <NA>
6 6 Radio Ohio
左连接:merge(x = df1, y = df2, by ="CustomerId", all.x=TRUE)
CustomerId Product State
1 1 Toaster <NA>
2 2 Toaster Alabama
3 3 Toaster <NA>
4 4 Radio Alabama
5 5 Radio <NA>
6 6 Radio Ohio
右连接:> merge(x = df1, y = df2, by ="CustomerId", all.y=TRUE)
CustomerId Product State
1 2 Toaster Alabama
2 4 Radio Alabama
3 6 Radio Ohio
交叉连接:> merge(x = df1, y = df2, by = NULL)
CustomerId.x Product CustomerId.y State
1 1 Toaster 2 Alabama
2 2 Toaster 2 Alabama
3 3 Toaster 2 Alabama
4 4 Radio 2 Alabama
5 5 Radio 2 Alabama
6 6 Radio 2 Alabama
7 1 Toaster 4 Alabama
8 2 Toaster 4 Alabama
9 3 Toaster 4 Alabama
10 4 Radio 4 Alabama
11 5 Radio 4 Alabama
12 6 Radio 4 Alabama
13 1 Toaster 6 Ohio
14 2 Toaster 6 Ohio
15 3 Toaster 6 Ohio
16 4 Radio 6 Ohio
17 5 Radio 6 Ohio
18 6 Radio 6 Ohio
使用sqldf包查询合并表##习惯使用sql的童鞋是不是有种他乡遇故知的赶脚
##eg:sqldf("SELECT *FROM df1 where CUstomerId=2"
CustomerId Product
1 2 Toaster
内连接:
sqldf("SELECT CustomerId, Product,State FROM df1 JOIN df2 USING(CustomerID)"
CustomerId Product State
1 2 Toaster Alabama
2 4 Radio Alabama
3 6 Radio Ohio
左连接:
sqldf("SELECT CustomerId, Product,State FROM df1 LEFT JOIN df2USING(CustomerID)"
CustomerId Product State
1 1 Toaster <NA>
2 2 Toaster Alabama
3 3 Toaster <NA>
4 4 Radio Alabama
5 5 Radio <NA>
6 6 Radio Ohio
三、使用plyr包library(plyr)
join(df1, df2,type="inner"
CustomerId Product State
1 2 Toaster Alabama
2 4 Radio Alabama
3 6 Radio Ohio
Type的选项包括:inner、left、right、full(内连接,左连接,右连接,外连接)
四、使用dplyr包library(dplyr)
> inner_join(df1,df2,by="CustomerId"
CustomerId Product State
1 2 Toaster Alabama
2 4 Radio Alabama
3 6 Radio Ohio
> left_join(df1,df2)
Joining by: "CustomerId"
CustomerId Product State
1 1 Toaster <NA>
2 2 Toaster Alabama
3 3 Toaster <NA>
4 4 Radio Alabama
5 5 Radio <NA>
6 6 Radio Ohio
> semi_join(df1,df2)
Joining by: "CustomerId"
CustomerId Product
1 2 Toaster
2 4 Radio
3 6 Radio
> anti_join(df1,df2)
CustomerId Product
1 1 Toaster
2 3 Toaster
3 5 Radio