全部版块 我的主页
论坛 数据科学与人工智能 数据分析与数据科学 SPSS论坛
1234 1
2014-05-06

Can SPSS commands (e.g., MERGE FILES) be used to perform a left outer join between 2 SPSS datasets? Assume that the join field is not unique in either dataset.

Example: Let the left Dataset1 contains 2 fields - ClassNbr and Fact1 - and these 4 records . . .

1 A
1 D
2 A
3 B

Let Dataset2 contains 2 fields - ClassNbr and Fact2 - and these 3 records . . .

1 XX

1 XY
3 ZZ

I want to join Dataset1 and Dataset2 on ClassNbr. The desired result is a 6 record dataset as follows:

1 A XX

1 A XY
1 D XX
1 D XY
2 A (NULL)
3 B ZZ

I would prefer a solution that uses SPSS commands (as opposed to SQL/Python/etc.).



二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

全部回复
2014-5-6 01:46:30
As far as I'm aware you can not do this directly. One potential way to do the workaround is to "reshape" the data from long format to wide format (using casestovars), do the merge, and then reshape back into long format (using varstocases). Below is a use example (if any clarification is needed on the code just ask).

data list free / ClassNbr (F1) Fact1 (A1).
begin data
1 A
1 D
2 A
3 B
end data.
dataset name data1.

casestovars
/id = ClassNbr.

data list free / ClassNbr (F1) Fact2 (A2).
begin data
1 XX
1 XY
3 ZZ
end data.
dataset name data2.

casestovars
/id = ClassNbr.

match files file = 'data1'
/file = 'data2'
/by ClassNbr.
execute.

varstocases
/make Fact1 FROM Fact1.1 to Fact1.2
/null = KEEP.
varstocases
/make Fact2 FROM Fact2.1 to Fact2.2
/null = KEEP.
This creates some cases that you do not want, here I have just defined a set of commands to identify those cases and take them out (I'm sure this could be improved to be more efficient).

*now cleaning up the extra records.
compute flag = 0.
if ClassNbr = lag(ClassNbr) and Fact1 = lag(Fact1) and Fact2 = lag(Fact2) flag = 1.
select if flag = 0.
execute.
if Fact1 = " " and Fact2 = " " flag = 1.
select if flag = 0.
execute.
if ClassNbr = lag(ClassNbr) and Fact1 = lag(Fact1) and Fact2 = " " flag = 1.
select if flag = 0.
execute.
if ClassNbr = lag(ClassNbr) and Fact2 = lag(Fact2) and Fact1 = " " flag = 1.
select if flag = 0.
execute.
I'm sure it would be possible to make this more robust (probably making some custom python functions). But hopefully this helps get you started.
二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

相关推荐
栏目导航
热门文章
推荐文章

说点什么

分享

扫码加好友,拉您进群
各岗位、行业、专业交流群