SPSS Forum
2014-04-01

I am processing lots of files with student assessment data.  For one project I have to merge cases from ~120 different files.


Each file is for a different grade, test language, and administration (e.g., Math3Eng1_2011, Math3Sp1_2011, Math4Eng1_2011, Math5-1_2011, Math6_2011). Each school year has 26-34 different files. I want to merge the cases from all files for a particular school year easily and efficiently. To complicate matters, the file naming structure changes from year to year. I don’t want to merge the files one at a time: there are over 5M records in total, and 25+ data passes take forever. I don’t know how to merge multiple files in one data pass unless I hard-code the number of files to merge, and that number varies.

I could take a directory listing from the Windows command prompt and then use an editor to add a prefix and suffix to each line so the list can be pasted into a merge command, but I’d like a more automated process.

Does anybody have a good suggestion?
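A minimal sketch of the "pick one school year's files" step, assuming (hypothetically) that every file name ends in `_<4-digit year>.sav`, as the examples above do. Because the naming structure changes between years, the pattern is the part to adjust; the function name `files_for_year` is illustrative only. It takes whatever listing you have (for example, the output of `glob.glob`) and returns the matching paths in a stable order, so the number of files never has to be hard-coded.

```python
import os
import re

# Assumption: names end in "_<year>.sav" (e.g. Math3Eng1_2011.sav).
# Change this pattern if a year's naming scheme differs.
YEAR_SUFFIX = re.compile(r"_(\d{4})\.sav$", re.IGNORECASE)


def files_for_year(paths, year):
    """Return the paths whose file name carries the given year, sorted."""
    selected = []
    for path in paths:
        match = YEAR_SUFFIX.search(os.path.basename(path))
        if match and int(match.group(1)) == year:
            selected.append(path)
    return sorted(selected)


if __name__ == "__main__":
    listing = [
        "Math3Eng1_2011.sav", "Math3Sp1_2011.sav", "Math4Eng1_2011.sav",
        "Math5-1_2011.sav", "Math6_2011.sav", "Math3Eng1_2012.sav",
    ]
    print(files_for_year(listing, 2011))
```

Files whose names do not match the pattern are skipped rather than guessed at, so a changed naming scheme shows up as a short list instead of a wrong merge.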



All replies
2014-4-1 03:33:35
Install the Python Essentials from the SPSS Community site (www.ibm.com/developerworks/spssdevcentral) if you haven't already done so.
Change the filespec line so it matches the location and names of your files.
Then run the following from a syntax window.

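begin program.
import spss, glob

# Edit this pattern to match the .sav files for one school year.
filespec = r"c:/temp/parts/e*.sav"

# Sort so cases are appended in a predictable (alphabetical) order.
files = sorted(glob.glob(filespec))
if not files:
    raise ValueError("No files match " + filespec)

# One ADD FILES command with a /FILE subcommand per matching file,
# so all files are combined in a single data pass.
file_subcommands = " ".join('/FILE="%s"' % f for f in files)
spss.Submit("ADD FILES " + file_subcommands + ".")
end program.

dataset name merged.
exec.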
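The command-building part of the program above is plain string handling, so it can be checked outside SPSS. Here is a sketch of that step as a standalone function; `build_add_files` and `MAX_ADD_FILES` are hypothetical names, not part of the `spss` module. The 50-file cap reflects the per-command limit stated in the ADD FILES documentation versions I have seen; treat it as an assumption and check the Command Syntax Reference for your release. The 26-34 files per school year fit under it, but all ~120 files at once would not.

```python
import glob

# Assumed per-command limit on ADD FILES inputs; verify it for your version.
MAX_ADD_FILES = 50


def build_add_files(paths, max_files=MAX_ADD_FILES):
    """Return one ADD FILES command with a /FILE subcommand per path."""
    if not paths:
        raise ValueError("no input files to combine")
    if len(paths) > max_files:
        raise ValueError("%d files exceeds the assumed limit of %d"
                         % (len(paths), max_files))
    # Double-quote each path so spaces in folder names are safe.
    subcommands = " ".join('/FILE="%s"' % p for p in paths)
    return "ADD FILES " + subcommands + "."


if __name__ == "__main__":
    # Same pattern as the answer; sorted for a stable file order.
    files = sorted(glob.glob(r"c:/temp/parts/e*.sav"))
    if files:
        print(build_add_files(files))
```

For example, `build_add_files(["a.sav", "b.sav"])` returns `ADD FILES /FILE="a.sav" /FILE="b.sav".`, which is the string the program passes to `spss.Submit`.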