2014-03-17
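I have several data files that all contain the same variables, and I want to merge them into a single file. Each time I merge, though, I can only select one file, so I have to add them one by one, which is very tedious. Could an expert tell me how to select multiple files at once in the browse dialog, so that everything gets done in one go? Screenshots: QQ截图20140317104705.png QQ截图20140317104648.png QQ截图20140317104629.png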

All replies
2014-3-17 15:46:03
It can be done with programming (syntax).

2014-3-19 11:21:37
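kuangsir6 posted on 2014-3-17 15:46
It can be done with programming (syntax).
Can't it be done through the ordinary menus? Could you explain in more detail?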

2014-3-20 05:41:51

You need a macro, because you are going to do the same thing for every file.

For file 1 you have this:

Get data ….
* y is the variable whose values are recorded minute by minute every day.
varstocases make y from day1 to day366/index=day.
sort cases by day time.
Save outfile=<cumulative file name>.

* File 2 and on.
Get data ….
varstocases make y from day1 to day366/index=day.
sort cases by day time.
Match files file=<cumulative file name>/file=*/rename=(y=y2)/by day time.
Save outfile=<cumulative file name>.

If you run SPSS with 'have multiple data files open' enabled, which I don't do, you will, I think, need a slightly different structure, because you have to keep track of opening and closing datasets.
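
Since the same commands repeat for every file, the syntax text itself can be generated. The following is only a sketch in plain Python that builds the syntax as a string; it does not run SPSS. The function name `build_match_syntax`, the `read_commands` argument (the caller supplies each file's own complete read command, which the post above leaves elided) and the `y2`, `y3`, ... naming for later files are assumptions made for illustration.

```python
def build_match_syntax(read_commands, cumulative_file):
    """Build SPSS syntax that restructures each file and merges it into one file.

    read_commands   -- list of complete SPSS commands (strings), one per source
                       file, each of which opens that file (supplied by the caller).
    cumulative_file -- path of the growing merged file (hypothetical name).
    """
    lines = []
    for i, read_cmd in enumerate(read_commands, start=1):
        lines.append(read_cmd)
        # Turn the day1..day366 columns into one long variable y, indexed by day.
        lines.append("VARSTOCASES /MAKE y FROM day1 TO day366 /INDEX=day.")
        # MATCH FILES needs both inputs sorted by the BY variables, in that order.
        lines.append("SORT CASES BY day time.")
        if i > 1:
            # Files 2 and on: rename y so it does not collide with earlier ones.
            lines.append(
                "MATCH FILES /FILE='%s' /FILE=* /RENAME=(y=y%d) /BY day time."
                % (cumulative_file, i)
            )
        lines.append("SAVE OUTFILE='%s'." % cumulative_file)
    return "\n".join(lines)
```

The returned text can be pasted into a syntax window; inside a `begin program` block it could presumably be passed to `spss.Submit` instead.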




2014-4-1 02:55:58
Please try the code below. The first block should generate test data and the second block should

  • read the variable names from the first row of the first sheet of the first workbook
  • read all data from all rows in all sheets in all workbooks (starting at row 2)
  • output an active dataset containing source_file, source_sheet and all data
  • make each string variable in SPSS exactly as long as the longest value found in the workbooks

This hasn't been thoroughly tested yet so there may be complications but it seems to work on the test data provided. Please keep us informed on how things are going, OK?

Kind regards,



*Create test data.
begin program.
import os,random
import xlwt
rdir=r"d:\temp" # Please specify a folder in which test files can be created.
for year in range(2004,2014): # One workbook per year: data_2004.xls ... data_2013.xls
    wb=xlwt.Workbook()
    ws=wb.add_sheet("data")
    # Row 0 holds the variable names
    for col,cont in enumerate(['EmployeeID','JobTitle','YearSalary','DaysAbsent']):
        ws.write(0,col,cont)
    for row,id in enumerate([104,21,60,2,1030]):
        ws.write(row+1,0,id)
    for row in range(5):
        ws.write(row+1,1,random.choice(['Developer','Tester','Manager']))
    for row in range(5):
        ws.write(row+1,2,random.randrange(40,80)*1000)
    for row in range(5):
        ws.write(row+1,3,random.choice(range(20)))
    wb.save(os.path.join(rdir,'data_%d.xls'%year))
end program.

*Read and merge all xls workbooks.

begin program.
import os
import xlrd,spss
rdir=r"d:\temp" # Please specify folder holding .xls files
fils=[fil for fil in os.listdir(rdir) if fil.endswith(".xls")] # Should probably be ".xlsx" in your case.
allData=[]
for cnt,fil in enumerate(fils):
    wb=xlrd.open_workbook(os.path.join(rdir,fil))
    for ws in wb.sheets():
        for row in range(1,ws.nrows): # Skip row 0, which holds the variable names
            allData.append([fil]+[ws.name]+[val for val in ws.row_values(row)])
    if cnt==0: # Variable names: first row of the first sheet of the first workbook
        vNames=["source_file"]+["source_sheet"]+wb.sheet_by_index(0).row_values(0)
# Longest string per column; stays 0 for columns without strings
mxLens=[0]*len(vNames)
for line in allData:
    for cnt in range(len(line)):
        # basestring exists in Python 2 only; under Python 3 use str instead
        if isinstance(line[cnt],basestring) and len(line[cnt])>mxLens[cnt]:
            mxLens[cnt]=len(line[cnt])
with spss.DataStep():
    nds = spss.Dataset('*') # nds = "New Data Set"
    for vrbl in zip(vNames,mxLens):
        nds.varlist.append(vrbl[0],vrbl[1]) # Type 0 = numeric, n = string of width n
    for line in allData:
        nds.cases.append(line)
end program.

Some notes:

  • Make sure you have no open dataset when you run this
  • A crucial assumption is that the structure (column order) is identical across all sheets and all workbooks
  • The first rows of all sheets in all workbooks should hold (identical) variable names
  • You need to have 1) SPSS, 2) SPSS Python essentials and 3) the Python xlrd module properly installed
  • You may need to replace ".xls" with ".xlsx" in the second block
  • Date variables should be no problem but will look weird in SPSS. To convert a date called "date_1" to a normal date, try

compute date_new=datesum(date.dmy(30,12,1899),date_1,"days").
formats date_new(datetime22).


  • This should work although there seems to be some kind of bug somewhere so please check carefully.
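
On the date note above: as far as I know, xlrd returns date cells as Excel serial numbers (days counted from a fixed start date), which is why they look odd once they land in SPSS. The sketch below shows the same arithmetic in plain Python so a few values can be checked by hand. The function name `excel_serial_to_datetime` is hypothetical, and the code assumes the workbook uses Excel's default 1900 date system.

```python
from datetime import datetime, timedelta

# Assumption: Excel's default 1900 date system. Because Excel treats 1900 as a
# leap year, serial numbers from 1900-03-01 onward count days from 1899-12-30.
EXCEL_EPOCH = datetime(1899, 12, 30)

def excel_serial_to_datetime(serial):
    """Convert an Excel serial date (a float; the fraction is the time of day)."""
    return EXCEL_EPOCH + timedelta(days=serial)

# Hand check: 1 January 2000 is serial 36526 in the 1900 date system.
check = excel_serial_to_datetime(36526)
```

Comparing a few converted values against the dates shown in Excel is a quick way to confirm that the start date used in the SPSS syntax is right.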

2014-4-1 03:05:14
Jon Peck from IBM SPSS wrote an extension command that would fit the bill here without having to roll your own Python: the
`SPSSINC PROCESS FILES` command.

Here is an example in the developerWorks forum <http://www.ibm.com/developerworks/forums/thread.jspa?messageID=14573567> of the tool in action in a nearly identical situation.

Also, with a really consistent naming structure and no missing IDs, it would be pretty simple to write this up as a macro; see this other answer I gave recently <http://spssx-discussion.1045642.n5.nabble.com/Looping-td5716527.html>. All that would need to change (and it is what I have done anyway) is to have the first pass of the loop create a base file, and then have each later pass use ADD FILES to concatenate the new file to that base file.
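
The base-file idea can be sketched without the extension command or the linked macro (this is neither of them, just an illustration). The plain-Python function below only builds syntax text: the first file becomes the active dataset and each later file is appended with ADD FILES. The name `build_add_files_syntax` and the assumption that the inputs are already .sav files with identical variables are mine.

```python
def build_add_files_syntax(sav_files, merged_file):
    """Build SPSS syntax that stacks the cases of several .sav files.

    sav_files   -- list of paths to .sav files with identical variables (assumed).
    merged_file -- path where the combined file is saved (hypothetical name).
    """
    if not sav_files:
        raise ValueError("need at least one input file")
    # First pass: the first file becomes the base (active) dataset.
    lines = ["GET FILE='%s'." % sav_files[0]]
    for path in sav_files[1:]:
        # Later passes: append the cases of the next file to the active dataset.
        lines.append("ADD FILES /FILE=* /FILE='%s'." % path)
        lines.append("EXECUTE.")
    lines.append("SAVE OUTFILE='%s'." % merged_file)
    return "\n".join(lines)
```

The list of paths could come from something like `os.listdir`, filtered on the ".sav" extension, which gives the one-step merge the original question asks for.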
