How can I merge multiple cases in SPSS? Experts, please help

anklebreak

2014-03-17

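I have several case files that all contain the same variables, and I want to merge them into a single file. But each merge only lets me select one file, so I have to add them one at a time, which is very tedious. How can I select several files at once when browsing, so that everything is done in one go? The dialog looks like this: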

All replies

kuangsir6

2014-3-17 15:46:03

It can be done programmatically.

anklebreak

2014-3-19 11:21:37

kuangsir6 posted on 2014-3-17 15:46
It can be done programmatically.

Can't it be done through the normal menus? Could you explain in more detail?

ReneeBK

2014-3-20 05:41:51

You need a macro because you are going to do the same thing for every file.

For file 1 you have this.

GET DATA ….

* y is the variable whose values are recorded minute by minute every day.

VARSTOCASES /MAKE y FROM day1 TO day366 /INDEX=day.

SORT CASES BY day time.

SAVE OUTFILE=<cumulative file name>.

* File 2 and on.

GET DATA ….

VARSTOCASES /MAKE y FROM day1 TO day366 /INDEX=day.

SORT CASES BY day time.

* The BY variables must be listed in the same order the files were sorted in.

MATCH FILES /FILE=<cumulative file name> /FILE=* /RENAME=(y=y2) /BY day time.

SAVE OUTFILE=<cumulative file name>.

If you run SPSS with 'have multiple data files open' enabled, which I don't do, I think you will need a slightly different structure, because you have to keep track of opening and closing datasets.

ReneeBK

2014-4-1 02:55:58

Please try the code below. The first block should generate test data, and the second block should:

read the variable names from the first row of the first sheet of the first workbook
read all data from all lines in all sheets in all workbooks (from line 2)
output an active dataset containing source_file, source_sheet and all data
make string lengths in SPSS exactly as long as required by the data contained in the workbooks

This hasn't been thoroughly tested yet so there may be complications but it seems to work on the test data provided. Please keep us informed on how things are going, OK?

Kind regards,

*Create test data.
begin program.
import os,random
import xlwt
rdir=r"d:\temp" # Please specify a folder in which test files can be created.
for year in range(2004,2014):
   wb=xlwt.Workbook()
   ws=wb.add_sheet("data")
   # Row 0 holds the variable names.
   for col,cont in enumerate(['EmployeeID','JobTitle','YearSalary','DaysAbsent']):
      ws.write(0,col,cont)
   # Rows 1-5 hold five employees, one column at a time.
   for row,id in enumerate([104,21,60,2,1030]):
      ws.write(row+1,0,id)
   for row in range(5):
      ws.write(row+1,1,random.choice(['Developer','Tester','Manager']))
   for row in range(5):
      ws.write(row+1,2,random.randrange(40,80)*1000)
   for row in range(5):
      ws.write(row+1,3,random.choice(range(20)))
   wb.save(os.path.join(rdir,'data_%d.xls'%year))
end program.

*Read and merge all xls workbooks.
begin program.
import os
import xlrd,spss
try:
   basestring # Exists in Python 2.
except NameError:
   basestring=str # Python 3 has no basestring.
rdir=r"d:\temp" # Please specify folder holding .xls files
fils=[fil for fil in os.listdir(rdir) if fil.endswith(".xls")] # Should probably be ".xlsx" in your case.
allData=[]
for cnt,fil in enumerate(fils):
   wb=xlrd.open_workbook(os.path.join(rdir,fil))
   if cnt==0:
      # Variable names: first row of the first sheet of the first workbook.
      vNames=["source_file","source_sheet"]+wb.sheet_by_index(0).row_values(0)
   for ws in wb.sheets():
      for row in range(1,ws.nrows): # Skip the header row.
         allData.append([fil,ws.name]+ws.row_values(row))
# Longest string per column; 0 means the column becomes a numeric variable.
mxLens=[0]*len(vNames)
for line in allData:
   for cnt in range(len(line)):
      if isinstance(line[cnt],basestring) and len(line[cnt])>mxLens[cnt]:
         mxLens[cnt]=len(line[cnt])
with spss.DataStep():
   nds = spss.Dataset('*') ### nds = "New Data Set"
   for vrbl in zip(vNames,mxLens):
      nds.varlist.append(vrbl[0],vrbl[1])
   for line in allData:
      nds.cases.append(line)
end program.
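
The width calculation in the second block can be tried outside SPSS. The sketch below is a standalone illustration and not part of the original post: `max_string_widths` is a hypothetical helper name, the sample rows are made up, and it assumes Python 3, where `str` stands in for the `basestring` check. A width of 0 marks a column with no string values, which is the value the block above hands to `nds.varlist.append` for numeric variables.

```python
def max_string_widths(rows):
    """Return, per column, the length of the longest string value (0 if none)."""
    if not rows:
        return []
    widths = [0] * len(rows[0])
    for row in rows:
        for col, value in enumerate(row):
            # Only string cells contribute; numeric cells leave the width at 0.
            if isinstance(value, str) and len(value) > widths[col]:
                widths[col] = len(value)
    return widths


# Hypothetical rows shaped like allData: source file, sheet name, then cell values.
rows = [
    ["data_2004.xls", "data", 104.0, "Developer", 52000.0, 3.0],
    ["data_2005.xls", "data", 21.0, "Manager", 61000.0, 12.0],
]
print(max_string_widths(rows))  # [13, 4, 0, 9, 0, 0]
```

Note that this counts characters. If the data contain non-ASCII text, the widths SPSS needs may be larger, so treat the result as a lower bound in that case.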

Some notes:

Make sure you have no open dataset when you run this.
A crucial assumption is that the structure (column order) is identical across all sheets in all workbooks.
The first row of every sheet in every workbook should hold the (identical) variable names.
You need to have 1) SPSS, 2) the SPSS Python Essentials and 3) the Python xlrd module properly installed.
You may need to replace ".xls" with ".xlsx" in the second block.
Date variables should be no problem but will look weird in SPSS, because they arrive as Excel serial day numbers. To convert a date called "date_1" to a normal date, try

compute date_new=datesum(date.dmy(30,12,1899),date_1,"days").
formats date_new(datetime22).

This should work although there seems to be some kind of bug somewhere so please check carefully.
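
To check converted dates independently of SPSS, the same arithmetic can be done in plain Python. This is a minimal sketch, assuming the workbook uses Excel's default 1900 date system, in which serial numbers from 61 upward count days from 30 December 1899. The function name `excel_serial_to_datetime` is made up for this illustration.

```python
from datetime import datetime, timedelta

# Effective day zero of Excel's default (1900) date system for serials >= 61.
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial):
    """Convert an Excel serial date; the fractional part is the time of day."""
    return EXCEL_EPOCH + timedelta(days=serial)


print(excel_serial_to_datetime(41730))    # 2014-04-01 00:00:00
print(excel_serial_to_datetime(41730.5))  # 2014-04-01 12:00:00
```

Comparing a few values from this function against the SPSS result is a quick way to spot an offset of a few days. Workbooks saved with the 1904 date system need a different epoch; xlrd exposes the workbook's setting as `datemode` and offers `xlrd.xldate_as_tuple` for the conversion.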

ReneeBK

2014-4-1 03:05:14

Jon Peck from IBM SPSS wrote an extension command that fits the bill here without having to roll your own Python: the `SPSSINC PROCESS FILES` command.

Here is an example in the developerWorks forum <http://www.ibm.com/developerworks/forums/thread.jspa?messageID=14573567> of the tool in action in a nearly identical situation.

Also, with a really consistent naming structure and no missing IDs, it would be pretty simple to write this up as a macro; see this other answer I gave recently <http://spssx-discussion.1045642.n5.nabble.com/Looping-td5716527.html>. All that would need to change (and it is what I have done anyway) is to have the first pass of the loop create a base file, and then use ADD FILES to concatenate each new file onto that base file in turn.


ReneeBK

2014-4-1 03:15:58

SPSSINC PROCESS FILES can do this pretty easily. You will wind up doing an ADD FILES for each Excel file (after the first) even though ADD FILES can handle 50 files at a time, but that's probably not going to be an issue unless you have to do this several times per second.

A few tips beyond the example that Andy pointed out.

Getting this process started is a little bit tricky, since ADD FILES requires that you already have a data file open.
Move one of your Excel files to a different directory and open it with GET DATA /TYPE=XLSX or interactively. Give it a dataset name, say, ACTIVE, so that it will remain open and referenceable as other files are read.

Your syntax file to be applied to each dataset by PROCESS FILES would just have statements like
GET DATA /TYPE=XLSX .../FILE="JOB_INPUTFILE" ...
DATASET NAME FRED.
DATASET ACTIVATE ACTIVE.
ADD FILES /FILE=* /FILE=FRED.
DATASET CLOSE FRED.
JOB_INPUTFILE is defined by PROCESS FILES as a file handle for the name of the current input. It will be redefined each time another file is processed.

You can then construct the PROCESS FILES command from the menus via Utilities > Process Data Files.
The input filespec would be something like
c:\mydata\*.xlsx

After process files is run, you can save the constructed file in the usual way.

You can, of course, do this with Python or even Basic scripting more directly, but it probably isn't worth the trouble to do that.
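
For the original question (many .sav files with identical variables), a Python sketch along the following lines could generate the ADD FILES syntax rather than clicking through the dialog once per file. It is an illustration under stated assumptions, not a tested production script: `build_add_files_syntax` and the file names are hypothetical, the 50-file limit is taken from the post above, and paths are assumed to contain no apostrophes. The first file is opened with GET FILE so that a dataset is already active, and the active dataset (`*`) then occupies one slot in every ADD FILES command.

```python
def build_add_files_syntax(paths, max_files=50):
    """Return SPSS syntax that stacks the .sav files in `paths` into one dataset."""
    if not paths:
        raise ValueError("at least one file is required")
    # Open the first file so that an active dataset exists before ADD FILES runs.
    commands = ["GET FILE='%s'." % paths[0]]
    # The active dataset (*) takes one of the slots in each ADD FILES command.
    per_command = max_files - 1
    rest = paths[1:]
    for start in range(0, len(rest), per_command):
        chunk = rest[start:start + per_command]
        lines = ["ADD FILES /FILE=*"] + ["  /FILE='%s'" % p for p in chunk]
        commands.append("\n".join(lines) + ".")
        commands.append("EXECUTE.")
    return "\n".join(commands)


# Hypothetical file names; in practice they might come from os.listdir or glob.
paths = ["c:/mydata/part%03d.sav" % i for i in range(1, 121)]
syntax = build_add_files_syntax(paths)
print(syntax.count("ADD FILES"))  # 3
```

The generated text can be pasted into a syntax window, or, inside a BEGIN PROGRAM block, passed to `spss.Submit`. The 120 sample files yield three ADD FILES commands because each command takes at most 49 new files in addition to the active dataset.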

ReneeBK

2014-4-1 03:20:01

* Open the first Excel file from its own separate directory and give it the dataset name 'active'.

GET DATA /TYPE=XLSX
  /FILE='U:\.AU Work\Client Files\XXXXX\CGMS data processing\first file\DD001_Baseline_Excel_Raw Data.xlsx'
  /SHEET=name 'SPSS'
  /CELLRANGE=full
  /READNAMES=on
  /ASSUMEDSTRWIDTH=32767.
EXECUTE.
DATASET NAME active.
* Call the PROCESS FILES command to loop over all other Excel files.
DATASET ACTIVATE active.
SPSSINC PROCESS FILES INPUTDATA="U:\.AU Work\Client Files\XXXXX\CGMS data processing\*.xlsx"
SYNTAX="U:\.AU Work\Client Files\XXXXX\CGMS data processing\ImportFromExcelAndMerge.sps"
CONTINUEONERROR=YES
VIEWERFILE= "U:\.AU Work\Client Files\XXXXX\CGMS data processing\final output.spv"
CLOSEDATA=NO
MACRONAME="!JOB"
LOGFILEMODE=APPEND
/MACRODEFS ITEMS.

============

And the called syntax file, ImportFromExcelAndMerge.sps:
GET DATA
  /TYPE=XLSX
  /FILE="JOB_INPUTFILE"
  /SHEET=name 'SPSS'
  /ASSUMEDSTRWIDTH=32767.
DATASET NAME incoming.
DATASET ACTIVATE active.
ADD FILES /FILE=* /FILE='incoming'.
EXECUTE.
DATASET CLOSE incoming.
