Please try the code below. The first block should generate test data and the second block should
- read the variable names from the first row of the first sheet of the first workbook
- read all data from all lines in all sheets in all workbooks (from line 2)
- output an active DataSet containing the source_file, source_sheet and all data
- string lengths in SPSS should be exactly as long as required given the data contained in the work books
This hasn't been thoroughly tested yet so there may be complications but it seems to work on the test data provided. Please keep us informed on how things are going, OK?
Kind regards,
*Create test data.
begin program.
rdir=r*"d:\temp"* # Please specify a folder in which test files can be
created.
import xlwt,random
for year in range(2004,2014):
wb=xlwt.Workbook()
ws=wb.add_sheet("data")
for col,cont in
enumerate(['EmployeeID','JobTitle','YearSalary','DaysAbsent']):
ws.write(0,col,cont)
for row,id in enumerate([104,21,60,2,1030]):
ws.write(row+1,0,id)
for row in range(5):
ws.write(row+1,1,random.choice(['Developer','Tester','Manager']))
for row in range(5):
ws.write(row+1,2,random.randrange(40,80)*1000)
for row in range(5):
ws.write(row+1,3,random.choice(range(20)))
wb.save(os.path.join(rdir,'data_%d.xls'%year))
end program.
*Read and merge all xls workbooks.
begin program.
rdir=r*"d:\temp"* # Please specify folder holding .xls files
import xlrd,spss
fils=[fil for fil in os.listdir(rdir) if fil.endswith(".*xls*")] # Should
probably be "xlsx" in your case.
allData=[]
for cnt,fil in enumerate(fils):
wb=xlrd.open_workbook(os.path.join(rdir,fil))
for ws in wb.sheets():
for row in range(1,ws.nrows):
allData.append([fil]+[ws.name]+[val for val in
ws.row_values(row)])
if cnt==0:
Names=["source_file"]+["source_sheet"]+ws.row_values(0)
mxLens=[0]*len(vNames)
for line in allData:
for cnt in range(len(line)):
if isinstance(line[cnt],basestring) and len(line[cnt])>mxLens[cnt]:
mxLens[cnt]=len(line[cnt])
with spss.DataStep():
nds = spss.Dataset('*') ### nds = "New Data Set"
for vrbl in zip(vNames,mxLens):
nds.varlist.append(vrbl[0],vrbl[1])
for line in allData:
nds.cases.append(line)
end program.
Some notes:
- Make sure you have no open dataset when you run this
- A crucial assumption is that the structure (column orders) are identical over sheets over workbooks
- The first rows of all sheets in all workbooks should hold (identical) variable names
- You need to have 1) SPSS, 2) SPSS Python essentials and 3) the Python xlrd module properly installed
- You may need to replace ".xls" with ".sav" in the second block
- Date variables should be no problem but will look weird in SPSS. To convert a date called "date_1" to a normal date, try
compute date_new=datesum(date.dmy(3,1,1900),date_1,"days").
format date_new(datetime22).
- This should work although there seems to be some kind of bug somewhere so please check carefully.