全部版块 我的主页
论坛 数据科学与人工智能 数据分析与数据科学
5920 9
2010-04-23
我用proc import导入excel文件。这个excel文件性质不太好,第一个变量x前500条为空,第二个变量y前750条为空,但是三个变量都是数值型的。导入程序如下:

PROC IMPORT OUT= WORK.a
            DATAFILE= "Z:\stockstudy\test.xls"
            DBMS=EXCEL REPLACE;
     RANGE="Sheet1$";
     GETNAMES=YES;
     MIXED=NO;
     SCANTEXT=YES;
     USEDATE=YES;
     SCANTIME=YES;
RUN;

问题在于生成的数据集中第一、第二个变量被sas搞成字符串型,而不是我所需要的数值型。有人了解如何解决吗?谢谢!!!

数据test.xls已附上。
附件列表

test.xls

大小:119 KB

 马上下载

二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

全部回复
2010-4-23 22:41:07
把Excel文件另存为CSV文件,然后

PROC IMPORT OUT= WORK.a
            DATAFILE= "Z:\stockstudy\test.xls"
            DBMS=CSV REPLACE;
  GUESSINGROWS=1000;
RUN;
二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

2010-4-23 22:59:07
我判断一下你这个问题估计是因为
在读入数据的时候,SAS对于初始变量默认为数值型缺失值 即为一个点。
初始变量的长度是根据第一个观测对象,相应变量的长度来决定的,如果没有其他格式化语句的话
第一个观测的三个变量值分别为空格 空格 0.666
在SAS里面,空格是字符变量的缺失值,那么第一个变量和第二个变量 SAS系统认为是字符型变量,默认长度为2,所以对于后面的值来说就全部为字符缺失了
给你一个建议就是 你在EXCEL第一个观测里面 把前两个变量值填上,再导入就没得问题了
二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

2010-4-23 23:02:29
3# crackman
你的方法可行。只是问题是我实际所用数据有200多列,都有那种问题。一个一个改工作量太大。不过还是谢谢你!
二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

2010-4-23 23:03:04
也可以用 SAS Import Wizard, 选 User-defined formats, 自己改成数值型
二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

2010-4-23 23:10:59
1# warrenzhang

首先感谢楼上2位的热心帮助!!!

终于找到问题的根源了,原来是微软jet engine的设置问题。
参见SAS帮助:


MIXED=YES | NO; (available only for DBMS=EXCEL) specifies whether to convert numeric data into character data values for a column that contains mixed data types. The Microsoft Jet/Excel engine handles this option.
YESassigns a SAS character type for the column and converts all numeric data values to character data values when mixed data types are found. When you specify MIXED=YES, the connection is set in import mode and no updates are allowed.
CAUTION:Due to a limitation in the Microsoft Jet/Excel engine, using the MIXED=YES option might result in improper text variable lengths. It is strongly recommended that you first follow the steps below and check that you are receiving the results that you expect before you continue.   From Microsoft Windows, follow these steps to change the value of TypeGuessRows.
  • Start Run In the Run window dialog box, enter Regedt32 and select OK.
  • Open the following key in the Registry Editor window:
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel
  • Double-click TypeGuessRows.
  • In the Edit DWORD dialog box under Base, select Decimal, enter 0 in the Value field, and select OK.
  • Exit from the Registry Editor window.
NOassigns numeric or character type for the column, depending on the majority of the type data that is found. Numeric data in a character column is imported as missing values, and character data in a numeric column is imported as missing values.
The following registry settings can affect the behavior of the MIXED= option. They are located in the [\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel] key of the Microsoft Windows registry.
Registry Settings for the MIXED OptionTypeGuessRowsan integer type with a default value of 8. You can use the number of rows in the worksheet range in scans to determine column types. If you set this type to 0, all rows in the range are checked. Microsoft states that the valid range of TypeGuessRows is 0-16. However, you could set as high as 16384, and it would still operate correctly.CAUTION:Not only can changing the TypeGuessRows value cause a scan to fail if you set it higher than 16384, but it affects any software that uses the Microsoft Jet provider to access Excel file data, including accessing Excel data in a Microsoft Access database. The TypeGuessRows value is registered with and controlled by Microsoft. It is therefore recommended that you set the value to 0.   ImportMixedTypesa string type with a default value of Text. If a column contains more than one type of data while scanning TypeGuessRows rows, the type of the column is determined to be Text if the value of the setting is Text. If the value of the setting is Majority, the most common type in the column determines the column type. For the MIXED=YES option to work correctly, you should you change TypeGuessingRows to 0 in the Microsoft Windows registry so that all the rows in the specified range are scanned. As a result, when you use MIXED=YES, the Jet provider always assigns character type for columns with data of mixed data types and converts numeric data to character data.


也就是说,默认状态MS Jet是把混合型数据当成字符串(text)处理的,因此如果import的选项mixed=yes打开的话,就会有可能出现上述结果。根据提示,应该把注册表项TypeGuessingRows改0,ImportMixedTypes改Majority。这样做才能从根本上解决问题。


PS。强烈鄙视微软!!!鄙视EXCEL!!!遗憾的是,许多公司都在把excel当数据库用,简直是莫名其妙。用csv格式多好啊,xls记录2维表格非常臃肿,数据处理也不方便。
二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

点击查看更多内容…
相关推荐
栏目导航
热门文章
推荐文章

说点什么

分享

扫码加好友,拉您进群
各岗位、行业、专业交流群