2012-08-08
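Bounty: 50 forum coins (unresolved)
Repeatedly processing a large data set adds a lot of I/O time. The SASFILE statement can load the data set into memory, but a DATA step then cannot add or drop variables, or add or delete observations, in the loaded data set; in other words, it is read-only. I am looking for a way to load a data set into memory and still be able to both read and write it.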
All replies
2012-8-8 15:49:39
Just dropping in. I don't know much about this problem either, so I'll follow the thread.
2012-8-8 17:14:49
Try MODIFY:

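/* Create a small example data set */
data ex;
  input x y;
  cards;
1 3
2 4
;
run;

/* Open the data set with SASFILE so it is held in memory */
sasfile ex open;

/* MODIFY updates the data set in place instead of replacing it */
data ex;
  modify ex;
  if x=1 then remove ex;   /* delete the observations where x=1 */
run;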
2012-8-8 18:30:50
ziyenano posted on 2012-8-8 17:14
Try MODIFY:

data ex;
Right, but that cannot add observations or variables. Thanks anyway.
2012-8-8 18:40:43
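275769263 posted on 2012-8-8 18:30
Right, but that cannot add observations or variables. Thanks anyway.
You can add observations with OUTPUT, and modifying observations also works. Adding or dropping variables, however, does not seem to be possible with MODIFY.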
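
As a minimal sketch of the OUTPUT suggestion above (the values x=3, y=5 and the END= flag name "last" are illustrative assumptions, not from the thread), an explicit OUTPUT in a MODIFY step appends a new observation to the data set. Test it on a copy first, since MODIFY changes the data set in place.

```sas
/* Rebuild the small example data set from the earlier reply */
data ex;
  input x y;
  datalines;
1 3
2 4
;
run;

sasfile ex open;

/* An explicit OUTPUT in a MODIFY step appends an observation.   */
/* Because OUTPUT is explicit, existing observations are not     */
/* rewritten unless REPLACE is also executed.                    */
data ex;
  modify ex end=last;
  if last then do;
    x = 3;      /* illustrative values for the new observation */
    y = 5;
    output;     /* append the new observation to EX */
  end;
run;

sasfile ex close;
```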
2012-8-9 10:21:43
You can always write the result to a different file and then, after closing the file held in memory, rename the new file to the desired name.

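/* Build a large test data set: 1000 variables, 200,000 observations */
data large_file;
  retain x1-x1000 0;
  do i=1 to 2*1e5;
    output;
  end;
run;

/* Load the whole data set into memory */
sasfile large_file load;

/* Read from memory and write the reduced result to a new data set */
data small;
  set large_file(drop=x50-x1000);
run;

/* Release the in-memory copy so the file can be deleted and renamed */
sasfile large_file close;

proc datasets lib=work;
  delete large_file;
  change small=large_file;
quit;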


Here is the detailed SAS documentation for the SASFILE statement.
Details
General Information
The SASFILE statement opens a SAS data set and allocates enough buffers to hold the entire file in memory. Once it is read, data is held in memory, available to subsequent DATA and PROC steps or applications, until either a second SASFILE statement closes the file and frees the buffers or the program ends, which automatically closes the file and frees the buffers.
Using the SASFILE statement can improve performance by
  - reducing multiple open or close operations (including allocation and freeing of memory for buffers) to process a SAS data set to one open or close operation
  - reducing I/O processing by holding the data in memory.
If your SAS program consists of steps that read a SAS data set multiple times and you have an adequate amount of memory so that the entire file can be held in real memory, the program should benefit from using the SASFILE statement. Also, SASFILE is especially useful as part of a program that starts a SAS server such as a SAS/SHARE server. However, as with most performance-improvement features, it is suggested that you set up a test in your environment to measure performance with and without the SASFILE statement.
Processing a SAS Data Set Opened with SASFILE
When the SASFILE statement executes, SAS opens the specified file. Then when subsequent DATA and PROC steps execute, SAS does not have to open the file for each request; the file remains open until a second SASFILE statement closes it or the program or session ends.
When a SAS data set is opened by the SASFILE statement, the file is opened for input processing and can be used for subsequent input or update processing. However, the file cannot be used for subsequent utility or output processing, because utility and output processing requires exclusive access to the file (member-level locking). For example, you cannot replace the file or rename its variables.
The following table provides a list of some SAS procedures and statements and specifies whether they are allowed if the file is opened by the SASFILE statement:
Processing Requests for a File Opened by SASFILE
| Processing Request | Open Mode | Allowed |
| --- | --- | --- |
| APPEND procedure | update | Yes |
| DATA step that creates or replaces the file | output | No |
| DATASETS procedure to rename or add a variable, add or change a label, or add or remove integrity constraints or indexes | utility | No |
| DATASETS procedure with AGE, CHANGE, or DELETE statements | does not open the file but requires exclusive access | No |
| FSEDIT procedure | update | Yes |
| PRINT procedure | input | Yes |
| SORT procedure that replaces original data set with sorted one | output | No |
| SQL procedure to modify, add, or delete observations | update | Yes |
| SQL procedure with CREATE TABLE or CREATE VIEW statement | output | No |
| SQL procedure to create or remove integrity constraints or indexes | utility | No |
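
Going by the table above, update-mode requests such as PROC APPEND and PROC SQL row changes should still work while SASFILE holds the file. The following is a minimal sketch under that assumption; the data set NEW_ROWS and all values are hypothetical and only serve as an illustration.

```sas
/* Data set to be held in memory */
data ex;
  input x y;
  datalines;
1 3
2 4
;
run;

/* Hypothetical rows to add */
data new_rows;
  input x y;
  datalines;
5 6
;
run;

sasfile ex load;

/* APPEND opens the base data set in update mode */
proc append base=ex data=new_rows;
run;

/* SQL INSERT and DELETE also use update mode */
proc sql;
  insert into ex (x, y) values (7, 8);
  delete from ex where x = 1;
quit;

sasfile ex close;
```

Adding or dropping a variable still requires output or utility access, so for that case the close, rewrite, and rename approach from the reply above remains the workaround.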

Buffer Allocation
A buffer is a reserved area of memory that holds a segment of data while it is processed. The number of allocated buffers determines how much data can be held in memory at one time.
The number of buffers is not a permanent attribute of a SAS file. That is, it is valid only for the current SAS session or job. When a SAS file is opened, a default number of buffers for processing the file is set. The default depends on the operating environment but typically is a small number such as one buffer. To specify a different number of buffers, you can use the BUFNO= data set option or system option.
When the SASFILE statement is executed, SAS automatically allocates the number of buffers based on the number of data set pages and index file pages (if an index file exists). For example:
  - If the number of data set pages is five and there is not an index file, SAS allocates five buffers.
  - If the number of data set pages is 500 and the number of index file pages is 200, SAS allocates 700 buffers.
If a file that is held in memory increases in size during processing, the number of allocated buffers increases to accommodate the file. Note that if SASFILE is executed for a SAS data set, the BUFNO= option is ignored.
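
To estimate how many buffers SASFILE would need, one option is to check the page count of the data set before loading it. This is a small sketch using PROC CONTENTS on a hypothetical data set WORK.DEMO; the page size and number of data set pages appear in the engine/host dependent section of the output.

```sas
/* Hypothetical data set used only for illustration */
data demo;
  do i = 1 to 100000;
    output;
  end;
run;

/* The output lists "Data Set Page Size" and "Number of Data Set Pages"; */
/* multiplying the two approximates the memory SASFILE would need.       */
proc contents data=work.demo;
run;
```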
I/O Processing
An I/O (input/output) request reads a segment of data from a storage device (such as disk) and transfers the data to memory, or conversely transfers the data from memory and writes it to the storage device. When a SAS data set is opened by the SASFILE statement, data is read once and held in memory, which should reduce the number of I/O requests.
CAUTION:
I/O processing can be reduced only if there is sufficient real memory.
If the SAS data set is very large, you might not have sufficient real memory to hold the entire file. If insufficient memory exists, your operating environment can simulate more memory than actually exists, which is virtual memory. If virtual memory occurs, data access I/O requests are replaced with swapping I/O requests, which could result in no performance improvement. In addition, both SAS and your operating environment have a maximum amount of memory that can be allocated, which could be exceeded by the needs of your program. If your program needs exceed the memory that is available, the number of allocated buffers might be decreased to the default allocation in order to free memory.