我在看Proc sql 和data step的对比一书里,第五章写到
如果用data step,寻找两个表格里的共同名字的时候,data step有两个步骤,1是data step,2是要format,
哪位高手可以帮忙解释一下,为什么要用put function进行format呢? 不明白为什么要用proc format?
多谢。
Let’s specify a task that we can perform with the DATA step or with PROC SQL, using
in turn each of the four devices: EXISTS, IN, ANY, and ALL. The task is to apply a
WHERE filter that will pass along the rows for those girls in the MOREGIRLS table who
are already in the CLASSGIRLS table. A quick glance at the tables tells us that the subset
in question includes Jane and only Jane, but we will (repeatedly) make SAS discover that.
The DATA step code should follow this form:
DATA already;
SET moregirls;
WHERE … ;
RUN;
and the SQL should look like this:
CREATE TABLE already AS
SELECT *
FROM moregirls
WHERE …
;
For the DATA step, the filter can be built with the PUT function and a format derived
from the CLASSGIRLS data set. We want the names from CLASSGIRLS to populate the
format, so we first build the data set expected by PROC FORMAT by running:
DATA cntl;
LENGTH label $ 1;
SET classgirls(RENAME=(fname=start)) end=last;
RETAIN label 'Y'
fmtname 'ClassGirls'
type 'c';
OUTPUT;
IF LAST THEN DO;
hlo='O';
label=' ';
OUTPUT;
END;
RUN;
Reference: Read more about creating formats in the Base SAS 9.2 Procedures Guide:
Procedures: The FORMAT Procedure: Example 5: Creating a Format from a Data Set.
To actually establish the format, we run:
PROC FORMAT LIBRARY=work CNTLIN=cntl;
RUN;
Finally, we insert into the DATA step a WHERE statement containing a call to the PUT
function that in turn refers to our format. This restricts processing to the names in
MOREGIRLS that are already in CLASSGIRLS. The code is:
DATA already;
SET moregirls;
WHERE PUT(fname,$classgirls.)='Y';
RUN;
The output is shown in Exhibit 5-3.
Exhibit 5-3 ALREADY
Indeed, Jane is the only name in common.
Notice that we had to run two SAS steps (a DATA step and a PROC FORMAT step) just
to prepare, before we even turned to coding our WHERE statement. In contrast, SQL can
reference the lookup table directly within the WHERE condition and without such preprocessing.
The EXISTS Condition
SQL provides a condition (similar to an operator), called EXISTS, which looks for a
subquery as its operand and returns a value of true (1) if evaluation of that subquery
returns one or more rows, or false (0) if evaluation of the subquery returns no rows. We
FName Age
Jane 12