Title
[D] by -- Repeat Stata command on subsets of the data
[U] 11 Language syntax
Syntax
by varlist: stata_cmd
bysort varlist: stata_cmd
The above diagrams show by and bysort as they are typically used. The full syntax of the commands is
by varlist1 [(varlist2)] [, sort rc0]: stata_cmd
bysort varlist1 [(varlist2)] [, rc0]: stata_cmd
Description
Most Stata commands allow the by prefix, which repeats the command for each group of observations for which the values of
the variables in varlist are the same. by without the sort option requires that the data be sorted by varlist; see [D]
sort.
Stata commands that work with the by prefix indicate this immediately following their syntax diagram by reporting, for
example, "by is allowed; see [D] by" or "bootstrap, by, etc., are allowed; see prefix".
by and bysort are really the same command; bysort is just by with the sort option.
The varlist1 (varlist2) syntax is of special use to programmers. It verifies that the data are sorted by varlist1 varlist2
and then performs a by as if only varlist1 were specified. For instance,
by pid (time): gen growth = (bp - bp[_n-1])/bp
performs the generate by values of pid but first verifies that the data are sorted by pid and time within pid.
Options
sort specifies that if the data are not already sorted by varlist, by should sort them.
rc0 specifies that even if the stata_cmd produces an error in one of the by-groups, then by is still to run the stata_cmd on
the remaining by-groups. The default action is to stop when an error occurs. rc0 is especially useful when stata_cmd
is an estimation command and some by-groups have insufficient observations.
Examples
------------------------------------------------------------------------------------------------------------------------------
Setup
. sysuse auto
For each category of foreign, display summary statistics for rep78
. by foreign: summarize rep78
Same as above command, but check that the data are sorted by foreign and make within foreign
. by foreign (make): summarize rep78
not sorted
r(5);
. sort foreign make
. by foreign (make): summarize rep78
For each category of rep78, display frequency counts of foreign
. by rep78: tabulate foreign
not sorted
r(5);
. sort rep78
. by rep78: tabulate foreign
Equivalent to above two commands
. by rep78, sort: tabulate foreign
Equivalent to above command
. bysort rep78: tabulate foreign
For each category of rep78 within categories of foreign, display summary statistics for price
. by foreign rep78, sort: summarize price
------------------------------------------------------------------------------------------------------------------------------
Setup
. sysuse autornd
. keep in 1/20
Store in new variable mean_w the mean value of weight for each category of mpg
. by mpg, sort: egen mean_w = mean(weight)
------------------------------------------------------------------------------------------------------------------------------
Technical note
by repeats the stata_cmd for each group defined by varlist. If stata_cmd saves results, only the results from the last
group on which stata_cmd executes will be saved.