Nearest neighbor matching estimation for average treatment effects
nnmatch depvar treatvar varlist_nnmatch [weight] [if exp] [in range] [, tc({ate | att | atc}) m(#)
metric(maha | matname) exact(varlist_ex) biasadj(bias | varlist_adj) robusth(#) population level(#)
keep(filename) replace]
pweights are allowed. See help weights for more information about weights. See section 5.2 (Abadie et al. 2004)
for information about how nnmatch handles weights.
depvar, varlist_nnmatch, and elements of biasadj(varlist_adj) and exact(varlist_ex) must be numeric variables.
treatvar must be a {0,1} variable.
Description
nnmatch estimates the average treatment effect on depvar by comparing outcomes between treated and control
observations (as defined by treatvar), using nearest neighbor matching across the variables defined in
varlist_nnmatch. nnmatch can estimate the treatment effect for the treated observations, the controls, or the
sample as a whole. The program pairs observations to the closest m matches in the opposite treatment group to
provide an estimate of the counterfactual treatment outcome. The program allows for matching over a
multi-dimensional set of variables (varlist_nnmatch), giving options for the weighting matrix to be used in
determining the optimal matches. It also allows exact matching (or as close as possible) on a subset of
variables. In addition, the program allows for bias correction of the treatment effect and estimation of either
the sample or population variance, with or without assuming a constant treatment effect (homoskedasticity).
Finally, it allows observations to be used as a match more than once, thus making the order of matching irrelevant.
See Abadie et al. (2004) for further detail.
Options
tc({ate | att | atc}) specifies which treatment effect is to be estimated:
ate: the average treatment effect,
att: the average treatment effect for the treated, or
atc: the average treatment effect for the controls.
If no option is specified, the average treatment effect, ate, is assumed. In this case, all observations are
matched to their nearest m neighbors of the opposite treatment group. In estimating the att or atc, only the
treated or controls, respectively, are matched.
m(#) specifies the number of matches to be made per observation. If two observations of the opposite treatment
group are equally close to that being matched, both will be used. Thus, the number of matches per observation
will be greater than or equal to m. If the average treatment effect is selected, m must be less than or equal
to the smaller of N0 and N1, where N0 is the number of control observations in the dataset, and N1 is the
number of treatment observations. If tc(att) is selected, m need only be less than or equal to N0; if tc(atc)
is selected, m must be less than or equal to N1. If m(#) is not specified, 1 is assumed.
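The tie rule for m(#) can be illustrated with a minimal Python sketch. This is not nnmatch's own code; the function name nearest_matches, the list-of-distances input, and the tolerance tol are assumptions made for illustration only.

```python
def nearest_matches(distances, m, tol=1e-12):
    """Return indices of the m closest candidates, keeping all ties.

    distances: distances from one observation to each candidate in the
    opposite treatment group (hypothetical input format).
    """
    if m < 1 or m > len(distances):
        raise ValueError("m must be between 1 and the number of candidates")
    # The distance of the m-th closest candidate defines the cutoff.
    cutoff = sorted(distances)[m - 1]
    # Every candidate at or inside the cutoff is kept, so ties can push
    # the number of matches above m.
    return [j for j, d in enumerate(distances) if d <= cutoff + tol]


# Candidates 1 and 3 are tied at distance 0.5, so m = 2 yields three matches.
print(nearest_matches([0.2, 0.5, 0.9, 0.5], m=2))  # [0, 1, 3]
```

The length check mirrors the requirement that m not exceed the size of the opposite treatment group.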
metric(maha | matname) specifies the weighting matrix to be used when k, the number of elements of
varlist_nnmatch, is greater than 1. The metric() option specifies the relative weight to be placed on each
variable in varlist_nnmatch in defining nearest neighbor matches. Two options are available:
(1) metric(maha) specifies the Mahalanobis metric, the inverse of the sample variance-covariance matrix of the
k variables in varlist_nnmatch.
(2) metric(matname) allows for a user-defined weight matrix matname, where matname is an already-specified
k-dimensional, symmetric, and positive semi-definite matrix.
If no option is specified, the default is to use the k*k diagonal matrix of the inverse sample standard errors
of the k variables in varlist_nnmatch.
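As a rough illustration of how a weight matrix enters the distance between two observations, the Python sketch below evaluates the quadratic form (x - z)' V (x - z) and takes its square root. The function name weighted_distance and the example matrix V are hypothetical, and the sketch assumes the distance has this form; it does not reproduce nnmatch's internal computation.

```python
import math


def weighted_distance(x, z, V):
    """Distance between covariate vectors x and z under weight matrix V,
    computed as sqrt((x - z)' V (x - z)). V is a k-by-k list of lists."""
    diff = [a - b for a, b in zip(x, z)]
    k = len(diff)
    quad = sum(diff[r] * V[r][c] * diff[c] for r in range(k) for c in range(k))
    return math.sqrt(quad)


# Hypothetical diagonal weight matrix: differences in the second variable
# count four times as much as differences in the first.
V = [[1.0, 0.0],
     [0.0, 4.0]]
print(weighted_distance([1.0, 2.0], [4.0, 0.0], V))  # sqrt(9*1 + 4*4) = 5.0
```

A user-supplied matname would play the role of V here; off-diagonal entries let differences in one variable interact with differences in another.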
exact(varlist_ex) allows you to specify exact matching (or as exact as possible) on one or more variables. The
exact-matching variables need not overlap with the elements of varlist_nnmatch. In practice, however, the
exact() option adds these variables to the original k*k varlist_nnmatch matrix, but in the weight matrix
multiplies each exact element by 1,000 relative to the weights placed on the elements of varlist_nnmatch.
(Regardless of the metric() option chosen for the varlist_nnmatch variables, the exact-match variables are
normalized via the default option -- the inverse sample standard errors.) Because for each matched observation there
may not exist a member of the opposite treatment group with equal value, matching may not be exact across the
full dataset. The output lists the percentage of matches (across the paired observations, greater than or
equal to N*m in number) that match exactly.
biasadj(bias | varlist_adj) specifies that the bias-corrected matching estimator be used. The simple matching
estimator estimates the average treatment effect by calculating the average over the N observations being
matched of the difference between the depvar outcome for observation i and the average outcomes for its m
matches in the opposite treatment group. However, the simple matching estimator will be biased if matching is
not exact. This option regression-adjusts the results using the original matching variable(s),
varlist_nnmatch (if biasadj(bias) is selected), or a newly-specified set of variables, varlist_adj (if
biasadj(varlist_adj) is chosen).
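The simple (unadjusted) matching estimator described above can be sketched in Python for the ate case. This is a minimal illustration under stated assumptions: a single matching variable, absolute difference as the distance, and no weights. The function name simple_matching_ate and the toy data are hypothetical, and the bias adjustment itself is not implemented here.

```python
def simple_matching_ate(y, t, x, m=1):
    """Simple matching estimate of the average treatment effect.

    y: outcomes, t: 0/1 treatment indicators, x: one matching variable.
    Each observation is matched to its m nearest neighbors (ties kept)
    in the opposite treatment group.
    """
    n = len(y)
    effects = []
    for i in range(n):
        # Candidate matches are all observations in the opposite group.
        cand = [j for j in range(n) if t[j] != t[i]]
        dists = [abs(x[j] - x[i]) for j in cand]
        cutoff = sorted(dists)[m - 1]
        matches = [j for j, d in zip(cand, dists) if d <= cutoff]
        # The counterfactual outcome is the average outcome of the matches.
        y_match = sum(y[j] for j in matches) / len(matches)
        y1, y0 = (y[i], y_match) if t[i] == 1 else (y_match, y[i])
        effects.append(y1 - y0)
    return sum(effects) / n


# Toy data: two treated and two control observations.
y = [5, 7, 2, 3]
t = [1, 1, 0, 0]
x = [1.0, 2.0, 1.1, 2.2]
print(simple_matching_ate(y, t, x))  # 3.5
```

In the toy data no match is exact on x, which is the situation in which the simple estimator can be biased and biasadj() becomes relevant.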
robusth(#) specifies that nnmatch estimate heteroskedasticity-consistent standard errors using # matches in the
second matching stage (across observations of the same treatment level). The program does this by conducting
a second matching process (again across the elements of varlist_nnmatch), this time matching observations in
the same treatment group, to compare variability in outcomes (depvar) for observations with approximately the
same varlist_nnmatch values. robusth(#) allows the user to choose how many matches are used in this process.
If robusth() is not selected or # equals zero, homoskedastic errors are estimated.
population specifies the calculation of the population variance rather than the sample variance. If population is
not selected, sample variance is assumed.
level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is level(95) or
as set by set level.
keep(filename) saves the temporary matching dataset in the file filename.dta. In the estimation process, nnmatch
creates a temporary dataset holding, for each observation i being matched, a new observation containing the
values of i's outcome variable (depvar) and matching variable(s) (varlist_nnmatch), along with the outcome and
varlist_nnmatch values for its m closest matches. Thus, the new dataset will hold at least N*m observations.
If biasadj(varlist_adj) or exact(varlist_ex) is selected, the temporary dataset will also hold these values
for each observation i and its match(es) j. keep(filename) allows you to save this temporary dataset.
If keep(filename) is selected, each observation of filename.dta will hold the following variables:
t: The treatment group indicator, treatvar, for the observation being matched, i.
y: The observed outcome variable, depvar(i).
x: The varlist_nnmatch values for observation i.
id: The identification code for the observation being matched, i.
(When the command nnmatch is given, the program creates a temporary variable, id =
{1,2,...N}, based on the original sort order.)
index: The identification code for j, the match for observation i.
dist: The estimated distance between observation i and its match j, based on the
varlist_nnmatch values of each and the selected weight matrix.
k_m: The number of times observation i is itself used as a match for any observation l of the
opposite treatment group, each time weighted by the total number of matches for the
given observation l.
(For example, if observation i is one of three matches for observation l, it receives a
value of 1/3 for that match. k_m(i) is the sum, across all observations l, of this
value. Thus, the sum of k_m across all observations i will equal N (or N0 or N1, if the
atc or att, respectively, is estimated). Note that this value refers to i's use as a
match, not to its matches j, so the value of k_m is equal across all observations in the
temporary dataset that pertain to the matching of observation i.)
w_id: Weight for observation i, if weights are selected.
w_index: Weight of observation j, the match for observation i, if weights are selected.
`y'_0: The inferred depvar value if observation i were in the control group.
(If observation i is in fact a control observation, `y'_0 = `y'(i). If i is a treated
observation, `y'_0 = `y'(j).)
`y'_1: Inferred depvar value if i were in the treated group.
`x'_0m: Values of varlist_nnmatch for i's `control' observation. Namely, if i is a control
observation, `x'_0m = x_i for each element x of varlist_nnmatch. If i is a treatment
observation, `x'_0m will equal x_j.
`x'_1m: Values of varlist_nnmatch for i's `treatment' observation.
`b'_0b: Values of the bias-adjustment variables (if biasadj(varlist_adj) is selected) for i's
`control' observation, where `b' represents each element of the bias-adjustment
variables.
`b'_1b: Bias-adjustment variables for i's `treatment' observation.
`e'_0e: Values of the exact-matching variables (if exact(varlist_ex) is selected) for i's
`control' observation, where `e' represents each element of the exact-matching
variables.
`e'_1e: Exact-matching variables for i's `treatment' observation.
replace replaces the dataset specified in keep(filename) if it already exists.
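The definition of k_m given under keep(filename) can be checked with a short Python sketch. The input format (a dictionary mapping each matched observation l to the list of its matches) and the function name compute_k_m are assumptions for illustration; the saved dataset stores this information in long form instead.

```python
from collections import defaultdict
from fractions import Fraction


def compute_k_m(matches):
    """Sum, for each observation i, the share 1/(number of matches of l)
    over every matched observation l that uses i as a match."""
    k_m = defaultdict(Fraction)
    for l, matched in matches.items():
        share = Fraction(1, len(matched))
        for i in matched:
            k_m[i] += share
    return dict(k_m)


# Hypothetical att-style example: treated observations 0, 1, 2 are matched
# to controls 3, 4, 5. Observation 2 has three (tied) matches.
matches = {0: [3], 1: [3, 4], 2: [3, 4, 5]}
k_m = compute_k_m(matches)
print(k_m[3], k_m[4], k_m[5])  # 11/6 5/6 1/3
print(sum(k_m.values()))       # 3, the number of matched observations
```

Exact fractions are used so that the total can be compared with the number of matched observations without rounding error.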