Stata has an entire manual devoted to svy:
[SVY] Stata Survey Data Reference Manual
Take a look at the introduction at the very beginning of that manual:
Introduction
Stata’s facilities for survey data analysis are centered around the svy prefix command. After you
identify the survey design characteristics with the svyset command, prefix the estimation commands
in your data analysis with “svy:”. For example, where you would normally use the regress command
to fit a linear regression model for nonsurvey data, use svy: regress to fit a linear regression model
for your survey data.
Why should you use the svy prefix command when you have survey data? To answer this question,
we need to discuss some of the characteristics of survey design and survey data collection because
these characteristics affect how we must perform our analysis if we want to get it right.
Survey data are characterized by the following:
• Sampling weights, also called probability weights (pweights in Stata’s terminology)
• Cluster sampling
• Stratification
These features arise from the design and details of the data collection procedure. Here’s a brief
description of how these design features affect the analysis of the data:
• Sampling weights. In sample surveys, observations are selected through a random process,
but different observations may have different probabilities of selection. Weights are equal to
(or proportional to) the inverse of the probability of being sampled. Various postsampling
adjustments to the weights are sometimes made, as well. A weight of w_j for the jth observation
means, roughly speaking, that the jth observation represents w_j elements in the population
from which the sample was drawn.
Omitting weights from the analysis results in estimates that may be biased, sometimes seriously
so. Sampling weights also play a role in estimating standard errors.
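To make the bias point concrete, here is a minimal Python sketch (not Stata code) using a made-up population in which urban residents are oversampled. Weighting each respondent by 1/P(selected) recovers the population mean, while the plain average does not. All names and numbers are hypothetical.

```python
# Sketch: sampling weights as inverse selection probabilities.
# Hypothetical toy population (made-up numbers):
# 1,000 urban people with income 50 and 9,000 rural people with income 20.
POP_URBAN, POP_RURAL = 1_000, 9_000
TRUE_MEAN = (POP_URBAN * 50 + POP_RURAL * 20) / (POP_URBAN + POP_RURAL)

# The design oversamples urban people: 100 are drawn from each group, so
# P(select) is 100/1000 for urban and 100/9000 for rural respondents.
# Each row is (value, selection probability).
sample = [(50.0, 100 / POP_URBAN)] * 100 + [(20.0, 100 / POP_RURAL)] * 100


def unweighted_mean(rows):
    """Plain average that ignores the sampling design."""
    return sum(y for y, _ in rows) / len(rows)


def weighted_mean(rows):
    """Weighted mean with w_j = 1 / P(observation j was selected)."""
    weights = [1.0 / p for _, p in rows]
    weighted_sum = sum(w * y for w, (y, _) in zip(weights, rows))
    return weighted_sum / sum(weights)


if __name__ == "__main__":
    print("true mean      :", TRUE_MEAN)
    print("unweighted mean:", unweighted_mean(sample))  # pulled toward urban incomes
    print("weighted mean  :", weighted_mean(sample))
```

Multiplying every weight by the same constant leaves `weighted_mean` unchanged, which is why weights that are only proportional to 1/P(selected) are enough for means and ratios; estimated totals, by contrast, need weights on the population scale.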
• Clustering. Individuals are not sampled independently in most survey designs. Collections of
individuals (for example, counties, city blocks, or households) are typically sampled as a group,
known as a cluster.
There may also be further subsampling within the clusters. For example, counties may be
sampled, then city blocks within counties, then households within city blocks, and then finally
persons within households. The clusters at the first level of sampling are called primary sampling
units (PSUs)—in this example, counties are the PSUs. In the absence of clustering, the PSUs
are defined to be the individuals, or, equivalently, clusters, each of size one.
Cluster sampling typically results in larger sample-to-sample variability than sampling individuals
directly. This increased variability must be accounted for in standard error estimates, hypothesis
testing, and other forms of inference.
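The effect of clustering on standard errors can be sketched in Python as follows. The function implements a textbook linearization (Taylor-series) variance for a weighted mean, assuming a single stratum and PSUs sampled with replacement. Linearization is Stata's default svy variance method, but this is an illustration, not Stata's code; its exact formulas are documented in the [SVY] manual's variance estimation entry. The data and the names `linearized_se` and `blocks` are hypothetical.

```python
import math
from collections import defaultdict


def linearized_se(y, w, psu):
    """Weighted mean and its linearized standard error, assuming one stratum
    and PSUs sampled with replacement (no finite-population correction)."""
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Linearized score of each observation, accumulated to its PSU: the PSU,
    # not the individual, is the independently sampled unit.
    psu_scores = defaultdict(float)
    for yi, wi, c in zip(y, w, psu):
        psu_scores[c] += wi * (yi - mean) / total_w
    z = list(psu_scores.values())
    n_psu = len(z)
    z_bar = sum(z) / n_psu
    var = n_psu / (n_psu - 1) * sum((zi - z_bar) ** 2 for zi in z)
    return mean, math.sqrt(var)


# Hypothetical data: 4 city blocks (PSUs) of 3 households each, with values
# that are similar within a block.
y = [10, 11, 12, 20, 21, 22, 30, 31, 32, 40, 41, 42]
w = [1.0] * len(y)
blocks = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]

if __name__ == "__main__":
    # Each person treated as their own PSU, i.e. clustering ignored.
    _, se_naive = linearized_se(y, w, psu=range(len(y)))
    _, se_clustered = linearized_se(y, w, psu=blocks)
    print(f"SE ignoring clusters: {se_naive:.3f}")
    print(f"SE with clusters:     {se_clustered:.3f}")
```

Passing `psu=range(len(y))` makes every person their own PSU, the no-clustering case described above; with equal weights the result reduces to the familiar s/√n. Grouping the same observations into four blocks whose members resemble each other gives a larger standard error, because the design now contains only four independently sampled units.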
• Stratification. In surveys, different groups of clusters are often sampled separately. These groups
are called strata. For example, the 254 counties of a state might be divided into two strata, say,
urban counties and rural counties. Then 10 counties might be sampled from the urban stratum,
and 15 from the rural stratum.
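For the county example, here is a minimal Python sketch of how stratified selection turns into weights and a variance. The text gives only the total of 254 counties, so the 54/200 urban/rural split and the measurements are hypothetical, and simple random sampling without replacement within each stratum is an assumption. Because strata are sampled independently, each stratum contributes its own variance term and the terms simply add.

```python
import statistics

# Hypothetical split of the 254 counties (the text gives only the total).
N = {"urban": 54, "rural": 200}  # counties per stratum in the population
n = {"urban": 10, "rural": 15}   # counties sampled per stratum

# Each sampled county in stratum h represents N_h / n_h counties.
weights = {h: N[h] / n[h] for h in N}


def stratified_total(samples, pop_sizes):
    """Estimate a population total and its variance from independent simple
    random samples (without replacement) drawn within each stratum."""
    total, var = 0.0, 0.0
    for h, ys in samples.items():
        N_h, n_h = pop_sizes[h], len(ys)
        total += N_h * statistics.mean(ys)
        # Strata are sampled independently, so per-stratum variances add;
        # (1 - n_h / N_h) is the finite-population correction.
        var += N_h ** 2 * (1 - n_h / N_h) * statistics.variance(ys) / n_h
    return total, var


# Hypothetical measurements for the sampled counties.
samples = {
    "urban": [120, 135, 110, 150, 128, 140, 132, 118, 125, 142],
    "rural": [30, 42, 35, 28, 40, 33, 37, 31, 45, 29, 38, 36, 34, 41, 32],
}

if __name__ == "__main__":
    est, var = stratified_total(samples, N)
    print(f"estimated total: {est:.1f}, standard error: {var ** 0.5:.1f}")
```

Summing `N_h * mean(ys)` over strata gives the same number as summing weight × value over every sampled county, so the stratum weights N_h/n_h and the estimator describe the same design.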
Sampling is done independently across strata; the stratum divisions are fixed in advance. Thus
strata are statistically independent and can be analyzed as such. When the individual strata
are more homogeneous than the population as a whole, the homog