In the first example, the two independent variables come from an existing dataset, and the dependent variable is generated as a linear function of those two variables plus random error. The dependent variable is then regressed on the two independent variables.
* Set up the steps you want to repeat for the simulation in a program
program define myprog1
* drop all variables to create an empty dataset, do not use clear
drop _all
* get dataset
use http://www.ats.ucla.edu/stat/stata/faq/hsb2
* keep the independent variables (IVs)
keep write math
* gen dependent variable (DV) with set relationship to IVs + normal random error (SD = 7.281)
gen y = 7.541 + .3283*math + .5196*write + rnormal(0, 7.281)
* run the desired command
reg y write math
end
* use the simulate command to rerun myprog1 1000 times
* collect the betas (_b) and standard errors (_se) from the regression each time
* You'll probably want to set reps(10) for testing, then set it higher for the simulation.
simulate _b _se, reps(1000): myprog1
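Once simulate finishes, the data in memory are replaced by one observation per replication. As a minimal sketch of how the results might be checked, the commands below assume simulate's default naming for the collected results (_b_write, _b_math and _b_cons for the coefficients, with matching _se_ variables). Those names are not shown in the example above, so confirm them with describe first.
* list the variables that simulate created
describe
* the means of the coefficients should be close to the values used to generate y:
* .5196 for write, .3283 for math and 7.541 for the constant
summarize _b_write _b_math _b_cons
* the mean of each standard error can be compared with the standard deviation
* of the corresponding coefficient reported by the previous command
summarize _se_write _se_math _se_cons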
The second example is similar to the first, except that the data are random draws from a multivariate normal distribution with a given correlation structure, generated with the drawnorm command. A covariance matrix can be supplied instead by specifying the cov() option in place of corr(). If neither a correlation nor a covariance structure is specified, the generated variables are orthogonal, that is, uncorrelated in the population. The code below also specifies means and standard deviations for the variables, but this is not strictly necessary: by default drawnorm uses means of 0 and standard deviations of 1.
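As a rough illustration of the cov() alternative and of the default behavior, the sketch below uses a hypothetical covariance matrix v and mean vector mu that are not part of the example that follows. The clear option is added only so that these commands run regardless of what is in memory.
* hypothetical covariance matrix for x1, x2 and y (symmetric, positive definite)
matrix v = (4, 1, 2 \ 1, 9, 3 \ 2, 3, 16)
* hypothetical means
matrix mu = (10, 20, 30)
* draw 1000 cases using covariances instead of correlations
drawnorm x1 x2 y, n(1000) cov(v) means(mu) clear
* the sample covariances should be close to the entries of v
correlate x1 x2 y, covariance
* with neither corr() nor cov(), the variables are drawn as uncorrelated standard normals
drawnorm a b, n(1000) clear
correlate a b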
* Set up the steps you want to repeat for the simulation in a program
program define myprog2
* drop all variables to create an empty dataset, do not use clear
drop _all
* create a vector that holds the lower triangle of the correlation matrix, row by row:
* (1, r21, 1, r31, r32, 1) for the variables in the order x1 x2 y
matrix c = (1, 0.5968, 1, 0.6623, 0.6174, 1)
* create a vector that contains the means of the variables
matrix m = (52.23,52.775,52.645)
* create a vector that contains the standard deviations
matrix sd = (10.25,9.47,9.36)
* draw a sample of 1000 cases from a normal distribution with specified correlation structure
* and specified means and standard deviations
drawnorm x1 x2 y, n(1000) corr(c) cstorage(lower) means(m) sds(sd)
* run the desired command
reg y x1 x2
end
* use the simulate command to rerun myprog2 1000 times
* collect the betas (_b) and standard errors (_se) from the regression each time
* You'll probably want to set reps(10) for testing, then set it higher for the simulation.
simulate _b _se, reps(1000): myprog2
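Because the second example fixes the correlations, means and standard deviations, the regression coefficients they imply can be worked out and compared with the simulation averages. The following is a sketch under that reading of the setup. The names R, r, bstd, pop_b1, pop_b2 and pop_b0 are made up for illustration, and the _b_ variable names assume simulate's default naming.
* correlation between the predictors x1 and x2
matrix R = (1, 0.5968 \ 0.5968, 1)
* correlations of x1 and x2 with y
matrix r = (0.6623 \ 0.6174)
* standardized coefficients implied by the correlations
matrix bstd = invsym(R)*r
* rescale by sd(y)/sd(x) to get the unstandardized slopes
scalar pop_b1 = bstd[1,1]*9.36/10.25
scalar pop_b2 = bstd[2,1]*9.36/9.47
* the intercept follows from the means
scalar pop_b0 = 52.645 - pop_b1*52.23 - pop_b2*52.775
display "implied slope for x1: " pop_b1
display "implied slope for x2: " pop_b2
display "implied constant: " pop_b0
* compare the implied values with the simulation averages
summarize _b_x1 _b_x2 _b_cons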