Exercise 4
In this exercise you are required to write simple code in R and compare three different ways of
programming a simple multivariate OLS regression problem.
a. Create a dataset of 100 observations with four exogenous variables and a random error term.
The exogenous variables are: x1, a constant term; x2, a vector containing random draws
from a standard normal distribution; x3, a vector containing random draws from a uniform
distribution between -10 and +10; and x4, a dummy variable that has a value of 1 for the
first 50 observations and 0 elsewhere. The error terms are independently normally distributed
with mean zero and variance 4. Group the variables xi in a matrix X.
b. Calculate y-observed on the basis of a simple linear specification assuming that the true betas
are 1. Group the x variables (except for the constant) and the dependent variable in a data
array and assign names for the variables.
c. Perform a regression using the command lm().
d. Calculate the estimated betas and standard errors yourself by writing simple code based on
standard econometric textbook notation. Compare your results to the lm() output.
e. Write alternative code based on efficient usage of solve() and the crossprod() operator.
See the General Principles 1 and 2 in Bates (2004) available at:
http://www.r-project.org/doc/Rnews/Rnews_2004-1.pdf
Compare your results to the lm() output, and discuss one advantage and one disadvantage
of using the solve operator efficiently.
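
A minimal sketch of one way parts a to e could be approached is given below. It is an illustration under stated assumptions, not a model answer: the seed and all object names (dat, fit_lm, b_textbook, b_solve and so on) are arbitrary choices that do not come from the exercise, and the error standard deviation of 2 follows from the stated variance of 4.

```r
# Illustrative sketch only; the seed and object names are assumptions.
set.seed(123)                       # arbitrary seed, for reproducibility
n <- 100

# a. Exogenous variables and error term, grouped in the matrix X
x1  <- rep(1, n)                    # constant term
x2  <- rnorm(n)                     # standard normal draws
x3  <- runif(n, min = -10, max = 10)
x4  <- c(rep(1, 50), rep(0, n - 50))  # dummy: 1 for the first 50 rows
eps <- rnorm(n, mean = 0, sd = 2)   # variance 4 means sd = 2
X   <- cbind(x1, x2, x3, x4)

# b. Observed y with all true betas equal to 1, plus a named data frame
beta_true <- rep(1, ncol(X))
y   <- as.vector(X %*% beta_true + eps)
dat <- data.frame(y = y, x2 = x2, x3 = x3, x4 = x4)

# c. Reference regression; lm() adds the intercept itself
fit_lm <- lm(y ~ x2 + x3 + x4, data = dat)
se_lm  <- summary(fit_lm)$coefficients[, "Std. Error"]

# d. Textbook notation: b = (X'X)^{-1} X'y and Var(b) = s^2 (X'X)^{-1}
k           <- ncol(X)
XtX_inv     <- solve(t(X) %*% X)
b_textbook  <- XtX_inv %*% t(X) %*% y
res_d       <- y - X %*% b_textbook
s2_d        <- sum(res_d^2) / (n - k)
se_textbook <- sqrt(diag(s2_d * XtX_inv))

# e. Solve the normal equations X'X b = X'y without inverting for b
XtX      <- crossprod(X)            # same matrix as t(X) %*% X
b_solve  <- solve(XtX, crossprod(X, y))
res_e    <- y - X %*% b_solve
s2_e     <- sum(res_e^2) / (n - k)
# The standard errors still need the diagonal of the inverse of X'X
se_solve <- sqrt(s2_e * diag(solve(XtX)))

# Side-by-side comparison with the lm() output
print(cbind(lm = coef(fit_lm),
            textbook = as.vector(b_textbook),
            solve = as.vector(b_solve)))
print(cbind(lm = se_lm,
            textbook = as.vector(se_textbook),
            solve = as.vector(se_solve)))
```

In this sketch the three sets of estimates should agree up to floating-point rounding. Note that solve(XtX, crossprod(X, y)) returns the coefficients directly, whereas the standard errors are computed here from a separate call to solve(XtX).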