1. You wish to predict the sale price of single-family residences in Massachusetts using property features (commonly called a “hedonic pricing model”). You collect data on the prices and features of properties sold in the state in 2010 and obtain the following regression:
Price_i = 14407.60 - 759.92*houseage_i + 0.24*lotsize_i + 354.35*bldarea_i + 12015.61*rooms_i + μ_i
(6433.23) (89.67) (0.115) (265.39) (8516.47)
Standard errors are in parentheses, listed in the same order as the estimates above.
Observations = 2691, R² = 0.49, F = 65.10
Where:
houseage = age of the house (in years)
lotsize = total square feet of the land
bldarea = total square feet of the interior of the house
rooms = total number of rooms in the house
A. How would you categorize, or label, this dataset? Defend your answer.
B. What is the interpretation of the constant term in this regression? Why is it included?
C. How do we interpret the coefficient on lotsize? Why is the coefficient on lotsize nominally small if we expect lot size to have a large impact on the price of a house?
D. What is the predicted price of a house that is 7 years old, with a lot size of 800 square feet, a building interior of 400 square feet, and 5 rooms? Will this predicted price be close to the actual price? Why or why not?
E. Explain what it means for R² to equal 0.49. What is one good reason and one bad reason to use R² as a measure of the “goodness of fit” of a regression?
F. Test the significance of each independent variable in the model using α = 0.05. Are these findings expected? Why or why not, and what could explain your findings?
G. Construct a 95% confidence interval for the coefficient on houseage in the model above. What does this interval tell us? How will the consistency of your OLS estimates affect your confidence intervals?
H. After thinking about your model further, you wish to add median income as a variable in your regression. You collect data on the median income of each census tract in Massachusetts in 2010 and match it to your housing data. Assuming you believe that the original form of the model suffered from omitted variable bias, in what direction would you expect your estimates to change with the inclusion of median income? Defend your answers.
I. Suppose you ran the same model as above, but using log(price) instead of price as the dependent variable, and obtained an R² of 0.54 and an F-statistic of 68.17. Based on this information, are we able to say which version (level or log) of the model is better? Explain why or why not.
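The arithmetic behind parts D, F, and G can be checked with a short script. The following is a minimal sketch, not a full solution: the variable names (coef, se, house, and so on) are hypothetical, the parenthesized values are taken to be standard errors, and the critical value uses the standard normal approximation to the t distribution, which is an assumption that should be reasonable with 2691 observations. The interpretation asked for in each part is left to the reader.

```python
from statistics import NormalDist

# Estimates and standard errors copied from the regression output.
# The dictionary names are illustrative, not part of the original problem.
coef = {"const": 14407.60, "houseage": -759.92, "lotsize": 0.24,
        "bldarea": 354.35, "rooms": 12015.61}
se = {"const": 6433.23, "houseage": 89.67, "lotsize": 0.115,
      "bldarea": 265.39, "rooms": 8516.47}

# Part D: plug the house's features into the fitted equation.
# The constant is multiplied by 1.
house = {"const": 1, "houseage": 7, "lotsize": 800, "bldarea": 400, "rooms": 5}
predicted_price = sum(coef[k] * house[k] for k in coef)

# Part F: t-statistic = estimate / standard error, compared with the
# two-sided 5% critical value. ASSUMPTION: normal approximation (about 1.96)
# in place of the exact t critical value with n - k degrees of freedom.
z_crit = NormalDist().inv_cdf(0.975)
t_stats = {k: coef[k] / se[k] for k in coef}
significant = {k: abs(t) > z_crit for k, t in t_stats.items()}

# Part G: 95% confidence interval = estimate +/- critical value * standard error.
margin = z_crit * se["houseage"]
ci_houseage = (coef["houseage"] - margin, coef["houseage"] + margin)

print(f"Predicted price: {predicted_price:.2f}")
for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}, significant at 5%: {significant[name]}")
print(f"95% CI for houseage: ({ci_houseage[0]:.2f}, {ci_houseage[1]:.2f})")
```

Using the exact t critical value instead of the normal approximation would change the cutoff only in the third decimal place at this sample size, so the significance conclusions should not depend on that choice.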