Question 1: In 2004, the state of North Carolina released a large data set containing information on
births recorded in this state. A random sample of observations from this data set is available
at . We have observations on 13 different variables, some categorical and some numerical. The
meaning of each variable is as follows:
fage: father's age in years.
mage: mother's age in years.
mature: maturity status of mother.
weeks: length of pregnancy in weeks.
premie: whether the birth was classified as premature (premie) or full-term.
visits: number of hospital visits during pregnancy.
marital: whether mother is married or not married at birth.
gained: weight gained by mother during pregnancy in pounds.
weight: weight of the baby at birth in pounds.
lowbirthweight: whether baby was classified as low birthweight (low) or not (not low).
gender: gender of the baby, female or male.
habit: status of the mother as a nonsmoker or a smoker.
whitemom: whether mom is white or not white.
Pick one numerical and one categorical variable and come up with a research question
evaluating the relationship between these variables. Formulate the question in such a way that it
can be answered using a hypothesis test and/or a confidence interval.
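
As a rough illustration of the mechanics only, one possible pairing is weight (numerical) with habit (categorical), asking whether mean birth weight differs between smokers and nonsmokers. The sketch below assumes a data frame called `births` (a hypothetical name) and fills it with simulated placeholder values, not the North Carolina sample, so the numbers it produces mean nothing. It uses Welch's two-sample t-test, which is one reasonable choice among several.

```r
# Placeholder data only: simulated values, NOT the North Carolina sample.
# `births` is a hypothetical data frame name; the column names follow the
# variable list above.
set.seed(1)
births <- data.frame(
  habit  = factor(rep(c("nonsmoker", "smoker"), times = c(80, 20))),
  weight = c(rnorm(80, mean = 7.1, sd = 1.5),
             rnorm(20, mean = 6.8, sd = 1.4))
)

# H0: mean birth weight is the same for both groups.
# HA: the two means differ.
# With a formula, t.test() runs Welch's two-sample t-test by default.
result <- t.test(weight ~ habit, data = births, conf.level = 0.95)

result$p.value   # two-sided p-value
result$conf.int  # 95% CI for mean(nonsmoker) - mean(smoker)
```

With the real sample, the same call would apply once the file is read into a data frame. The hypotheses should be written down before looking at the output.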
Question 2: The dataset contains the results of a study on gambling amongst
teenagers in the UK.
(a) Estimate a linear regression model with gambled amount (gamble) as the response, and
socioeconomic status, income (given in pounds per week), sex (with 0 being males) and
verbal score as explanatory variables. Present the output from the estimation.
(b) Which variables are statistically significant at the 0.05 significance level?
(c) Using the same significance level, test the hypothesis that the effect of income is equal to
3.
(d) Construct a 90% confidence interval for the verbal score variable. What can you conclude
about the relationship between the verbal score and gambling?
(e) Holding all other predictors constant, what would be the difference in predicted expenditure
on gambling for a male compared to a female?
(f) Using the predict() function, predict the amount that a male with average status, income
and verbal score would gamble.
(g) What percentage of variation in the response is explained by the given predictors?
(h) Which observation has the largest positive residual?
(i) Plot a density of the residuals; describe the distribution in terms of skew.
(j) Compute the correlation of the residuals and the income variable. What did you expect to
find, and on which regression assumption did you base this expectation?
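
The steps in (a) to (j) can be sketched in R roughly as follows. This is a minimal sketch under stated assumptions: the data frame name `gambling_df` and the column names `status`, `income`, `sex` and `verbal` are hypothetical (only `gamble` is named in the question), and the data are simulated placeholders, not the study data, so none of the printed numbers are answers.

```r
# Placeholder data only: simulated values, NOT the study data.
# `gambling_df` and all column names except `gamble` are assumptions.
set.seed(1)
n <- 50
gambling_df <- data.frame(
  sex    = rbinom(n, 1, 0.4),              # 0 = male, 1 = female
  status = round(runif(n, 20, 75)),
  income = round(runif(n, 0.5, 15), 2),    # pounds per week
  verbal = sample(1:10, n, replace = TRUE)
)
gambling_df$gamble <- 10 + 4 * gambling_df$income -
  15 * gambling_df$sex + rnorm(n, sd = 20)

# (a), (b): fit the model; the summary lists estimates and p-values
fit <- lm(gamble ~ status + income + sex + verbal, data = gambling_df)
summary(fit)

# (c): t-test of H0: beta_income = 3 (summary() only tests against 0)
coefs  <- coef(summary(fit))
t_stat <- (coefs["income", "Estimate"] - 3) / coefs["income", "Std. Error"]
p_val  <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)

# (d): 90% confidence interval for the verbal coefficient
confint(fit, "verbal", level = 0.90)

# (e): with sex coded 0/1, its coefficient is the predicted
# female-minus-male difference, other predictors held constant
coef(fit)["sex"]

# (f): prediction for a male (sex = 0) at the sample means
new_male <- data.frame(
  status = mean(gambling_df$status),
  income = mean(gambling_df$income),
  verbal = mean(gambling_df$verbal),
  sex    = 0
)
pred_male <- predict(fit, newdata = new_male)

# (g): proportion of variation explained (multiply by 100 for a percentage)
summary(fit)$r.squared

# (h): index of the observation with the largest positive residual
which.max(residuals(fit))

# (i): density plot of the residuals
plot(density(residuals(fit)), main = "Density of residuals")

# (j): correlation between the residuals and income
res_income_cor <- cor(residuals(fit), gambling_df$income)
```

For (j), least squares with an intercept makes the residuals orthogonal to every included predictor, so the computed correlation should be zero up to floating-point error regardless of the data.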