1. A non-parametric analysis of real data. (The following sub-questions are based on Question 3 in the
first assignment.)
This question examines the bacteria levels from an animal experiment where each rat was given flesh
wounds in two locations. One location was randomly selected for a topical treatment and the other was
left as a control. The bacteria level at each wound was measured a few days later to see if the
treatment had an effect on the bacteria level.
a) Create a vector named diff that is the treated bacteria level minus the control bacteria level.
b) Create a vector named boot.data that creates 1000 bootstrap samples from our vector diff. The
length of boot.data should be 1000 times n (where n is the sample size of diff).
c) Calculate the median of each of the 1000 bootstraps. You can do this by transforming the 1000*n
values into a structure where each row represents one bootstrap sample. Then you could calculate
the median for each row to return 1000 bootstrapped medians. Call the resulting vector
boot.medians.
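
Parts a) to c) might be sketched as follows. The assignment's data are not reproduced here, so the treated and control vectors are hypothetical stand-ins; the fixed seed and the helper name boot.matrix are also assumptions made only for illustration.

```r
# Hypothetical bacteria levels for 12 rats (NOT the assignment data).
treated <- c(51, 43, 60, 38, 55, 49, 63, 40, 57, 39, 46, 50)
control <- c(64, 50, 58, 59, 61, 68, 60, 55, 66, 50, 42, 67)

# a) Paired differences: treated minus control.
diff <- treated - control
n <- length(diff)

# b) 1000 bootstrap samples of size n, drawn with replacement and stored end to end.
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
boot.data <- sample(diff, size = 1000 * n, replace = TRUE)

# c) One bootstrap sample per row, then the median of each row.
boot.matrix <- matrix(boot.data, nrow = 1000, ncol = n, byrow = TRUE)
boot.medians <- apply(boot.matrix, 1, median)
```

Because every draw is independent, filling the matrix by row or by column gives an equally valid set of 1000 bootstrap samples; byrow = TRUE simply keeps consecutive draws together.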
d) Now estimate the bootstrap median with a non-parametric 95% confidence interval by using the
50th, 2.5th and 97.5th percentiles of the sample of bootstrapped medians. (Hint: look up the
quantile() function.)
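
A minimal sketch of the percentile interval in d), assuming hypothetical data (not the assignment's) and a fixed seed. The name boot.ci is an illustrative assumption.

```r
# Hypothetical paired data (NOT the assignment data).
treated <- c(51, 43, 60, 38, 55, 49, 63, 40, 57, 39, 46, 50)
control <- c(64, 50, 58, 59, 61, 68, 60, 55, 66, 50, 42, 67)
diff <- treated - control
n <- length(diff)

# Bootstrapped medians, one per resample of size n.
set.seed(1)  # assumption: fixed seed for reproducibility
boot.data <- sample(diff, size = 1000 * n, replace = TRUE)
boot.medians <- apply(matrix(boot.data, nrow = 1000, byrow = TRUE), 1, median)

# d) Point estimate (50th percentile) and 95% percentile interval.
boot.ci <- quantile(boot.medians, probs = c(0.5, 0.025, 0.975))
boot.ci
```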
e) Write an expression to perform a two-sided sign test of whether there was a difference in the
bacterial levels between the treated and control wounds. This can be done by using binom.test() to
test if the proportion of positive values in diff is different from 0.5.
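
The sign test in e) could be written along these lines. The data are hypothetical, and dropping zero differences from the trial count is an assumption (a common convention) that the question itself does not state.

```r
# Hypothetical paired data (NOT the assignment data).
treated <- c(51, 43, 60, 38, 55, 49, 63, 40, 57, 39, 46, 50)
control <- c(64, 50, 58, 59, 61, 68, 60, 55, 66, 50, 42, 67)
diff <- treated - control

# e) Two-sided sign test: is the proportion of positive differences 0.5?
n.pos <- sum(diff > 0)       # number of positive differences
n.nonzero <- sum(diff != 0)  # assumption: zero differences are excluded
sign.test <- binom.test(n.pos, n.nonzero, p = 0.5, alternative = "two.sided")
sign.test$p.value
```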
f) Create a vector W that contains the Wilcoxon signed-rank statistic (W) for diff. This is done by
ranking the absolute values and then summing the ranks that correspond to values that were
originally positive.
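
The two steps in f) translate fairly directly into R. This sketch uses hypothetical data with no ties and no zero differences; real data with either would need extra care.

```r
# Hypothetical paired data (NOT the assignment data).
treated <- c(51, 43, 60, 38, 55, 49, 63, 40, 57, 39, 46, 50)
control <- c(64, 50, 58, 59, 61, 68, 60, 55, 66, 50, 42, 67)
diff <- treated - control

# f) Rank the absolute differences, then sum the ranks of the positive ones.
abs.ranks <- rank(abs(diff))
W <- sum(abs.ranks[diff > 0])
W
```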
g) Under the null hypothesis that diff is centered at zero (i.e. the averages of the positive and negative
rankings are equal), $Z_w$ has an asymptotic standard normal distribution, where $Z_w$ is:
$$Z_w = \frac{W - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$$
Find the probability that, under the null hypothesis, the absolute value of $Z_w$ is greater than that
observed in our sample vector diff.
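
A sketch of the calculation in g), assuming the usual large-sample standardization of W (null mean n(n+1)/4 and null variance n(n+1)(2n+1)/24, with no tie or continuity correction). The data are hypothetical and the names Z.w and p.asym are illustrative.

```r
# Hypothetical paired data (NOT the assignment data).
treated <- c(51, 43, 60, 38, 55, 49, 63, 40, 57, 39, 46, 50)
control <- c(64, 50, 58, 59, 61, 68, 60, 55, 66, 50, 42, 67)
diff <- treated - control
n <- length(diff)

# Signed-rank statistic: sum of the ranks of the positive differences.
W <- sum(rank(abs(diff))[diff > 0])

# g) Standardize W with its null mean and standard deviation.
W.mean <- n * (n + 1) / 4
W.sd <- sqrt(n * (n + 1) * (2 * n + 1) / 24)
Z.w <- (W - W.mean) / W.sd

# Two-sided tail probability under N(0,1): P(|Z| > |Z.w|).
p.asym <- 2 * pnorm(-abs(Z.w))
p.asym
```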
h) Search the help for a built-in R function to perform the Wilcoxon signed-rank test. Return the
two-sided p-value from this function. Note: turn off the exact option and the continuity correction
so that this p-value matches the probability you calculated in g) (at least to a couple of significant
digits).
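
One candidate for the built-in function in h) is wilcox.test() from the stats package. The sketch below, on hypothetical data, checks that its p-value with exact = FALSE and correct = FALSE agrees with a hand-computed normal approximation.

```r
# Hypothetical paired data (NOT the assignment data).
treated <- c(51, 43, 60, 38, 55, 49, 63, 40, 57, 39, 46, 50)
control <- c(64, 50, 58, 59, 61, 68, 60, 55, 66, 50, 42, 67)
diff <- treated - control
n <- length(diff)

# h) Normal approximation without continuity correction.
p.builtin <- wilcox.test(diff, mu = 0, alternative = "two.sided",
                         exact = FALSE, correct = FALSE)$p.value

# Hand-computed value for comparison.
W <- sum(rank(abs(diff))[diff > 0])
Z.w <- (W - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)
p.manual <- 2 * pnorm(-abs(Z.w))

all.equal(p.builtin, p.manual)
```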
i) Now, rather than relying on the central limit theorem to approximate $Z_w$ as N(0,1), use the same
built-in R function to calculate the exact version of the test.
j) Reproduce the exact p-value from i) using the psignrank() function.
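
The exact test and its psignrank() counterpart might look like this, assuming wilcox.test() is the built-in function in question. The data are hypothetical, with no ties or zeros (the exact test is not available when either is present), and the name W.low is illustrative.

```r
# Hypothetical paired data (NOT the assignment data).
treated <- c(51, 43, 60, 38, 55, 49, 63, 40, 57, 39, 46, 50)
control <- c(64, 50, 58, 59, 61, 68, 60, 55, 66, 50, 42, 67)
diff <- treated - control
n <- length(diff)

# Exact two-sided p-value from the built-in test.
p.exact <- wilcox.test(diff, mu = 0, alternative = "two.sided",
                       exact = TRUE)$p.value

# The same value from the exact null distribution of W.
W <- sum(rank(abs(diff))[diff > 0])
# The null distribution is symmetric about n(n+1)/4, so use the smaller tail.
W.low <- min(W, n * (n + 1) / 2 - W)
p.psign <- min(1, 2 * psignrank(W.low, n))

all.equal(p.exact, p.psign)
```

Doubling the smaller tail relies on the symmetry of the signed-rank distribution under the null; min(1, ...) guards against a doubled probability above 1 when W sits at the center.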