HW 7 Solutions


P. 315 #12

I'm going to use X to represent group 1 and Y to represent group 2 because the notation is easier to follow on the web.

a) Var(Xbar - Ybar) = Var(Xbar) + Var(Ybar) because the two samples are independent
= sigma_X^2/n1 + sigma_Y^2/n2
So the SD is the square root of this, plugging in the group SDs: sqrt(1.82^2/53 + 1.53^2/60) = .3186
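A quick Python check of the arithmetic in (a), using only the numbers given in the problem:

```python
import math

# Group sizes and SDs from the problem (group 1 = X, group 2 = Y)
n1, sd1 = 53, 1.82
n2, sd2 = 60, 1.53

# SD of Xbar - Ybar for independent samples: sqrt(sd1^2/n1 + sd2^2/n2)
se_diff = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
print(f"SD(Xbar - Ybar) = {se_diff:.4f}")
```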

b) (7.9 - 4.3) +/- 2*.3186
3.6 +/- .6372
(2.96, 4.24)
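The interval in (b) can be reproduced the same way. This sketch uses the "estimate +/- 2 SDs" rule from the solution; note that 2.96 is the lower endpoint, not the margin of error.

```python
import math

xbar, ybar = 7.9, 4.3
se_diff = math.sqrt(1.82**2 / 53 + 1.53**2 / 60)

# Estimate +/- 2 SDs, matching part (b)
diff = xbar - ybar
margin = 2 * se_diff
lower, upper = diff - margin, diff + margin
print(f"{diff:.1f} +/- {margin:.4f}  ->  ({lower:.2f}, {upper:.2f})")
```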

c) This suggests that the "true" difference between the population means is somewhere between 2.96 and 4.24. Because the whole interval lies above 0, the mean of the first group appears to be greater than the mean of the second group. So we can fairly confidently conclude that people remember more brands in ads with sexual content.

Extra* (but required): in class we took a random sample of 7 serial numbers from the population {1, 2, ..., N}, where N was an unknown number. Each serial number came from a "captured tank".
a) Make a sketch of the pdf of the population (obviously in terms of N)

It is a (discrete) uniform distribution: there is a "point mass" of height 1/N above each of the points 1, 2, ..., N.
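To see the shape concretely, here is a small Python sketch that builds the pmf for a hypothetical N = 8 (any N gives the same flat picture) and prints a text "sketch" of equal-height spikes:

```python
from fractions import Fraction

def uniform_pmf(N):
    """Discrete uniform pmf on {1, ..., N}: mass 1/N at each integer."""
    return {k: Fraction(1, N) for k in range(1, N + 1)}

N = 8  # hypothetical value, just for the picture
pmf = uniform_pmf(N)
for k, p in pmf.items():
    # bar length proportional to the probability (all bars come out equal)
    print(f"{k:>2} | {'#' * round(40 * p)}  ({p})")
```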

b) Let Xi represent the serial number on the ith tank we capture.  Find an expression for the expected value of Xi. Find an expression for the standard deviation of Xi.
E(Xi) = sum_{k=1}^{N} k * (1/N) = (1/N) * N(N+1)/2 = (N+1)/2   (This last step is as simple as it gets, but it's okay if you stopped one step earlier.)
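A quick exact check of E(X) = (N+1)/2, using Python's `fractions.Fraction` so there is no rounding. The N values are arbitrary test cases, not from the problem.

```python
from fractions import Fraction

def mean_uniform(N):
    # E(X) = sum_{k=1}^{N} k * (1/N), computed exactly
    return sum(Fraction(k, N) for k in range(1, N + 1))

for N in (1, 5, 10, 342):
    assert mean_uniform(N) == Fraction(N + 1, 2)
print("E(X) = (N+1)/2 holds for the N values tried")
```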

Var(Xi) = sum_{k=1}^{N} (k - (N+1)/2)^2 (1/N), and this can be simplified a little more (it works out to (N^2 - 1)/12), but that's not necessary. Of course, you need to take the square root of this to get the SD. Let's call this number "sigma".
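The variance sum can be checked the same way. This sketch computes it exactly and compares it with the closed form (N^2 - 1)/12 for a few arbitrary N; `sigma_uniform` is just a helper name for this example.

```python
import math
from fractions import Fraction

def var_uniform(N):
    # Var(X) = sum_{k=1}^{N} (k - (N+1)/2)^2 * (1/N), computed exactly
    mu = Fraction(N + 1, 2)
    return sum((k - mu) ** 2 * Fraction(1, N) for k in range(1, N + 1))

def sigma_uniform(N):
    # SD = square root of the variance ("sigma" in these solutions)
    return math.sqrt(float(var_uniform(N)))

# Check the closed form (N^2 - 1)/12 for a few arbitrary N
for N in (1, 2, 7, 100):
    assert var_uniform(N) == Fraction(N * N - 1, 12)
print("sigma for N = 100:", round(sigma_uniform(100), 3))
```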

c) Suppose we calculate Y = (X1 + ... + X7)/7   Find the mean of Y in terms of the mean of X.  Find the SD of Y in terms of the SD of X.
E(Y) = (E(X1) + ... + E(X7))/7 = 7*E(X)/7 = E(X) = (N+1)/2
Var(Y) = sigma^2/7, so SD(Y) = sigma/sqrt(7)   (treating the Xi's as independent; see the Note at the end)
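A simulation sketch of part (c). It assumes a hypothetical N = 300 and draws the 7 serial numbers with replacement, so the Xi's really are independent as the formula assumes. The simulated mean and SD of Y should land close to (N+1)/2 and sigma/sqrt(7).

```python
import math
import random

random.seed(1)                 # fixed seed so the run is repeatable
N, n, reps = 300, 7, 100_000   # N = 300 is a hypothetical population size

# Sample WITH replacement so X1, ..., X7 are independent
means = [sum(random.choices(range(1, N + 1), k=n)) / n for _ in range(reps)]

sim_mean = sum(means) / reps
sim_sd = math.sqrt(sum((m - sim_mean) ** 2 for m in means) / reps)

sigma = math.sqrt((N**2 - 1) / 12)
print(f"mean of Y: simulated {sim_mean:.2f}, theory {(N + 1) / 2}")
print(f"SD of Y:   simulated {sim_sd:.2f}, theory {sigma / math.sqrt(n):.2f}")
```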

d) A popular choice for an estimator for N was Xbar + 3*SD(X). What's the bias of this estimator? How does the bias change if we take a larger sample size?
Bias = E(Xbar + 3SD(X)) - N
= E(Xbar) + 3sigma - N
= (N+1)/2 + 3sigma - N
Notice that n (the sample size) does not appear anywhere, not even inside sigma. So changing the sample size won't affect the bias.
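Here is the bias from (d) as a Python function of N alone, treating SD(X) as the population SD sigma = sqrt((N^2 - 1)/12), as the solution does. The function has no sample-size argument at all, which is the point. Dividing by N shows the bias grows roughly like 0.366N (that is, 3/sqrt(12) - 1/2 times N) for large N.

```python
import math

def sigma_pop(N):
    # Population SD of the discrete uniform on {1, ..., N}
    return math.sqrt((N**2 - 1) / 12)

def bias_d(N):
    # Bias of Xbar + 3*SD(X), treating SD(X) as sigma; n never enters
    return (N + 1) / 2 + 3 * sigma_pop(N) - N

for N in (10, 100, 1000):  # arbitrary example values
    print(N, round(bias_d(N), 2), round(bias_d(N) / N, 3))
```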

e) Another choice was Xbar + 3*SD(X)/sqrt(n). What's the bias of this estimator? How does the bias change if we take a larger sample size?

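Bias = E(Xbar + 3SD(X)/sqrt(n)) - N = (N+1)/2 + 3sigma/sqrt(n) - N
Now n does appear: as n gets very large, the middle term 3sigma/sqrt(n) goes to 0, so the bias approaches (N+1)/2 - N = -(N-1)/2. In other words, with a large sample this estimator tends to underestimate N by about half.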
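The same kind of sketch for (e), again treating SD(X) as sigma and assuming a hypothetical N = 300. As n grows, the bias should decrease toward the limit -(N-1)/2.

```python
import math

def sigma_pop(N):
    # Population SD of the discrete uniform on {1, ..., N}
    return math.sqrt((N**2 - 1) / 12)

def bias_e(N, n):
    # Bias of Xbar + 3*SD(X)/sqrt(n), treating SD(X) as sigma
    return (N + 1) / 2 + 3 * sigma_pop(N) / math.sqrt(n) - N

N = 300  # hypothetical population size
for n in (7, 50, 1000, 10**6):
    print(n, round(bias_e(N, n), 2))
print("limit:", -(N - 1) / 2)
```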

Note: Strictly speaking, the Xi's are not independent. Why? Because the population is finite (it has N tanks) and we draw without replacement, so every tank we capture tells us something about the tanks that are left: the probability that the next Xi takes a given value is different from what it was before we knew the previous tank's serial number. (For example, once we have seen serial number 5, the next tank cannot also be number 5.) On the other hand, if the population is very large compared to the sample size, this doesn't matter much. Hopefully you noticed this, but it's okay if you didn't write it out for this problem.
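To see how much the dependence matters, this sketch enumerates every possible sample of n = 7 tanks drawn without replacement from a deliberately small, hypothetical population (N = 10) and computes Var(Xbar) exactly. It then compares that value with sigma^2/n, the independent-case answer from (c). The factor (N - n)/(N - 1) in the last check is the standard finite-population correction; it is brought in here for illustration and is not part of the original solution. When N is much larger than n, this factor is close to 1.

```python
from fractions import Fraction
from itertools import combinations

def var_of_mean_without_replacement(N, n):
    # Exact Var(Xbar) over all equally likely n-subsets of {1, ..., N}
    means = [Fraction(sum(s), n) for s in combinations(range(1, N + 1), n)]
    mu = sum(means) / len(means)
    return sum((m - mu) ** 2 for m in means) / len(means)

N, n = 10, 7  # small hypothetical population so the effect is visible
sigma2 = Fraction(N**2 - 1, 12)
independent = sigma2 / n                       # what sigma/sqrt(n) assumes
exact = var_of_mean_without_replacement(N, n)
print(float(independent), float(exact))
print(exact == independent * Fraction(N - n, N - 1))
```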