Practice Midterm 2 Solutions

This is pulled from old exams for classes that used a different textbook, so it might not be representative of what you will see on your midterm. In particular, you can expect questions about the 5-step process.

1. The bite of a type of fly that lives in Brazil causes lesions on the skin of those bitten. To measure the size of the lesions, physicians give the bite victims something called a Montenegro (MTN) Skin Test: a small injection of a substance that causes a splotch to appear on the skin. The area of this splotch is proportional to the area of the lesions. Therefore, by measuring the area of the Montenegro skin test splotch, they can measure the area of the lesions.

a) (5) How would you classify this variable (use all that apply) and why? (Categorical, quantitative, qualitative, continuous, discrete, ordinal.)
This is a quantitative continuous random variable.  Quantitative, because the values are numerical.  Continuous, because if we are measuring "area", then the possible values will be spread along a continuum.

b) (5) In order to estimate the mean area of the MTN skin test, a random sample of 35 fly-bite victims was collected. Here is a summary of the MTN areas (in square millimeters). Find a 90% confidence interval for the mean MTN area.
 
Variable      N     Average     Std. Dev     Minimum     Median     Maximum
mtn          35      34.514       24.057           9         33         120

The question should have said to find an approximate 90% confidence interval, because we cannot find an exact CI without assuming that the distribution of areas is normal.  Without this assumption, the best we can do here is call on the Central Limit Theorem and hope that the sample size is sufficiently large to provide a normal sampling distribution, even if the population distribution is not normal.

Because the SD of the population is unknown, we use the estimated value from this sample: 24.057.  This means we use the t-distribution with n-1 = 34 degrees of freedom.

The formula is Xbar +- t* s/sqrt(n), where t* is the value that has area alpha/2 above it in a t-distribution with 34 degrees of freedom.

For a 90% CI, alpha = 10%, so we want the point that "cuts off" an area of 5% in each tail.  We could get this from Table F in the book, but this table stops at 29 degrees of freedom (the next step is "infinity").  The difference between using 29 degrees of freedom and infinity is only about .05, so it doesn't make too much difference. But to err on the side of caution, let's make the CI a little too wide and choose the 29-degrees-of-freedom value, 1.70, as our t*.

34.514 +- 1.70*(24.057/sqrt(35))
34.514 +- 6.913   (6.913 is the "margin of error".)
(27.601, 41.427)  is the approximate 90% CI for the population mean.
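(We used the table above, but you can check the arithmetic with software. Here is a minimal sketch in Python; it assumes the scipy package is available, and because it computes t* exactly rather than using the conservative 1.70, the interval comes out slightly narrower.)

    from math import sqrt
    from scipy import stats

    n, xbar, s = 35, 34.514, 24.057
    t_star = stats.t.ppf(0.95, n - 1)   # t* with area .05 above it, 34 df, about 1.69
    moe = t_star * s / sqrt(n)          # margin of error
    print(xbar - moe, xbar + moe)       # roughly (27.6, 41.4)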

2. Do piano lessons improve the spatial-temporal reasoning of preschool children? Neurobiological arguments suggest that this may be true. A study designed to test this hypothesis measured the spatial-temporal reasoning of 34 preschool children before and after six months of piano lessons. The changes in the reasoning scores are summarized below. (Change is measured as the After score minus the Before score.)

Variable       N    Average    Std. Dev    Minimum    Median    Maximum
reasoning     34     3.6176      3.0552         -3         4          9

Is there evidence that piano lessons improve the spatial-temporal reasoning of preschool children? To answer this question, do the following:

a) Find a 95% confidence interval for the mean change in reasoning scores.
Here the shape of the distribution of the sample -- unimodal and more-or-less symmetric -- suggests that the distribution of the population is likely to be unimodal and symmetric, too.  This means that the CLT is likely to be a good approximation.  Again, the SD of the population is not known, so we rely on the t-distribution.

3.6176 +- t* 3.0552/sqrt(34)
where t* is the value on a t-distribution with 33 degrees of freedom (n-1) that has an area alpha/2 = .05/2 = .025 above it.  Again, Table F doesn't have this, but we can conservatively choose the value for 29 degrees of freedom: 2.05.

CI is 3.6176 +- 1.074
(2.544, 4.692)
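(As in problem 1, you can check this with software. A quick sketch in Python, assuming scipy is installed, lets the t-distribution build the interval directly from the summary statistics; the exact t* for 33 degrees of freedom is about 2.03, so this interval is a touch narrower than ours.)

    from math import sqrt
    from scipy import stats

    n, xbar, s = 34, 3.6176, 3.0552
    lo, hi = stats.t.interval(0.95, n - 1, loc=xbar, scale=s / sqrt(n))
    print(lo, hi)   # about (2.55, 4.68)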

b) State the assumptions you made to get this confidence interval. Do you think the assumptions are good? Based on what evidence?
We assumed that the population distribution is symmetric and unimodal, which, given the shape of the histogram, is probably a good assumption.  If we assume the distribution is normal, then our confidence interval is exact (apart from using 29 degrees of freedom rather than 33).  Even the assumption of normality is not too far-fetched, since the histogram looks like it could plausibly have come from a normal population. (In later statistics classes you'll see how to check this a little more carefully.)

c) Is your confidence interval evidence that spatial-temporal reasoning was improved by piano lessons? Explain.
Yes.  The true mean improvement is either in this interval or not.  95% of all CIs computed from studies like this one would cover the true value, so there is only a 5% chance that our study produced a CI that does not contain the true value.  Notice that 0 and negative values are NOT in the interval, so we can be pretty confident that the mean improvement is a positive value, which means that on average the children improved.
 

3. (5) In the last problem, suppose that you had found that there was a significant change in reasoning scores. Can you conclude that the change was due to piano lessons? If yes, explain why. If no, explain how you would design a study that would allow you to conclude that piano lessons caused the change.

No, you can't conclude that the piano lessons caused the change, because there is no control group.  Perhaps all children improve their reasoning scores over 6 months, whether or not they take piano lessons.  To improve this study, we should randomly assign some children to some sort of "control" activity that does not involve music lessons.

4. An airline knows, through experience, that 10% of the passengers with tickets will not show up for their flight. For this reason, it overbooks flights. For a plane with 100 seats, it will sell 105 tickets. Assume that all ticketed passengers are flying alone and make their decisions about showing up without consulting each other.

a) (5) Let X represent the number of people out of 105 who show up for their flight. What is the distribution of X and why?

X follows a binomial distribution with n = 105 and p = .90.  Why?  X is the number of successes in a fixed number (105) of trials.  Each trial is either a success (shows up) or a failure (doesn't show up).  The probability of success is the same (90%) for each trial.  We also assume that outcomes are independent, which might not be a good assumption for people traveling together.  Nonetheless, without any other evidence, and for the sake of getting at least an approximate answer, we make this assumption.  Possible values for X are 0, 1, 2, ..., 105.

b) (5) What's the probability all 105 passengers will show up? (Don't give a number -- just a formula.)

Because outcomes are independent, P(all 105 show up) = P(passenger #1 shows up AND passenger #2 shows up AND ... AND passenger #105 shows up) = P(psgr #1)*P(psgr #2)*...*P(psgr #105) = .9^105  (.9 raised to the 105th power).  This is about 1.6 x 10^-5, in case you were wondering.

c) (5) Use the normal approximation to find the probability that there will be angry passengers who do not get a seat on the plane.

X is really binomial with n = 105 and p = .9.  This means E(X) = np = 105*.9 = 94.5, and SD(X) = sqrt(np(1-p)) = 3.074.
Because np > 10 and n(1-p) > 10 (just barely!), the normal approximation should be fairly good. So we can "pretend" that
X is N(94.5, 3.074)
P(X > 100) = P(Z > (100 -94.5)/3.074) = P(Z > 1.789)  where Z is a standard normal RV.  Using Table E, we find
P(Z > 1.79) = 1 - P(Z <= 1.79) = 1 - .9633 = .037
so there is a 3.7% chance that too many passengers will show up.
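(If you're curious how good the approximation is, here is a small Python check, assuming scipy is available; it also evaluates part (b) numerically and the exact binomial answer for comparison.)

    from scipy import stats

    n, p = 105, 0.9
    mu = n * p                      # 94.5
    sd = (n * p * (1 - p)) ** 0.5   # about 3.074

    print(0.9 ** 105)                            # part b: about 1.6 x 10^-5
    print(1 - stats.norm.cdf((100 - mu) / sd))   # normal approximation: about .037
    print(1 - stats.binom.cdf(100, n, p))        # exact binomial P(X > 100): about .017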
 

5. A well known scientific theory states that the rate of coughs at a symphony concert during flu season is lambda=2.5 per minute. Let X represent the number of coughs in a one minute period during a flu-season symphony concert. Suppose that X follows a Poisson distribution.

a) What is the probability of exactly one cough occurring during this period?
P(X = x) = lambda^x e^(-lambda)/x! = 2.5^x e^-2.5 /x!
P(X = 1) = 2.5 e^-2.5 = .2052 or 20.52%

b) What is the probability of at least one cough during a one minute period?
P(X >= 1) = 1 - P(X = 0) = 1 - e^-2.5 = .9179  or 91.79%
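(Both answers are one-liners in Python; this sketch assumes scipy, though the plain formulas with exp() work just as well.)

    from scipy import stats

    lam = 2.5
    print(stats.poisson.pmf(1, lam))       # P(X = 1), about .2052
    print(1 - stats.poisson.pmf(0, lam))   # P(X >= 1), about .9179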

c) Based on "theoretical" considerations, do you think the Poisson distribution is a good model for X? Explain.
We assume that the rate of occurrence is constant over time -- whether or not this is true, it is what the "well known theory" states (OK, I made this theory up. But it sounds about right to me.)  We assume that the events are independent of each other.  This is tricky. Probably one person's coughing has nothing to do with someone else's coughing, but perhaps one person's cough might affect whether or not that person coughs again. Still, this assumption sounds like it could be plausible. (We can always check our data to see if it fits a Poisson model. If not, this might be one reason.)  Third, we assume that events can't occur simultaneously.  We can make this a reasonably sound assumption if we say that we are concerned with the instant a cough begins.  It is unlikely that two coughs begin at exactly the same instant.

d) Here are some data from a study that randomly sampled one-minute intervals in a random selection of flu-season concerts. Use this data to estimate lambda.

1  2  1  2  0  1  2  1  5  2  3  5  3  2  0
Lambda represents the rate of occurrence, i.e., the number of coughs per minute. These data are the observed numbers of coughs in different one-minute intervals, so a good estimate of lambda is their average.  The average of these 15 observations is 2.0.
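(A one-line check in Python, using the data above:)

    coughs = [1, 2, 1, 2, 0, 1, 2, 1, 5, 2, 3, 5, 3, 2, 0]
    print(sum(coughs) / len(coughs))   # 2.0, our estimate of lambda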

e) Assume that the Poisson model is a good model for this data. But the exact value of the parameter lambda is in doubt. Do the data above support the hypothesis that lambda=2.5? Write a five-step simulation to provide evidence for or against. Explain how you would make a decision.

1. Box model: The box consists of infinitely many tickets with the values 0, 1, 2, ..., each in the frequencies determined by the Poisson distribution with lambda = 2.5. Such a "box" exists only in a computer; DataDesk, for example, has a random number generator that generates numbers from this population.
2. Trial: A trial consists of drawing 15 tickets, with replacement, from this box.  On DataDesk, this means choosing a random sample of 15 observations from the Poisson(2.5) distribution (under the Manip > Generate Random Sample menu). We then compute the average of these 15 values.
3. Definition of a successful trial: In our study above, we got an average of 2.0.  This is not equal to 2.5.  The question is, does a real Poisson distribution sometimes produce averages this far away from 2.5?  What's the probability of this happening? To figure this out, we'll define a successful trial as one in which the average is <= 2.0 OR >= 3.0 (3.0 is the same distance above 2.5 as 2.0 is below it). In other words, we are estimating the probability that the observed average is at least .5 coughs/minute away from 2.5, either above or below.
4. Repeat 100 times.
5. The estimated probability is the number of successful trials divided by 100.
Make a decision: The "null hypothesis" says that the mean of the Poisson is 2.5 and that what we observed, an average of 2.0, is a common occurrence (or at least not unusual).  If this is true, then the estimated probability should be large-ish, to reflect the fact that this event happens fairly often.  So if the estimated probability is "big" we believe in the null hypothesis. But if it is small, then we have evidence that this sort of thing doesn't happen very often, and we should reject the hypothesis that lambda = 2.5.  "Big" and "small" are subjective terms, but classically we reject the null hypothesis if this probability is 0.05 or less.

I did this simulation on my computer.  There were 20 cases with an average of 3.0 or more or of 2.0 or less, so the estimated probability is 20%.  This means that an average like ours is a fairly common occurrence when the mean is 2.5, and so there is no reason to think that the 2.0 we observed is unusual.  (In fact, I generated these data from a distribution that really was Poisson with lambda = 2.5.)
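(The solution above describes the simulation in DataDesk. Here is a minimal sketch of the same five steps in Python with numpy; the 100 trials and the success rule are the ones defined above, and the success count will of course vary from run to run.)

    import numpy as np

    rng = np.random.default_rng()
    n_trials, n_obs, lam = 100, 15, 2.5

    # Steps 1-2: each trial draws 15 tickets from the Poisson(2.5) "box"
    # and records the average.
    averages = rng.poisson(lam, size=(n_trials, n_obs)).mean(axis=1)

    # Step 3: a successful trial has an average <= 2.0 or >= 3.0.
    successes = np.sum((averages <= 2.0) | (averages >= 3.0))

    # Steps 4-5: estimated probability = successes / 100.
    print(successes / n_trials)   # around .2, in line with the run described above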

6. A bottle-filling machine at the Fizzy Soda bottling and distribution plant is designed to put 12 ounces of soda into each bottle. In fact, the label on the bottle "guarantees" that it contains 12 ounces of beverage. Because of slight variations, however, the amount it puts in is actually a normal random variable with mean 12.2 ounces and SD 0.2 ounce.

a) What is the probability that a given bottle will have less than 12 ounces?
Let X represent the amount in a bottle. Then X is N(12.2, .2), and P(X < 12) = P(Z < (12 - 12.2)/.2) = P(Z < -1) = .1587 from Table E.

b) Suppose we take a random sample of 6 bottles. What is the probability that the average amount is less than 12 ounces? Show all steps.
Let Y be the average of the amount in the 6 bottles. So Y = (1/6) (X1 + ... + X6) where Xi is the amount in bottle i.  Then
Y is N(12.2, .2/sqrt(6)), i.e., N(12.2, 0.0816).
So P(Y < 12) = P(Z < (12 - 12.2)/0.0816) = P(Z < -2.449), which is approximately P(Z < -2.45) = .0071.
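(A quick Python check of both parts, assuming scipy is available:)

    from math import sqrt
    from scipy import stats

    mu, sd, n = 12.2, 0.2, 6
    print(stats.norm.cdf(12, loc=mu, scale=sd))            # part a: about .1587
    print(stats.norm.cdf(12, loc=mu, scale=sd / sqrt(n)))  # part b: about .0071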