Midterm 2 Solutions
Nearly everyone could improve their scores simply by reading the questions
carefully and following directions. So, for example, if the question
asks for a graph, draw a graph. If it asks to show work, show work.
If it asks to give a confounding variable, then give a confounding
variable.
1. An important part of recovery from heart disease is a change in lifestyle,
and so many medical groups provide their patients with classes to help them
change their lifestyles. It is well-known that a particular diet
is very important to recovery, and so many medical groups require their heart
patients to attend nutrition courses. A certain medical group
suspects that patients can be helped even more if they meditate daily.
To test this, they ask for volunteers amongst those patients who are currently
attending a nutrition class to participate in a meditation class held after
their nutrition class. Of the 100 or so patients currently enrolled
in the year-long nutrition class, about 35 agree to stay for the weekly meditation
class. A year later, the 35 meditation students have a significantly lower
cholesterol level and a lower resting pulse-rate (a sign of good cardiovascular
health.)
a) (5) Is this a controlled experiment or an observational study?
Explain.
Observational study. The researchers had no say over
which treatment the subjects received.
Discussion:
The trait that distinguishes observational studies from controlled experiments
is how the treatment variable is assigned. In a controlled experiment, the
researchers assign subjects to values of the treatment variable. In
an observational study, these assignments are NOT made by the researchers.
In this study, the subjects themselves chose which treatment group
they would belong to: the nutrition-only group or the nutrition + meditation
group.
Many people said things that implied that randomization was required for
a controlled study. This is not true. If I wanted to do a study to
see if meditation helped midterm scores, I could assign everyone whose last
name begins with A-M to the meditation group and N-Z to the control group,
and that would be a controlled study. Or I could assign all of the
men to the meditation group and the women to the control group, and it would
still be a controlled experiment. Both examples are BAD controlled
experiments, but they are still controlled experiments. Randomization
is done to ensure that the various treatment groups are similar with respect
to any potential confounding variables. So it is a good thing, but
it does not help you to distinguish between controlled experiments and observational
studies.
Other people said that it was an observational study because there was no
random sample. It is possible to have an observational study based
on a random sample. For example, I could take a random sample of hospital
records and see what percentage of lung cancer patients smoked compared to
what percentage did not smoke. This is an observational study because
the researchers did not choose which treatment group (smokers or non smokers)
the subjects belonged to. But it is based on a random sample.
b) The director of the medical group must decide whether to invest in
offering this meditation program on a regular basis. Is this study sufficient
evidence to conclude that the meditation class helps patients? If yes,
explain why. If no, suggest a plausible confounding variable.
No. A plausible confounding variable would be that people who choose
to stay longer for the meditation group are perhaps more motivated to improve
their health and would therefore also be more motivated to take care of their
health in other ways.
Discussion
Note that confounding factors must have an effect on both the treatment and
the response. For example, some people said "a possible confounder
is that people in the meditation class already have lower cholesterol." That
may be true, but that doesn't tell us why people with lower cholesterol would
choose to participate in the meditation class. Others said something
along the lines of "some people will naturally exercise more." Yes,
but wouldn't these people be just as likely found in the meditation class
as the other class? If you think not, then you should explain why.
2. In 2001, the Ashe Student Health Center at UCLA took a random
sample of 640 UCLA dorm residents and asked them if they were satisfied with
their bodies. The results are shown below.
                 Male   Female   Total
Not Satisfied      50      157     207
Satisfied         147      286     433
Total             197      443     640
a) Suppose we select someone at random from these 640 people. What's
the probability this person will be male or satisfied with their body?
Show all steps. (Blue)
P(M or S) = P(M) + P(S) - P(M and S) = (197/640) + (433/640) - (147/640)
= .755
Yellow: P(F or S) = P(F) + P(S) - P(F and S) = (443/640) + (433/640) - (286/640)
= .922
Discussion: most got this right.
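If you would like to check this arithmetic by computer, here is a minimal
Python sketch of the addition rule for the blue exam. The dictionary layout
and the variable names are my own choices for illustration, not part of the
exam; the counts are the ones in the table above.

```python
# Counts from the body-satisfaction table (blue exam).
# The keys (gender, satisfaction) are an illustrative layout, not exam notation.
counts = {
    ("male", "not satisfied"): 50,
    ("female", "not satisfied"): 157,
    ("male", "satisfied"): 147,
    ("female", "satisfied"): 286,
}
total = sum(counts.values())

# Marginal and joint probabilities for a person drawn at random from the 640.
p_male = sum(n for (g, s), n in counts.items() if g == "male") / total
p_satisfied = sum(n for (g, s), n in counts.items() if s == "satisfied") / total
p_male_and_satisfied = counts[("male", "satisfied")] / total

# Addition rule: P(M or S) = P(M) + P(S) - P(M and S)
p_male_or_satisfied = p_male + p_satisfied - p_male_and_satisfied
print(round(p_male_or_satisfied, 3))
```

Swapping "male" for "female" in the two places it appears gives the yellow
exam's answer.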
b) Is satisfaction with body independent of gender? Show all work.
There are many ways to show this. But it was important that you show it based
on the data provided, and not just reason your way around it. The worst approach
was to simply state an answer with no reasoning.
P(M | S) = (147/433) = .34
P(M) = (197/640) = .31
These are not equal, and so these events are NOT independent.
Hence, knowing someone's gender (in this data set) tells you something
about their body satisfaction.
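Here is a small Python sketch of the same check, along with one of the
"many ways" mentioned above: comparing P(M and S) with P(M) * P(S). The
variable names are mine. Keep in mind that this only compares the proportions
in this data set; it is not a formal test of independence.

```python
# Counts taken from the table (blue exam).
male, satisfied, male_and_satisfied, total = 197, 433, 147, 640

# Way 1: compare P(M | S) with P(M).
p_m = male / total
p_m_given_s = male_and_satisfied / satisfied

# Way 2: under independence, P(M and S) would equal P(M) * P(S).
p_m_and_s = male_and_satisfied / total
p_product = p_m * (satisfied / total)

print("P(M | S) =", round(p_m_given_s, 2), " P(M) =", round(p_m, 2))
print("P(M and S) =", round(p_m_and_s, 3), " P(M)*P(S) =", round(p_product, 3))
```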
c) Which gender is more likely to be satisfied with its body? Or
are both the same? Explain.
P(S|M) = .75 and P(S | F) = .65 therefore men in this data set
are more likely to be satisfied than the women.
It was important to condition on gender, since there were different numbers
of men and women. Comparing P(M|S) with P(F|S) is incorrect because
all this would tell us is that there are more females in the data set.
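To see the difference between the two ways of conditioning, here is a short
Python sketch that computes both. The dictionary names are my own, chosen
only for illustration.

```python
# Counts from the table: satisfied people and group sizes, by gender.
satisfied = {"male": 147, "female": 286}
group_size = {"male": 197, "female": 443}

# Correct comparison: condition on gender, P(S | gender).
p_s_given_gender = {g: satisfied[g] / group_size[g] for g in group_size}

# Incorrect comparison: condition on satisfaction, P(gender | S).
# This mostly reflects how many men and women are in the data set.
total_satisfied = sum(satisfied.values())
p_gender_given_s = {g: satisfied[g] / total_satisfied for g in satisfied}

for g in group_size:
    print(g, "P(S | gender) =", round(p_s_given_gender[g], 2),
          " P(gender | S) =", round(p_gender_given_s[g], 2))
```

Notice that the two comparisons point in opposite directions: men have the
higher P(S | gender), but women have the higher P(gender | S) simply because
there are more women in the sample.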
3. A particular high school has instituted a "zero tolerance" policy for
weapons. To implement this policy, they have installed metal detectors
and security guards at all entrances. If the metal detector alarm
sounds, the guards do a search and if any weapons are discovered, the student
is expelled. On the first day of this new policy, the principal has
reason to believe that about 1% of the students will try to bring weapons
of some kind into the school. The company that produces the metal detector
assures the school that the detector's alarm will ring in 99.9% of the cases
in which the student has a weapon. However, it will also ring in 10%
of the cases in which the students do not have weapons.
a) What's the probability that a randomly selected student will have a weapon
and the alarm will go off?
NOTE: different exams had slightly different numbers. These are the answers
for the blue exam.
It helped to draw a tree, but this wasn't necessary. Here are the numbers
you are given:
P(W) = .01 P(No W) = .99
P(A | W) = .999, P(No A |W) = .001
P(A | No W) = .10 and P(no A | No W ) = .90
So P(W and A) = P(W) * P(A | W) (using the definition of conditional
probability.)
= .01*.999 = .00999
b) What's the probability that a randomly selected student who walks through
the metal detector will have the alarm go off?
P(A) = P(A and W) + P(A and no W) (the two cases cannot both happen, so
"or" means "plus")
= .01*.999 + .99*.10 = .10899
c) Suppose a student has just passed through the metal detector
and the alarm has rung. What's the probability this student will actually
turn out to have a weapon?
We are asked to find P(W | A) = P(W and A)/P(A) = .00999/.10899 = .0917
Most people got this right.
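The three parts of this problem follow the branches of the tree, and can be
sketched in a few lines of Python. The variable names are mine, and the
numbers are those of the blue exam; if your exam had different numbers, change
the first three lines.

```python
# Given quantities (blue exam).
p_w = 0.01              # P(W): student has a weapon
p_a_given_w = 0.999     # P(A | W): alarm rings given a weapon
p_a_given_no_w = 0.10   # P(A | No W): alarm rings given no weapon

# (a) P(W and A) = P(W) * P(A | W)
p_w_and_a = p_w * p_a_given_w

# (b) P(A) = P(W and A) + P(No W and A)
p_a = p_w_and_a + (1 - p_w) * p_a_given_no_w

# (c) P(W | A) = P(W and A) / P(A)
p_w_given_a = p_w_and_a / p_a

print(round(p_w_and_a, 5), round(p_a, 5), round(p_w_given_a, 4))
```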
4. A recent Gallup poll asked Americans if they thought President Bush
should fire Secretary of Defense Rumsfeld. Suppose that the truth is
that 30% of all Americans believe Rumsfeld should be fired.
a) Let X be a random variable that is a 1 if a randomly selected American
believes Rumsfeld should be fired, and a 0 if not. Make a table
of the probability distribution of X and draw a graph.
Answers are for the blue exam.
x          0     1
P(X = x)   .7    .3
You also need to draw a graph, and I will try to describe it. There
are only two possible values, and so there should be a point above x=0 that
is .7 units tall and a point at x=1 that is .3 units tall.
A common incorrect answer was to draw a normal curve. A normal model
is appropriate for (some) variables that have continuous values; but this
variable has only two possible values: 0 or 1.
b) What is the expected value of X? Show all steps.
Show work! E(X) = 0*.7 + 1 *.3 = .3
c) What is the standard deviation of X? Show all steps.
Var(X) = (0 - .3)^2 * .7 + (1 - .3)^2 * .3 = .21
SD(X) = sqrt(Var(X)) = .458
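The same steps work for any discrete random variable, so here is a minimal
Python sketch that applies the definitions of expected value and variance to
the distribution of X. Storing the distribution as a dictionary is my own
choice for illustration.

```python
from math import sqrt

# Probability distribution of X (blue exam): value -> probability.
dist = {0: 0.7, 1: 0.3}

# E(X) = sum of x * P(X = x)
mean = sum(x * p for x, p in dist.items())

# Var(X) = sum of (x - E(X))^2 * P(X = x); SD(X) is its square root.
var = sum((x - mean) ** 2 * p for x, p in dist.items())
sd = sqrt(var)

print(round(mean, 3), round(var, 3), round(sd, 3))
```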
d) Suppose that we do our own survey and select a simple random sample
of 1003 Americans and ask whether they believe Rumsfeld should be fired.
Let p* represent the proportion in our sample that will agree with this statement.
Find the standard error for p*.
SE(p*) = sqrt(.3*.7/1003) = .0145
e) Again, if we take a simple random sample of 1003 Americans, what's
the approximate probability that more than 32% of the sample will say he
should be fired? Be sure to state any conditions or assumptions you
must make, and verify any conditions that you can. If you didn't
get an answer for part (d), use 0.02.
Very many people lost points because they did not state assumptions and check
conditions. One assumption is that the sample is random, and you are
told that it is. Another is that the sample size is less than 10% of
the population size, and we know that the US has more than 10030 people in
it, so this is also true. The final set of conditions, which are VERY
IMPORTANT (because if they're not true, then the normal model is invalid),
are that np > 10 and n(1-p) > 10. Here we are told that p = .3, so we can
check these directly (when p is unknown, use p* as a proxy). So you must
verify that 1003*.3 > 10 and 1003*.7 > 10. Since 1003*.3 = 300.9 and
1003*.7 = 702.1, both are clearly true.
Therefore the normal model holds. We need to find P(p* > .32) =
P(Z > (.32-.3)/.0145) = P(Z > 1.38) = 1 - .9162 = .0838
(about .0835 if you do not round the SE and the z-score).
If you used .02 for the SE:
P(p* > .32) = P(Z > 1) = .1587
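If you want to check parts (d) and (e) without a normal table, here is a
sketch using NormalDist from Python's standard statistics module (Python 3.8
or later). The variable names are mine. Because the code does not round the
SE or the z-score, its answer can differ from a table lookup in the third or
fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

p, n, cutoff = 0.3, 1003, 0.32  # true proportion, sample size, threshold

# Conditions for the normal model: np > 10 and n(1 - p) > 10.
conditions_ok = n * p > 10 and n * (1 - p) > 10

# (d) Standard error of the sample proportion.
se = sqrt(p * (1 - p) / n)

# (e) P(p* > .32) = P(Z > z) under the standard normal model.
z = (cutoff - p) / se
prob = 1 - NormalDist().cdf(z)

# Fallback from the exam: the same calculation with SE = 0.02.
prob_fallback = 1 - NormalDist().cdf((cutoff - p) / 0.02)

print(conditions_ok, round(se, 4), round(z, 2), round(prob, 4))
print(round(prob_fallback, 4))
```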
5. Suppose we're interested in understanding the mean starting salary
of recent college graduates. To do this we take a random sample of
a large number of recent college graduates (say about 1000 such people) and
determine their incomes.
a) True or false and explain: a histogram of this data would likely be well
described by the normal model.
False. The distribution of salaries in the population is most likely
skewed right. Since we are taking a random sample, the distribution
of the sample will be similar to the distribution of the population and will
therefore also be skewed right.
b) A large number of researchers go out and do precisely this study:
they take a random sample of 1000 recent college graduates, determine
their incomes, and find the average income. True or false and explain:
a histogram of the averages determined by each study would be well described
by the normal model.
True. The central limit theorem says that, because each sample has
1000 people in it and is therefore relatively large, the distribution of
the sample averages will be approximately normal.
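Parts (a) and (b) can be illustrated with a small simulation. This is only a
sketch under assumptions of my own: I model salaries with a lognormal
distribution (parameters 10.8 and 0.5, picked to give a right-skewed shape),
and I use 200 studies to stand in for the "large number of researchers."
None of these choices come from the exam.

```python
import random

def skewness(values):
    """Sample skewness: near 0 for symmetric data, positive if skewed right."""
    n = len(values)
    m = sum(values) / n
    var = sum((v - m) ** 2 for v in values) / n
    return sum((v - m) ** 3 for v in values) / n / var ** 1.5

rng = random.Random(0)               # fixed seed so the run is repeatable
n_studies, n_per_study = 200, 1000   # assumed number of studies; 1000 per sample

all_salaries = []   # every individual salary drawn, as in part (a)
study_means = []    # one average per study, as in part (b)
for _ in range(n_studies):
    # Assumed salary model: lognormal, which is skewed right.
    sample = [rng.lognormvariate(10.8, 0.5) for _ in range(n_per_study)]
    all_salaries.extend(sample)
    study_means.append(sum(sample) / n_per_study)

print("skewness of individual salaries:", round(skewness(all_salaries), 2))
print("skewness of study averages:", round(skewness(study_means), 2))
```

The individual salaries should show a clearly positive skewness, while the
skewness of the averages should be close to zero, which is what the central
limit theorem leads us to expect.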