Midterm 2 Solutions

Nearly everyone could improve their scores simply by reading the questions carefully and following directions.  So, for example, if the question asks for a graph, draw a graph.  If it asks to show work, show work.  If it asks to give a confounding variable, then give a confounding variable.  

1. An important part of recovery from heart disease is a change in lifestyle, and so many medical groups provide their patients with classes to help them change their lifestyles.  It is well-known that  a particular diet is very important to recovery, and so many medical groups require their heart patients to attend nutrition courses.   A certain medical group suspects that patients can be helped even  more if they meditate daily.  To test this, they ask for volunteers amongst those patients who are currently attending a nutrition class to participate in a meditation class held after their nutrition class.  Of the 100 or so patients currently enrolled in the year-long nutrition class, about 35 agree to stay for the weekly meditation class. A year later, the 35 meditation students have a significantly lower cholesterol level and a lower resting pulse-rate (a sign of good cardiovascular health.)

a) (5)  Is this a controlled experiment or an observational study?  Explain.
Observational study.  The researchers had no say over which treatment the subjects received.

Discussion:
The trait that distinguishes observational studies from controlled experiments is how the treatment variable is assigned. In a controlled experiment, the researchers assign subjects to values of the treatment variable.  In an observational study, these assignments are NOT made by the researchers.  In this study, the subjects themselves chose which treatment group they would belong to:  the nutrition-only group or the nutrition + meditation group.

Many people said things that implied that randomization was required for a controlled experiment. This is not true.  If I wanted to do a study to see if meditation helped midterm scores, I could assign everyone whose last name begins with A-M to the meditation group and N-Z to the control group, and that would be a controlled experiment.  Or I could assign all of the men to the meditation group and the women to the control group and it would still be a controlled experiment.  Both examples are BAD controlled experiments, but they are still controlled experiments.   Randomization is done to ensure that the various treatment groups are similar with respect to any potential confounding variables.  So it is a good thing, but it does not help you to distinguish between controlled experiments and observational studies.

Other people said that it was an observational study because there was no random sample.  It is possible to have an observational study based on a random sample.  For example, I could take a random sample of hospital records and see what percentage of lung cancer patients smoked compared to what percentage did not smoke.  This is an observational study because the researchers did not choose which treatment group (smokers or non-smokers) the subjects belonged to.  But it is a random sample.


b) The director of the medical group must decide whether to invest in offering this meditation program on a regular basis. Is this study sufficient evidence to conclude that the meditation class helps patients?  If yes, explain why.  If no, suggest a plausible confounding variable.

No.   A plausible confounding variable would be that people who choose to stay longer for the meditation group are perhaps more motivated to improve their health and would therefore also be more motivated to take care of their health in other ways.

Discussion
Note that confounding factors must have an effect on both the treatment and the response.  For example, some people said "a possible confounder is that people in the meditation class already have lower cholesterol."  That may be true, but that doesn't tell us why people with lower cholesterol would choose to participate in the meditation class.  Others said something along the lines of "some people will naturally exercise more."  Yes, but wouldn't these people be just as likely found in the meditation class as the other class?  If you think not, then you should explain why.


2.   In 2001, the Ashe Student Health Center at UCLA took a random sample of 640 UCLA dorm residents and asked them if they were satisfied with their bodies.  The results are shown below.

                 Male    Female    Total
Not Satisfied      50       157      207
Satisfied         147       286      433
Total             197       443      640

a) Suppose we select someone at random from these 640 people.  What's the probability this person will be male or satisfied with their body?  Show all steps. (Blue)

P(M or S) = P(M) + P(S) - P(M and S) = (197/640) + (433/640) - (147/640) = .755

Yellow: P(F or S) = P(F) + P(S) - P(F and S) = (443/640) + (433/640) - (286/640) = .922
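
The addition rule above can be checked directly from the table counts. The following is a minimal Python sketch; the dictionary `counts` and the helper `prob_or` are illustrative names introduced here, not part of the exam.

```python
# Counts copied from the table: (gender, satisfaction) -> number of people.
counts = {
    ("M", "not_satisfied"): 50,
    ("F", "not_satisfied"): 157,
    ("M", "satisfied"): 147,
    ("F", "satisfied"): 286,
}
total = sum(counts.values())  # 640 people in all


def prob_or(gender, status="satisfied"):
    """P(gender or status) = P(gender) + P(status) - P(gender and status)."""
    p_gender = sum(n for (g, s), n in counts.items() if g == gender) / total
    p_status = sum(n for (g, s), n in counts.items() if s == status) / total
    p_both = counts[(gender, status)] / total
    return p_gender + p_status - p_both


print("Blue   P(M or S):", round(prob_or("M"), 3))
print("Yellow P(F or S):", round(prob_or("F"), 3))
```

Subtracting P(M and S) matters because the 147 satisfied men are counted once in P(M) and again in P(S).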



Discussion: most got this right.


b) Is satisfaction with body independent of gender?  Show all work.
There are many ways to show this. But it was important that you show it based on the data provided, and not just reason your way around it. The worst approach was to simply state an answer with no reasoning.

P(M | S) = (147/433) = .34
P(M) = (197/640) = .31

These are not equal, and so these events are NOT independent.  

Hence, knowing someone's gender (in this data set) tells you something about their body satisfaction.
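
As a rough check of the comparison above, here is a short Python sketch (variable names are illustrative). It also shows an equivalent test, comparing P(M and S) with P(M)*P(S), which is one of the "many ways" to answer the question.

```python
# Counts from the table.
male_and_satisfied = 147
satisfied = 433
male = 197
total = 640

# Approach 1: compare a conditional probability with the unconditional one.
p_m_given_s = male_and_satisfied / satisfied  # P(M | S)
p_m = male / total                            # P(M)

# Approach 2: compare the joint probability with the product of the marginals.
p_m_and_s = male_and_satisfied / total        # P(M and S)
product = (male / total) * (satisfied / total)  # P(M) * P(S)

print("P(M | S) =", round(p_m_given_s, 2), " P(M) =", round(p_m, 2))
print("P(M and S) =", round(p_m_and_s, 3), " P(M)P(S) =", round(product, 3))
```

Under independence each pair would match; here neither does.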



c) Which gender is more likely to be satisfied with its body?  Or are both the same?  Explain.
P(S|M) = .75  and P(S | F) = .65  therefore men in this data set are more likely to be satisfied than the women.

It was important to condition on gender, since there were different numbers of men and women.  Comparing P(M|S) with P(F|S) is incorrect because all this would tell us is that there were more females in the data set.
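
The difference between the two comparisons can be seen numerically. This is a small Python sketch using the table counts; the variable names are illustrative.

```python
# Counts from the table.
sat_m, sat_f = 147, 286        # satisfied men, satisfied women
total_m, total_f = 197, 443    # all men, all women
total_sat = sat_m + sat_f      # 433 satisfied people

# Correct comparison: condition on gender.
p_s_given_m = sat_m / total_m  # P(S | M)
p_s_given_f = sat_f / total_f  # P(S | F)

# Incorrect comparison: condition on satisfaction.
p_m_given_s = sat_m / total_sat  # P(M | S)
p_f_given_s = sat_f / total_sat  # P(F | S)

print("P(S|M) =", round(p_s_given_m, 2), " P(S|F) =", round(p_s_given_f, 2))
print("P(M|S) =", round(p_m_given_s, 2), " P(F|S) =", round(p_f_given_s, 2))
```

The second pair mostly reflects that the sample contains 443 women and only 197 men, not which gender is more satisfied.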

3. A particular high school has instituted a "zero tolerance" policy for  weapons.  To implement this policy, they have installed metal detectors and security guards at all entrances.   If the metal detector alarm sounds, the guards do a search and if any weapons are discovered, the student is expelled.  On the first day of this new policy, the principal has reason to believe that about 1% of the students  will try to bring weapons of some kind into the school. The company that produces the metal detector assures the school that the detector's alarm will ring in 99.9% of the cases in which the student has a weapon.  However, it will also ring in 10% of the cases in which the students do not have weapons.

a) What's the probability that a randomly selected student will have a weapon and the alarm will go off?

NOTE: different exams had slightly different numbers. These are the answers for the blue exam.

It helped to draw a tree, but this wasn't necessary.  Here are the numbers you are given:
P(W) = .01  P(No W) = .99
P(A | W) = .999,  P(No A |W) = .001
P(A | No W) = .10  and P(no A | No W ) = .90

So P(W and A) = P(W) * P(A | W)   (using the definition of conditional probability.)
= .01*.999 = .00999


b) What's the probability that a randomly selected student who walks through the metal detector will have the alarm go off?

P(A) = P(A and W) + P(A and no W)   (the two events are disjoint, so "or" means "plus")
= .01*.999 + .99*.10 = .10899


c)  Suppose a student has just passed through the metal detector and the alarm has rung.  What's the probability this student will actually turn out to have a weapon?

Asked to find P(W|A) = P(W and A)/P(A) = .00999/.10899 = .0917
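
The three parts of this question follow the branches of the tree, and can be reproduced with a few lines of Python. This is a minimal sketch using the blue-exam numbers; the variable names are illustrative.

```python
# Given quantities (blue exam).
p_w = 0.01              # P(W): student has a weapon
p_a_given_w = 0.999     # P(A | W): alarm rings when there is a weapon
p_a_given_not_w = 0.10  # P(A | No W): false alarm rate

# (a) Multiply along the "weapon, alarm" branch of the tree.
p_w_and_a = p_w * p_a_given_w

# (b) Add the two branches that end in an alarm.
p_a = p_w_and_a + (1 - p_w) * p_a_given_not_w

# (c) Definition of conditional probability (Bayes' rule).
p_w_given_a = p_w_and_a / p_a

print("P(W and A) =", round(p_w_and_a, 5))
print("P(A)       =", round(p_a, 5))
print("P(W | A)   =", round(p_w_given_a, 4))
```

Even though the detector catches nearly every weapon, fewer than 1 in 10 alarms involve a weapon, because students without weapons vastly outnumber those with them.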


Most people got this right.


4.  A recent Gallup poll asked Americans if they thought President Bush should fire Secretary of Defense Rumsfeld.  Suppose that the truth is that 30% of all Americans believe Rumsfeld should be fired.

a) Let X be a random variable that is a 1 if a randomly selected American believes Rumsfeld should be fired, and a 0 if not.   Make a table of the probability distribution of X and draw a graph.


Answers are for blue exam.

x    P(X=x)
0    .7
1    .3

You also need to draw a graph, and I will try to describe it.  There are only two possible values, and so there should be a point above x=0 that is .7 units tall and a point at x=1 that is .3 units tall.  

A common incorrect answer was to draw a normal curve.  A normal model is appropriate for (some) variables that have continuous values; but this variable has only two possible values: 0 or 1.





b) What is the expected value of X?  Show all steps.
Show work!  E(X) = 0*.7 + 1 *.3 = .3


c) What is the standard deviation of X?   Show all steps.
Var(X) = (0 - .3)^2 * .7 + (1 - .3)^2 * .3 = .21
SD(X) = sqrt(Var(X)) = .458
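
The expected value and standard deviation can be computed from the probability table with the same two formulas. Below is a minimal Python sketch for the blue exam; `dist` is an illustrative name for the table.

```python
import math

# Probability distribution of X (blue exam): value -> probability.
dist = {0: 0.7, 1: 0.3}

# E(X): each value weighted by its probability.
mean = sum(x * p for x, p in dist.items())

# Var(X): squared distance from the mean, weighted by probability.
var = sum((x - mean) ** 2 * p for x, p in dist.items())

# SD(X): square root of the variance.
sd = math.sqrt(var)

print("E(X)   =", round(mean, 3))
print("Var(X) =", round(var, 3))
print("SD(X)  =", round(sd, 3))
```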



d) Suppose that we do our own survey and select a simple random sample of 1003 Americans and ask whether they believe Rumsfeld should be fired.  Let p* represent the proportion in our sample that will agree with this statement.  Find the standard error for p*.  

SE(p*) = sqrt(p(1-p)/n) = sqrt(.3*.7/1003) = .0145

e)  Again, if we take a simple random sample of 1003 Americans, what's the approximate probability that more than 32% of the sample will say he should be fired?  Be sure to state any conditions or assumptions you must make, and verify any conditions that you can.   If you didn't get an answer for part (d), use 0.02.

Very many people lost points because they did not state assumptions and check conditions.  One assumption is that the sample is random, and you are told that it is.  Another is that the sample size is less than 10% of the population size, and we know that the US has more than 10030 people in it, so this is also true.  The final set of conditions, which are VERY IMPORTANT (because if they're not true, then the normal model is invalid), are that np > 10 and n(1-p) > 10.  Here we are told that p = .3 (when p is unknown, p* can be used as a proxy), so you must verify that 1003*.3 > 10 and 1003*.7 > 10.  Since 1003*.3 = 300.9 and 1003*.7 = 702.1, both are clearly true.

Therefore the normal model holds.  We need to find P(p* > .32) = P(Z > (.32-.3)/.0145) = P(Z > 1.38) = 1-.9162 = .0838.  (Software that carries the unrounded SE and z-score gives about .0835.)

If you used .02 for the SE:
P(p* > .32) = P(Z > 1) = .1587
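
The condition checks and the normal calculation can be sketched in Python as follows. This is an illustration rather than a required method: `normal_cdf` is a helper written here from the standard library's `math.erfc`, and it stands in for the normal table. Because it does not round the SE or the z-score, its result can differ from a table lookup in the third or fourth decimal place.

```python
import math


def normal_cdf(z):
    """P(Z <= z) for a standard normal variable, via the error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


p, n, cutoff = 0.3, 1003, 0.32

# Conditions for the normal model: np > 10 and n(1-p) > 10.
conditions_ok = n * p > 10 and n * (1 - p) > 10

se = math.sqrt(p * (1 - p) / n)   # standard error of p*
z = (cutoff - p) / se             # z-score of the cutoff
tail = 1 - normal_cdf(z)          # P(p* > .32)

# Fallback from the exam: SE = 0.02 if part (d) was not answered.
tail_fallback = 1 - normal_cdf((cutoff - p) / 0.02)

print("conditions hold:", conditions_ok)
print("SE =", round(se, 4), " z =", round(z, 2))
print("P(p* > .32) =", round(tail, 4))
print("with SE = .02:", round(tail_fallback, 4))
```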



5.  Suppose we're interested in understanding the mean starting salary of recent college graduates.  To do this we take a random sample of a large number of recent college graduates (say about 1000 such people) and determine their incomes.

a) True or false and explain: a histogram of this data would likely be well described by the normal model.

False.  The distribution of salaries in the population is most likely skewed right.  Since we are taking a random sample, the distribution of the sample will be similar to the distribution of the population and will therefore also be skewed right.






b) A large number of researchers go out and do precisely this study:  they take  a random sample of 1000 recent college graduates, determine their incomes, and find the average income.  True or false and explain:  a histogram of the averages determined by each study would be well described by the normal model.

True.  The central limit theorem says that, because each sample has 1000 people in it and is therefore relatively large, the distribution of the sample averages will be approximately normal, even though the population of salaries is skewed.
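
The contrast between parts (a) and (b) can be illustrated with a small simulation. The sketch below is hypothetical: it ASSUMES salaries follow a lognormal distribution with made-up parameters (10.8 and 0.6), chosen only because that shape is skewed right; nothing in the question specifies the population. The helper `skewness` (average cubed z-score) is also introduced here, with values near 0 indicating a roughly symmetric shape.

```python
import random
import statistics


def skewness(values):
    """Average cubed z-score; about 0 for symmetric data, positive for right skew."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


rng = random.Random(0)  # fixed seed so the run is repeatable


def one_study(n=1000):
    # ASSUMPTION: a made-up, right-skewed (lognormal) salary population.
    return [rng.lognormvariate(10.8, 0.6) for _ in range(n)]


# Part (a): the salaries within a single sample of 1000 graduates.
single_sample = one_study()

# Part (b): the averages reported by 400 separate studies of 1000 graduates each.
study_means = [statistics.fmean(one_study()) for _ in range(400)]

print("skewness within one sample:", round(skewness(single_sample), 2))
print("skewness of the study averages:", round(skewness(study_means), 2))
```

Under these assumptions the single sample stays clearly skewed right, while the study averages are close to symmetric, which is what the two answers above describe.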