Homework 9
Due Friday, May 31
1) A poll by the LA Times asked 1167 residents of LA, "Do you think Los
Angeles is primarily a segregated city?" Of those polled, 350 identified
as White, and 50% of them said that they thought the city was segregated. 262
of the respondents identified as Black, and 47% of these people said that
they thought the city was segregated. Do an appropriate hypothesis
test to see whether Black and White residents differ in the percentage who
believe LA is segregated.
For a fairly complete analysis of the poll, including all questions asked,
read this pdf file. Note that the sampling scheme was more complicated than a simple
random sample (which is why Blacks make up about 22% of the sample but about
11% of the LA population). But assume, for the sake of this problem,
that sampling was done with replacement.
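
One common choice for comparing two sample percentages is a pooled two-proportion z-test. The sketch below is only an illustration in Python, not a required method: the function name is hypothetical, the numbers in the usage line are made up rather than taken from the poll, and it assumes independent samples large enough for the normal approximation.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(p1_hat, n1, p2_hat, n2):
    """Two-sided z-test of H0: p1 == p2, given sample proportions and sizes."""
    # Pooled estimate of the common proportion under the null hypothesis.
    pooled = (p1_hat * n1 + p2_hat * n2) / (n1 + n2)
    # Standard error of the difference in sample proportions under H0.
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Made-up illustration (NOT the poll's numbers): 60% of 200 vs. 50% of 200.
z, p_value = two_proportion_z_test(0.60, 200, 0.50, 200)
print(z, p_value)
```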
2) Consider, again, the body temperature data. Perform a hypothesis test of
whether the median body temperature
is 98.6 degrees Fahrenheit. Use the bootstrap technique. Follow
these steps:
a) State the null and alternative hypotheses.
b) As a test statistic, we'll use the sample median. What is the observed
value of the sample median?
c) Adjust the data so that the sample median is now 98.6. (Hint: add
or subtract the appropriate amount from every observation.) Now take
a sample of size 130, with replacement, from the adjusted data and calculate
its median. Repeat
this 1000 times, and you'll have a "bootstrap sample" of medians for which
we know that the true value is 98.6.
d) Based on your bootstrap sample, what is your estimate of the probability
that a sample median will be as extreme as, or more extreme than, the observed
value of your sample median (in step b)?
e) Based on your answer to (d), would you reject the null hypothesis?
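
Steps (b) through (d) can be sketched roughly as follows. This is a minimal Python illustration, not the course's R workflow. The function name, the seed, the small tolerance, and the toy data are all assumptions made for the example, and "as extreme" is read here as "at least as far from the null median, in either direction".

```python
import random
from statistics import median

def bootstrap_median_p_value(data, null_median, n_boot=1000, seed=0):
    """Shift the data so its median equals null_median, then bootstrap."""
    rng = random.Random(seed)  # fixed seed only so the run is repeatable
    observed = median(data)
    # Add the same amount to every observation so the median matches the null.
    shift = null_median - observed
    shifted = [x + shift for x in data]
    # Resample with replacement, same size as the data, and record each median.
    boot_medians = [
        median(rng.choices(shifted, k=len(data))) for _ in range(n_boot)
    ]
    # Count bootstrap medians at least as far from the null as the observed one.
    # The tiny tolerance guards against floating-point ties.
    observed_distance = abs(observed - null_median)
    extreme = sum(
        abs(m - null_median) >= observed_distance - 1e-9 for m in boot_medians
    )
    return observed, extreme / n_boot

# Toy data (made up, NOT the body temperature file).
toy = [98.0, 98.2, 98.3, 98.4, 98.4, 98.7, 98.8, 99.0, 98.1]
print(bootstrap_median_p_value(toy, 98.6))
```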
3) Suppose you are shown a suspicious looking coin, and asked to determine
whether it is "fair". Because your time is limited, you decide that
you will flip the coin 10 times. If it lands on heads 0, 1, 9, or 10
times, you will declare it "unfair". Otherwise, you will conclude that
there is no evidence to call it unfair.
a) What is the probability that you will declare a fair
coin unfair? (I.e., what's the significance level?)
b) Suppose that the reality is that the coin is unfair.
In fact, assume that for this coin, p = .55. For this coin, what
is the probability that your procedure will correctly identify it as unfair?
This is the power at p = .55.
c) Redo part (b), but now assume p = .60. Note that
the power increases as p gets further from the null hypothesis value of .50.
d) Suppose that p = .55, but now you change your procedure.
You flip the coin 100 times, and you declare the coin unfair if it
lands heads 10% of the time or less, or 90% of the time or more. Now what's
the probability
that you will correctly declare the coin unfair? Note that the power
increases as the sample size increases.
(Hint: if X is a binomial random variable with parameters n and p,
then the R command dbinom(x,n,p) gives you P(X = x).)
e) Back to the old procedure of flipping the coin
10 times. Now we will declare the coin unfair if it lands heads 0, 1, 2, 8, 9,
or 10 times. What is the significance level now?
f) What is the power for this new procedure if we assume
that the truth is that p = .55? How has it changed from (b)? What
can you say about the relation between power and significance level?
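
Both the significance level and the power are the probability that the number of heads lands in the rejection region; only the value of p differs. As a rough sketch, here is a Python stand-in for the R hint's dbinom(x,n,p), with a helper that sums it over a rejection region. The helper name and the 4-flip rule are hypothetical and are not the procedure in this problem.

```python
from math import comb

def dbinom(x, n, p):
    """P(X = x) for X ~ Binomial(n, p); a stand-in for R's dbinom(x, n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def rejection_probability(reject_counts, n, p):
    """Probability that the number of heads falls in the rejection region."""
    return sum(dbinom(x, n, p) for x in reject_counts)

# Hypothetical rule: flip 4 times, declare "unfair" on 0 or 4 heads.
region = [0, 4]
alpha = rejection_probability(region, 4, 0.5)  # significance level (fair coin)
power = rejection_probability(region, 4, 0.8)  # power if the true p were 0.8
print(alpha, power)
```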
4) A special managed care program was implemented at 9 randomly chosen hospitals
in an HMO system. At 7 of those hospitals, the average cost per patient
decreased over the previous year. Is this evidence that the managed
care program saves money?
a) Let X represent the number of hospitals out of the 9 that spend less money
per patient. Before the study was conducted, this was a random
variable. According to the null hypothesis, what value should we assign
to p = the probability that a hospital will spend less money per patient?
b) What's the alternative hypothesis?
c) Let X be your test statistic. What's the sampling distribution of
X, according to the null hypothesis?
d) The observed value of X is 7. What's the p-value?
e) If you use a significance level of 5%, do you reject the null hypothesis?
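
When the test statistic is a binomial count and large values support the alternative, the p-value is an upper-tail probability under the null. The following Python sketch shows the calculation on a made-up coin example (5 heads in 6 flips of a coin assumed fair under the null). The function name is hypothetical, and the numbers are not those of the hospital study.

```python
from math import comb

def upper_tail_p_value(observed, n, p0):
    """P(X >= observed) for X ~ Binomial(n, p0), the one-sided p-value."""
    return sum(
        comb(n, x) * p0**x * (1 - p0) ** (n - x)
        for x in range(observed, n + 1)
    )

# Made-up example: 5 heads in 6 flips, null value p0 = 0.5.
p_value = upper_tail_p_value(5, 6, 0.5)  # (6 + 1) / 64
print(p_value)
```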
5) A study published in 1982 in the Journal of Epidemiology (Morton et
al.) examined the children of workers at a battery factory. These workers
were exposed to lead while at work, and there was concern that lead dust
could be brought home and ingested by the workers' children. Once ingested,
lead gets into the bloodstream and the body cannot remove it. Therefore,
over time, lead accumulates in the body. This is particularly dangerous
in children, because excessive lead levels cause developmental problems.
The file lead.dat contains the blood lead levels
for 33 children of workers at this factory. (Lead levels are measured in
units of micrograms of lead per deciliter of blood.) They are labeled
"Exposed". The other column, labeled "Control", consists of the blood
lead levels for 33 "matched" children. These were children of the same
age, living in the same neighborhood, but whose parents did not work around
lead. So for example, the 4th child in the Exposed column is the same
age and lives in the same neighborhood as the 4th child in the Control column.
a) Without looking at the data, what shape do you think the distribution
of Exposed children's lead levels will have? Explain why.
b) Look at the distributions of the Exposed and Control lead scores. What
differences do you see in the shapes? Do you see evidence that the
Exposed children differ from the Control children?
c) Most toxicologists believe that lead levels over 50 micrograms per deciliter
require medical treatment, and levels over 60 require immediate hospitalization.
How does this information affect your comparison of the two groups?
d) Create a new variable called "diff" that is equal to the Exposed levels
minus the Control levels. Describe the distribution of this variable.
e) What does the variable diff tell you about the difference between Exposed
and Control children?
f) Perform the appropriate hypothesis test to test whether the mean of diff
is 0. State the null and alternative hypotheses, and your conclusion.
Also state any assumptions that you made regarding the population and
the sample.
g) Why did the experimenters choose matches from the same neighborhood? Of
the same age?
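
For matched data like this, one standard approach is a one-sample t-test on the pairwise differences. The sketch below, in Python, computes only the t statistic and its degrees of freedom. The function name and the four made-up pairs are assumptions for illustration, not values from lead.dat. The p-value would then come from a t distribution with n - 1 degrees of freedom (for example from a t table or R's pt), assuming the differences are roughly normal or the sample is reasonably large.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(first, second):
    """t statistic and degrees of freedom for H0: mean of (first - second) = 0."""
    # One difference per matched pair, in the same order as the two columns.
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # Sample mean of the differences divided by its estimated standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Made-up pairs (NOT from lead.dat).
t, df = paired_t_statistic([5, 7, 9, 6], [3, 4, 8, 5])
print(t, df)
```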