Homework 9

Due Friday, May 31

1) A poll by the LA Times asked 1167 residents of LA, "Do you think Los Angeles is primarily a segregated city?" Of those polled, 350 identified as White, and 50% of them said that they thought the city was segregated; 262 of the respondents identified as Black, and 47% of them said that they thought the city was segregated. Perform an appropriate hypothesis test to see whether Black and White respondents differ in the percentage who believe LA is segregated.

For a fairly complete analysis of the poll, including all questions asked, read this pdf file. Note that the sampling scheme was more complicated than a simple random sample (which is why Blacks make up about 30% of the sample but about 11% of the LA population). But assume, for the sake of this problem, that sampling was done with replacement.
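
One way to set up the calculation for problem 1 in R is sketched below. It assumes a pooled two-sample z-test for proportions with a two-sided alternative, and it assumes the counts of "yes" answers can be recovered by rounding the reported percentages (the poll reports percentages, not counts). The variable names are made up for illustration.

```r
# Sketch: pooled two-sample z-test for a difference in proportions.
# Assumption: counts are recovered by rounding n * reported percentage.
n1 <- 350; x1 <- round(n1 * 0.50)   # White respondents who said "segregated"
n2 <- 262; x2 <- round(n2 * 0.47)   # Black respondents who said "segregated"

# Pooled proportion under the null hypothesis of no difference
p_pool <- (x1 + x2) / (n1 + n2)
se     <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

z       <- (x1 / n1 - x2 / n2) / se
p_value <- 2 * pnorm(-abs(z))       # two-sided p-value

# Cross-check with R's built-in test (no continuity correction)
check <- prop.test(c(x1, x2), c(n1, n2), correct = FALSE)
```

The built-in `prop.test` reports a chi-squared statistic, which is the square of `z` when the continuity correction is turned off, so the two p-values should agree.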

2) Consider, again, the body temperature data. Perform a hypothesis test to test whether the median body temperature is 98.6 degrees Fahrenheit. Use the bootstrap technique. Follow these steps:
a) State the null and alternative hypotheses.
b) As a test statistic, we'll use the sample median. What is the observed value of the sample median?
c) Adjust the data so that the sample median is now 98.6. (Hint: add or subtract the appropriate amount from every observation.) Now take a sample of size 130, with replacement, from the adjusted data and calculate its median. Repeat this 1000 times, and you'll have a "bootstrap sample" of medians for which we know that the true value is 98.6.
d) Based on your bootstrap sample, what is your estimate of the probability that a sample median will be as extreme as, or more extreme than, the observed value of your sample median (in step b)?
e) Based on your answer to (d), would you reject the null hypothesis?
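
The steps in problem 2 can be sketched in R as follows. The vector `temps` here is simulated placeholder data, NOT the real body temperature data; replace it with the 130 observations from class. The sketch also assumes a two-sided alternative, so "extreme" is measured as distance from 98.6 in either direction.

```r
set.seed(1)  # for reproducibility of this illustration only

# Placeholder data (an assumption): substitute the real 130 temperatures.
temps <- round(rnorm(130, mean = 98.2, sd = 0.7), 1)

# Step (b): observed sample median
obs_median <- median(temps)

# Step (c): shift every observation so the sample median is exactly 98.6,
# then resample with replacement 1000 times and record each median.
shifted      <- temps - obs_median + 98.6
boot_medians <- replicate(1000,
                          median(sample(shifted, size = 130, replace = TRUE)))

# Step (d): proportion of bootstrap medians at least as far from 98.6
# as the observed median (two-sided, by assumption).
p_hat <- mean(abs(boot_medians - 98.6) >= abs(obs_median - 98.6))
```

A histogram of `boot_medians` with a vertical line at `obs_median` is a useful picture to include with your answer to (d).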

3) Suppose you are shown a suspicious-looking coin, and asked to determine whether it is "fair". Because your time is limited, you decide that you will flip the coin 10 times. If it lands on heads 0, 1, 9, or 10 times, you will declare it "unfair". Otherwise, you will conclude that there is no evidence to call it unfair.
    a) What is the probability that you will declare a fair coin unfair? (i.e. What's the significance level?)
    b) Suppose that the reality is that the coin is unfair. In fact, assume that for this coin, p = .55. For this coin, what is the probability that your procedure will correctly identify it as unfair? This is the power at p = .55.
    c) Redo part (b), but now assume p = .60. Note that the power increases as p gets further from the null hypothesis value of .50.
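    d) Suppose that p = .55, but now you change your procedure. You flip the coin 100 times, and you declare the coin unfair if it lands heads 10% of the time or less, or 90% of the time or more. Now what's the probability that you will correctly declare the coin unfair? What is the significance level of this new procedure? Note that power increases with sample size only when the significance level is held fixed, so compare both quantities with your answers to (a) and (b).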
(Hint: if X is a binomial random variable with parameters n and p, then the R command dbinom(x,n,p) gives you P(X = x).)
    e) Back to the old procedure of flipping the coin 10 times. Now we will declare the coin unfair if it lands heads 0, 1, 2, 8, 9, or 10 times. What is the significance level now?
    f) What is the power for this new procedure if we assume that the truth is that p = .55? How has it changed from (b)? What can you say about the relation between power and significance level?
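
Every part of problem 3 asks for the probability that the number of heads lands in a "declare unfair" region, so one small helper built on `dbinom` covers them all. The sketch below is one possible way to organize the work; `reject_prob` and the other names are hypothetical, not built-in R functions.

```r
# Probability that the number of heads falls in the rejection region
# when a coin with P(heads) = p is flipped n times.
# reject_prob is a made-up helper name for this illustration.
reject_prob <- function(region, n, p) {
  sum(dbinom(region, size = n, prob = p))
}

region_10 <- c(0, 1, 9, 10)                        # original 10-flip rule
alpha_10  <- reject_prob(region_10, n = 10, p = 0.50)  # significance level
power_55  <- reject_prob(region_10, n = 10, p = 0.55)  # power at p = .55
power_60  <- reject_prob(region_10, n = 10, p = 0.60)  # power at p = .60

region_100 <- c(0:10, 90:100)                      # 100-flip rule
power_100  <- reject_prob(region_100, n = 100, p = 0.55)

region_wide <- c(0:2, 8:10)                        # wider 10-flip rule
alpha_wide  <- reject_prob(region_wide, n = 10, p = 0.50)
power_wide  <- reject_prob(region_wide, n = 10, p = 0.55)
```

Calling the helper with p = 0.50 gives the significance level of any rule, which is worth computing alongside each power so that the procedures are compared fairly.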

4) A special managed care program was implemented at 9 randomly chosen hospitals in an HMO system. At 7 of those hospitals, the average cost per patient decreased over the previous year. Is this evidence that the managed care program saves money?
a) Let X represent the number of hospitals out of the 9 that spend less money per patient. Before the study was conducted, this was a random variable. According to the null hypothesis, what value should we assign to p, the probability that a hospital will spend less money per patient?
b) What's the alternative hypothesis?
c) Let X be your test statistic. What's the sampling distribution of X, according to the null hypothesis?
d) The observed value of X is 7. What's the p-value?
e) If you use a significance level of 5%, do you reject the null hypothesis?
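
Once you have settled on answers to (a) through (c), the p-value in (d) is a binomial tail probability. The sketch below assumes a null value of p = 0.5 and a one-sided ("greater") alternative; both are assumptions made for illustration, and justifying your own choices is part of the problem. The variable names are made up.

```r
n_hosp <- 9     # hospitals in the study
x_obs  <- 7     # hospitals where cost per patient decreased
p_null <- 0.5   # ASSUMED null value; see part (a)

# Upper-tail probability P(X >= x_obs) under the null hypothesis.
# pbinom(q, ..., lower.tail = FALSE) gives P(X > q), hence x_obs - 1.
p_upper <- pbinom(x_obs - 1, size = n_hosp, prob = p_null, lower.tail = FALSE)

# Cross-check with R's exact binomial test
p_check <- binom.test(x_obs, n_hosp, p = p_null,
                      alternative = "greater")$p.value
```

The same tail probability can also be written as `sum(dbinom(7:9, 9, 0.5))`, which matches the hint given in problem 3.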

5) A study published in 1982 in the Journal of Epidemiology (Morton et al.) examined the children of workers at a battery factory. These workers were exposed to lead while at work, and there was concern that lead dust could be brought home and affect the workers' children. Once ingested, lead gets into the bloodstream and the body cannot remove it. Therefore, over time, the level of lead accumulates. This is particularly dangerous in children, because excessive lead levels cause developmental problems.

The file lead.dat contains the blood lead levels for 33 children of workers at this factory. (Lead levels are measured in units of micrograms of lead per deciliter of blood.) They are labeled "Exposed". The other column, labeled "Control", consists of the blood lead levels for 33 "matched" children. These were children of the same age, living in the same neighborhood, but whose parents did not work around lead. So, for example, the 4th child in the Exposed column is the same age and lives in the same neighborhood as the 4th child in the Control column.

a) Without looking at the data, what shape do you think the distribution of Exposed children's lead levels will have? Explain why.
b) Look at the distributions of the Exposed and Control lead scores. What differences do you see in the shapes? Do you see evidence that the Exposed children differ from the Control children?
c) Most toxicologists believe that lead levels over 50 micrograms per deciliter require medical treatment, and levels over 60 require immediate hospitalization. How does this information affect your comparison of the two groups?
d) Create a new variable called "diff" that is equal to the Exposed levels minus the Control levels. Describe the distribution of this variable.
e) What does the variable diff tell you about the difference between Exposed and Control children?
f) Perform the appropriate hypothesis test to test whether the mean of diff is 0. State the null and alternative hypotheses, and your conclusion. Also state any assumptions that you made regarding the population and the sample.
g) Why did the experimenters choose matches from the same neighborhood? Of the same age?
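
The mechanics of parts (d) and (f) might look like the R sketch below. The data frame `lead` here is filled with simulated placeholder values, NOT the contents of lead.dat; with the real file you would read the data in instead (for example with `read.table`, assuming the file has a header row with the column names Exposed and Control). The sketch assumes a one-sample t-test on the differences, which is one candidate for "the appropriate test"; checking its assumptions is part of (f).

```r
set.seed(2)  # for reproducibility of this illustration only

# Placeholder data (an assumption): 33 matched pairs with made-up values.
lead <- data.frame(
  Exposed = rgamma(33, shape = 4, scale = 8),
  Control = rgamma(33, shape = 6, scale = 2.5)
)

# Part (d): difference within each matched pair
lead$diff <- lead$Exposed - lead$Control
diff_summary <- summary(lead$diff)

# Part (f): one-sample t-test of H0: mean of diff = 0
res <- t.test(lead$diff, mu = 0)

# Equivalent paired form, using the two original columns
res_paired <- t.test(lead$Exposed, lead$Control, paired = TRUE)
```

The two calls give the same t statistic and p-value, because a paired t-test is a one-sample t-test applied to the within-pair differences.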