Homework
Homework 9
Due Wednesday, March 17 (Note the change of day!)
Solutions (pdf): page 1, page 2, page 3, page 4
1) State the appropriate null and alternative hypotheses:
a) A dual X-ray absorptiometry (DXA) scanner is used to measure bone mineral
density for people who may be at risk for osteoporosis. To be sure that the
measurements are accurate, an object called a "phantom" that has known mean
mineral density of 1.4 grams per square centimeter is measured. This
phantom is scanned 10 times. After trying
this, check the answer here.
b) Feedback from your customers shows that many think it takes too
long to fill out the online order form for your products. You redesign
the form and survey a random sample of customers to determine whether or
not they think that the new form is actually an improvement. The response
uses a five-point scale: -2 if the new form takes much less time than the old
form; -1 if the new form takes a little less time; 0 if the new form takes
about the same time; +1 if the new form takes a little more time; and +2 if
the new form takes much more time.
2)* In the past, the mean score of the seniors at South High on the American
College Testing (ACT) college entrance examination has been 20. This
year a special preparation course is offered, and all 53 seniors planning
to take the ACT test enroll in the course. The mean of their 53 ACT
scores is 22.1. The principal believes that the new course has improved
the students' ACT scores.
a) Assume that ACT scores vary normally with standard deviation 6. Is
the outcome xbar=22.1 good evidence that the population mean score is greater
than 20? State the null and alternative hypotheses, compute the test
statistic and the p-value, and answer the question by interpreting your result.
b) The results are in any case inconclusive because of the design of the
study. The effects of the course are confounded with any change from
past years, such as other new courses or higher standards. Briefly
outline the design of a better study of the effect of the new course on ACT
scores.
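As a rough illustration of the arithmetic in 2(a), here is a minimal Python sketch of a one-sided z test with known standard deviation. The function name z_test_upper is made up for this example, and the sketch assumes the alternative hypothesis is "mean greater than 20", as the wording of the problem suggests. Interpreting the result is still up to you.

```python
from math import sqrt
from statistics import NormalDist


def z_test_upper(xbar, mu0, sigma, n):
    """One-sided z test of H0: mu = mu0 against Ha: mu > mu0 (sigma known).

    Returns the test statistic and the upper-tail p-value.
    """
    standard_error = sigma / sqrt(n)
    z = (xbar - mu0) / standard_error
    p_value = 1.0 - NormalDist().cdf(z)  # area to the right of z
    return z, p_value


# Numbers taken from problem 2(a): xbar = 22.1, mu0 = 20, sigma = 6, n = 53.
z, p = z_test_upper(xbar=22.1, mu0=20.0, sigma=6.0, n=53)
print(f"z = {z:.2f}, one-sided p-value = {p:.4f}")
```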
3) Radon is a colorless, odorless gas that is naturally released by rocks
and soils and may concentrate in tightly closed houses. Because radon
is slightly radioactive, there is some concern that it may be a health hazard.
Radon detectors are sold to homeowners, but the detectors may be inaccurate.
University researchers placed 12 detectors in a chamber where they
were exposed to 105 picocuries per liter of radon over 3 days. Here
are the readings given by the detectors:
91.9, 97.8, 111.4, 122.3, 105.4, 95.0, 103.8, 99.6, 96.6, 119.3, 104.8,
101.7
Assume (unrealistically) that you know that the standard deviation of
readings for all detectors of this type is sigma=9.
a) Give a 95% confidence interval for the mean reading for this type of
detector.
b) Is there significant evidence at the 5% level that the mean reading
differs from the true value of 105? State hypotheses and base a test
on your confidence interval from (a).
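One way to check your work on problem 3 is sketched below: a z confidence interval with known sigma, followed by the "is 105 inside the interval?" test. The helper name z_interval is an assumption of this example, not something from the text, and the code only automates the table lookup and arithmetic.

```python
from math import sqrt
from statistics import NormalDist, mean

# Detector readings from problem 3.
readings = [91.9, 97.8, 111.4, 122.3, 105.4, 95.0,
            103.8, 99.6, 96.6, 119.3, 104.8, 101.7]
sigma = 9.0          # standard deviation assumed known in the problem
true_value = 105.0   # radon level in the chamber


def z_interval(data, sigma, level=0.95):
    """Confidence interval for the mean when sigma is known."""
    z_star = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    margin = z_star * sigma / sqrt(len(data))
    center = mean(data)
    return center - margin, center + margin


low, high = z_interval(readings, sigma)
# A two-sided test at the 5% level rejects H0 exactly when the
# hypothesized value falls outside the 95% interval.
reject = not (low <= true_value <= high)
print(f"95% CI: ({low:.2f}, {high:.2f}); reject H0: {reject}")
```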
4) A random sample of 10 one-bedroom apartments from your local newspaper
has these monthly rents (dollars):
500, 650, 600, 505, 450, 550, 515, 495, 650, 395. (Note: the data
come from Indiana!)
Do these data give good reason to believe that the mean rent of all advertised
apartments is greater than $500 per month? State hypotheses, compute
the appropriate statistic, find the p-value, state your conclusion.
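Since the population standard deviation is not given in problem 4, a t statistic is the natural choice. The sketch below (with a made-up helper name, t_statistic) computes the statistic and its degrees of freedom from the rents. Python's standard library has no t distribution, so the p-value would still come from Table V or other software.

```python
from math import sqrt
from statistics import mean, stdev

# Monthly rents from problem 4.
rents = [500, 650, 600, 505, 450, 550, 515, 495, 650, 395]
mu0 = 500  # hypothesized mean rent under H0


def t_statistic(data, mu0):
    """One-sample t statistic and its degrees of freedom."""
    n = len(data)
    standard_error = stdev(data) / sqrt(n)  # stdev uses the n - 1 divisor
    t = (mean(data) - mu0) / standard_error
    return t, n - 1


t, df = t_statistic(rents, mu0)
print(f"t = {t:.3f} on {df} degrees of freedom")
```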
5) The scores of four roommates on the Law School Admission Test (LSAT)
are 628, 593, 455, 503. Find the mean, the standard deviation, and
the standard error of the mean. Is it appropriate to calculate a confidence
interval for these data? Explain why or why not.
6)* Repeat (3), but now do NOT assume that the standard deviation is known.
State hypotheses, calculate a test statistic, find the p-value, state
your conclusion.
7) I just spun a coin (as described in problem 6 of HW 8) 50 times and
got 11 heads. Is the probability of getting a head 50% when using
this method? Let's find out:
a) State the null and alternative hypotheses.
b) State an appropriate test statistic and find the observed value for
my experiment.
c) Calculate the p-value of this observed test statistic.
d) What do you conclude?
8) The English mathematician John Kerrich tossed a coin 10000 times and
obtained 5067 heads. (He was a prisoner of war and had time on his
hands.)
a) Is this significant evidence at the 5% level that the probability that
Kerrich's coin comes up heads is not 0.5?
b) Use a 95% confidence interval to find the range of probabilities of
heads that would not be rejected at the 5% level.
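Problems 7 and 8 both come down to a test and an interval for a single proportion. Here is a hedged sketch using Kerrich's numbers. The function names are invented for this example. The test uses the null value p0 in the standard error, and the interval is the usual large-sample interval built from the sample proportion.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Two-sided large-sample z test of H0: p = p0."""
    p_hat = successes / n
    se0 = sqrt(p0 * (1 - p0) / n)       # standard error under H0
    z = (p_hat - p0) / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_hat, z, p_value


def proportion_interval(successes, n, level=0.95):
    """Large-sample confidence interval for a proportion."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Kerrich's data from problem 8: 5067 heads in 10000 tosses.
p_hat, z, p_value = one_proportion_z_test(5067, 10000, 0.5)
low, high = proportion_interval(5067, 10000)
print(f"p_hat = {p_hat}, z = {z:.2f}, p-value = {p_value:.3f}")
print(f"95% CI: ({low:.4f}, {high:.4f})")
```

The same two functions can be called with 11 and 50 for the spinning experiment in problem 7.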
9) You want to estimate the proportion of students at your college or
university who are employed for 10 or more hours per week while classes
are in session. You plan to present your results by a 95% confidence
interval. Using the guessed value p = .4, find the sample size required
if the interval is to have an approximate margin of error of 0.06.
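The sample-size calculation in problem 9 solves margin = z* sqrt(p(1-p)/n) for n and rounds up. A minimal sketch, with a hypothetical function name:

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(p_guess, margin, level=0.95):
    """Smallest n whose large-sample margin of error is at most `margin`."""
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    # Solve margin = z_star * sqrt(p(1-p)/n) for n, then round up.
    return ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))


n = sample_size_for_proportion(p_guess=0.4, margin=0.06)
print(n)
```

Rounding up rather than to the nearest integer keeps the margin of error from exceeding the target.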
10) The latest Pew Poll
shows that 33% of a sample of 1,898 registered voters are "committed
to Bush" in the November elections.
a) Find an approximate 95% confidence interval for the proportion of ALL
registered voters who are committed to Bush.
b) Kerry had 38% of the sample say they are committed to him. Assume
that 38% of the population is committed to Kerry. Perform a statistical
test at the 5% level to determine whether Bush's support exceeds Kerry's.
Solutions: page1 page2
Due Friday, March 5 -- the same day as MT2
1) 8-4*
2) repeat 8-4 with a 90% confidence interval.
3) 8-6
4) 8-8
5)* a) First, do problem 8-3 this way: Notice that the
population standard deviation is unknown in this problem; but pretend
that the population standard deviation is exactly equal to the sample
standard deviation of 6000 students. Use Table IV to find the 95%
CI for the mean enrollment at all 2700 colleges.
5b) Now assume that we don't know the population's standard deviation,
and are forced to use the sample standard deviation, s=6000, as an estimate.
Re-calculate the 95% CI using Table V. Note that the largest
possible degrees of freedom (other than infinity) is df=120. With
a computer we can calculate the last row of Table V to be:
DF | t_.25 | t_.1 | t_.05 | t_.025 | t_.01
224 | .68 | 1.28 | 1.65 | 1.97 | 2.34
6) We all know (maybe!) that flipping a coin leads to a probability
of heads of 50%. But what about spinning a coin? Do this: take
a quarter and spin it on a hard surface and wait for it to fall. Record
whether it's heads or tails. Do this 25 times.
a) Let X represent the number of heads you could get. If it's
true that the probability of a head using this method is 50%, what's the
expected value and standard deviation of X?
b) Let p* = X/n. What's the expected value and standard error
of p*, assuming p = .5? (Where p is the probability of heads.)
c) Find the probability that p* will be more than 1.96 standard errors
away from the true probability of heads (assuming the true probability of
heads is 50%).
d) Now consider the results from your experiment. Find an approximate
95% confidence interval for the true probability of heads using your data.
(Hint: Can you assume that the distribution of p* is approximately
normal? Why or why not?)
HW Solutions: page1 page2 page3 page4
Due Friday, February 27
6-3, 6-6, 6-7, 6-8 (a-c only), 6-10*, 6-11
7-1, 7-2, 7-8, 7-9
3)* We've already discussed the fact that if X is a binomial
random variable (with sample size n and probability of success p), then
if certain conditions are met, we can approximate the probability distribution
of X with the normal distribution using mean np and standard deviation
sqrt(n*p*(1-p)). This fact (which is just an example of the central
limit theorem) is useful for understanding surveys.
What percent of Americans believe that fighting the Iraq war was
the right decision? If we're honest, we should admit we don't know.
A way of estimating this proportion is to take a survey. Suppose,
then, we take a simple random sample of n people and let X be the number
in our sample who would agree that the war was the right decision.
If our sample were taken with replacement, then X would have a binomial
probability distribution (why? Make sure you know which condition of a binomial
distribution fails if the sample is taken without replacement). However,
if the population size is more than 10 times the sample size, then for all
practical purposes we can "pretend" that the sample was taken with replacement
and the pdf of X is very close to being a binomial in which n = # of people
sampled and p = proportion of the population who would agree with the statement
in support of the war. So from now on we'll assume that our sample size
is less than 21 million people.
Our goal is to estimate p. Let p* = X/n. This is also
called the "sample proportion": the proportion in our sample who support
the war. (As opposed to p, which is the proportion in the population
who support the war.)
a) Show that p* is an unbiased estimator of p.
b) Find the standard error of p*.
c) Compute the mean squared error for p*.
d) Some people prefer to use p^ = (X+2)/(n+4) as an estimate
of p. Which estimate, p* or p^, do you think is best and why? (Hint:
you'll need to do some calculations.)
e) Suppose n = 1503. Then the distribution of X is approximately
normal (why?) and so is the distribution of p*. Assume that the
truth is that America is mostly in support of the war and p = .6. In
that case, what's the probability that your survey will come to the
wrong conclusion and have 50% or fewer in the sample who support the war?
f) In fact, the Pew foundation
did this survey, and found that p* = .65. They said that their
error was "plus or minus 3%". Suppose that the truth is that p
= .65. What's the probability that p* will be more than .65 + .03
OR less than .65 - .03?
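For parts (e) and (f), the normal approximation to the distribution of p* can be evaluated directly. The sketch below assumes the approximation is adequate for n = 1503 (that is part of what (e) asks you to justify), and the helper name p_star_dist is invented for this example.

```python
from math import sqrt
from statistics import NormalDist


def p_star_dist(p, n):
    """Approximate normal distribution of the sample proportion p* = X/n."""
    return NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))


n = 1503

# Part (e): true p = .6, chance the sample shows 50% or fewer in support.
prob_wrong = p_star_dist(0.6, n).cdf(0.5)

# Part (f): true p = .65, chance p* lands more than .03 away from .65.
dist = p_star_dist(0.65, n)
prob_outside = dist.cdf(0.65 - 0.03) + (1 - dist.cdf(0.65 + 0.03))

print(f"(e) {prob_wrong:.2e}   (f) {prob_outside:.4f}")
```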
Due Friday, February 20
5-8, 5-9, 5-10, 5-12a, 5-14, 5-17*
2) * Do taller than average women prefer to marry taller than average
husbands? And, vice versa, do shorter than average women prefer
shorter than average husbands? A study done in England found that
(i) wives' heights followed a N(1613.1, 70.3) distribution (measured in
mm),
(ii) the husbands' heights followed a N(1733.2, 66.6) distribution
and (iii) the correlation between husbands' and wives' heights was 0.463.
a) Approximately what percent of English husbands are shorter than
1550 mm?
b) What's the covariance between husbands' and wives' heights?
c) Suppose we pick out a husband/wife pair at random. Let
(X,Y) represent (husband's height, wife's height). Let D = X-Y represent
the difference in height between husband and wife. What's the expected
value of this difference?
d) What's the standard deviation of D?
e) What's the probability a husband will be shorter than his wife?
f) Suppose that X and Y were uncorrelated. What, then, would
be the probability that a husband will be shorter than his wife?
g) If the sign of the correlation were reversed and were instead
-.463, this would mean that taller than average husbands (or wives) preferred
shorter than average wives (or husbands). If this were the case,
then what would be the probability that a husband is shorter than his wife?
NOTE: For this problem you need to know an important fact:
any linear combination of normally distributed random variables is itself
normally distributed. Therefore, since X and Y both follow normal
distributions, D also follows a normal distribution.
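The fact in the NOTE, combined with Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y), is enough to handle parts (c) through (g). A minimal sketch follows. The function name is made up, and the second number in each N( , ) is read as a standard deviation, which is an assumption about the notation.

```python
from math import sqrt
from statistics import NormalDist


def prob_husband_shorter(mu_x, sd_x, mu_y, sd_y, rho):
    """P(D < 0) for D = X - Y, with X and Y jointly normal, correlation rho."""
    mean_d = mu_x - mu_y
    covariance = rho * sd_x * sd_y
    var_d = sd_x ** 2 + sd_y ** 2 - 2 * covariance
    return NormalDist(mu=mean_d, sigma=sqrt(var_d)).cdf(0.0)


# X = husband's height, Y = wife's height, numbers from the problem.
husband = (1733.2, 66.6)
wife = (1613.1, 70.3)

p_pos = prob_husband_shorter(*husband, *wife, rho=0.463)    # part (e)
p_zero = prob_husband_shorter(*husband, *wife, rho=0.0)     # part (f)
p_neg = prob_husband_shorter(*husband, *wife, rho=-0.463)   # part (g)
print(f"{p_pos:.4f}  {p_zero:.4f}  {p_neg:.4f}")
```

Positive correlation shrinks the spread of D, so the probability of a negative difference drops as rho increases.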
Homework 5
Solutions to *'d problems: page1 page2 page3 page4 page5
Due Friday, February 13
1 A soda machine on campus is supposed to deposit 12
ounces of soft drink in a cup. In fact, the amount it delivers
is a random variable, call it X. (So X is the number of ounces
deposited by the machine into the cup.) Study has shown that X has
a N(12.1, .2) distribution, which is a little problematic because it
means that some people will get less than the 12 ounces they pay for.
a) Find the probability that someone gets less than 12 ounces.
b) Suppose you've bought 7 sodas from that machine. How
many do you expect to have had less than 12 ounces? What's the
standard deviation (the give-or-take) on this number?
c) What's the probability that 3 or fewer will have less than
12 ounces?
d) The company sells the glass of soda for $1. The cost
to the company is $.06/ounce. What's the expected profit the company
makes off of each soda?
2* Consider the same soda machine as above. Over
the course of a day, the machine sells 1000 sodas and everyone who
buys a soda thinks they are getting 12 ounces.
a) What's the expected number of "defective" sodas sold in
a day (assuming 1000 total sold)?
b) What's the standard deviation of this number?
c) Find the approximate probability that more than 350 bad
sodas will be sold. Be sure to demonstrate that the normal approximation
will work here.
d) Find the approximate probability that more than 30% of the
sodas will be bad.
e) If you found the exact probability, using the binomial distribution
(with n = 1000, p = your answer to 1a) you'd get
.7302. (I used a computer to find this.) How does
your answer compare?
f) Suppose that only 100 sodas were sold. Now find the
approximate probability (using the normal approximation) that more
than 25% of the sodas will be bad. Compare this to the exact answer:
.91775
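To see how the normal approximation in problem 2 compares with an exact binomial calculation, something like the following sketch can be used. It reads N(12.1, .2) as mean 12.1 and standard deviation .2, and it uses a continuity correction in the approximation. Both are assumptions of this example, and the function names are made up. Exactly which inequality ("more than" versus "at least") matches the exact values quoted in the problem is left for you to check.

```python
from math import comb, sqrt
from statistics import NormalDist

# Problem 1(a): probability that a cup gets less than 12 ounces.
p_short = NormalDist(mu=12.1, sigma=0.2).cdf(12.0)


def binom_tail_exact(n, p, k):
    """Exact P(X > k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(k + 1, n + 1))


def binom_tail_normal(n, p, k):
    """Normal approximation to P(X > k), with continuity correction."""
    mu = n * p
    sd = sqrt(n * p * (1 - p))
    return 1 - NormalDist(mu, sd).cdf(k + 0.5)


# Problem 2(d): more than 30% of 1000 sodas, i.e. X > 300.
exact = binom_tail_exact(1000, p_short, 300)
approx = binom_tail_normal(1000, p_short, 300)
print(f"p = {p_short:.4f}, exact = {exact:.4f}, approx = {approx:.4f}")
```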
4-20, 4-25
5-1, 5-2*, 5-3, 5-7
Homework 4
Solutions to *'d problems
Solutions to non-starred problems (pdf format): page 1, page 2, page 3, page 4, page 5, page 6
Due February 6, the same day as the midterm.
1* ) These questions explore a pdf called the geometric
distribution.
1a) Suppose we flip a fair coin until the first head appears.
Let X represent the number of flips until the first head. So,
for example, if your sequence looks like TTH then X = 3.
Write a formula for the probability distribution function of X.
b) Suppose the coin is not fair but has some unknown
probability of landing heads. Call this probability p. What's
the probability distribution function of X?
c) Suppose we have a fair, six-sided die and we roll
it until the first "6" appears. Use your answer to (b) to
find the probability that this happens after the 4th roll.
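After you have written down your own formula, you can compare it against this small sketch of the geometric distribution. The function names are invented, and the code reads "after the 4th roll" as "the first 6 has not appeared in the first four rolls", which is one possible interpretation of part (c).

```python
def geometric_pmf(k, p):
    """P(X = k): the first success happens on trial k (k = 1, 2, ...)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    # k - 1 failures followed by one success.
    return (1 - p) ** (k - 1) * p


def geometric_tail(k, p):
    """P(X > k): no success in the first k trials."""
    return (1 - p) ** k


# Fair coin, sequence TTH: first head on flip 3.
prob_tth = geometric_pmf(3, 0.5)

# Fair die: first 6 appears after the 4th roll (under the reading above).
prob_after_4 = geometric_tail(4, 1 / 6)
print(prob_tth, round(prob_after_4, 4))
```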
2) Suppose X is a discrete random variable and b is a
constant value.
a) Show that E(bX) = bE(X)
b) Show that Var(bX) = b^2 Var(X)
4-7, 4-11, 4-14-a, 4-17, 4-18, 4-19-(a,b,f) [Note: the
probability of f is important -- in what context have we used this
95% figure before and how does it apply here? No need to answer
-- just think about it], 4-22, 4-23, 4-36, 4-37, 4-39, 4-40*
Homework 3 Solutions to
* and others
(note: we're only human, and so if you think there is a
mistake in the solutions, there might be. Ask!)
NOTE: A previous version of the homework solutions
had mistakes for 3-25 and 3-46. These were fixed Monday, Feb
2, 4:30pm.
Due Friday, Jan 30
*1) In class you were asked to write the fastest you've
ever driven a car. 83 people responded. The average speed
was 100 mph and the standard deviation was 19 mph. The 5-number
summary was 30, 85, 100, 110, 165.
a) Roughly how many people said that the fastest they've
driven a car is between 81mph and 119 mph?
b) Would you say that this distribution is roughly
symmetric or skewed? Why? If "skewed", then to the
right or left?
c) How many standard deviations above average is the
maximum score? How many standard deviations below average
is the minimum?
2) Suppose we have n data points, represented by the
symbols x1, x2, ..., xn. We now multiply each value by
a constant, k, to form a new data set: y1 = k*x1, y2
= k*x2, ..., yn = k*xn.
a) Let xbar be the sample mean of the x's, and ybar
represent the sample mean of the y's. Show that ybar = k*xbar.
b) Show that the sample variance of y is equal to k^2*(sample
variance of x) and that the SD of the y's is k*(SD of the x's).
c) To convert miles-per-hour to kilometers-per-hour
you multiply the miles-per-hour value by 1.609. The class
said that the average fastest speed it has driven a car is 100 mph
and the standard deviation is 19mph. Show how to use the rules
in (a) and (b) to give the mean, variance, and standard deviation in
kilometers per hour.
3) (a) Give an example of two mutually exclusive events
that do not involve coins, dice, or anything in this week's homework.
(b) Give an example of two independent events that
do not involve coins, dice, or anything in this week's homework.
From the book:
3-15, 3-16, 3-17, 3-20, 3-21, 3-23, 3-25,3-45, *3-46
Due Friday, January 23
Chapter 1, p. 21: 13,14,18 (from last week)
1)* The Survey of Study Habits and Attitudes
(SSHA) is a psychological test that evaluates college students'
motivation, study habits, and attitudes towards school. A
selective private college gives the SSHA to a sample of 18 of
its incoming first-year college women. Their scores are:
154, 109, 137, 115, 152, 140,
154, 178, 101, 103, 126, 137, 165, 165, 129, 200
The college also administers the test to a sample
of 20 first-year college men. Their scores are
108, 140, 114, 91, 180, 115,
126, 92, 169, 146, 109, 132, 75, 88, 113, 151, 70, 115, 187, 104
NOTE: We might not have had time to discuss stemplots. Read
about how to make them here. Or read here.
A "back-to-back" stemplot is the same as a stemplot,
except the men's values will go out to the left (or right) and
the women's in the other direction. Another term for "stemplot"
is "stem and leaf plot".
a) Make a back-to-back stemplot of the men's and
women's scores. The overall shapes of the distributions are
indistinct, as often happens when only a few observations are available.
Are there any outliers? Describe the center and spread
of each group's scores. (Usually, you'd be asked to describe
the shape, but there are too few observations here to get a good sense
of this.)
b) Compare the midpoints and ranges of the two
distributions. What is the most noticeable contrast between
the female and male scores?
c) Would the mean score for the men be greater
than or equal to the median for the men? Why? Calculate the
mean and median of the men and women to check.
d) Find the five-number summaries of both groups.
Does the 1.5*IQR criterion flag any potential outliers?
Make side-by-side boxplots.
e) Write a brief comparison of the two groups.
Do women score higher than men? Which of your descriptions
shows this? Which group of scores is more spread out when we ignore
outliers? Which of your descriptions shows this most clearly?
Chapter 2: 2-15 (p. 39), 2-21 (p. 51), 2-34(p.65),
2-38, 2-39 (p. 66)
2) Suppose X1, ...Xn are measurements taken
on some variable, and that the standard deviation of these measurements
is equal to the number s. Now let Yi = Xi + a. Show
that the standard deviation of Y is also equal to s. (Hint:
what's the average of Y?)
3)* True or false and explain: suppose we
looked at incomes from a large sample of people. More than
half of the people would have incomes above average.
4) Consider the histogram of heights collected for
this class:
a) Would a boxplot be a good summary of these data?
Why or why not?
b) There are 87 heights in this data set. Approximately
what is the median height?
c) You are asked to report the typical height for
this set of data. Would you report the mean, the median,
or neither? Why? If you answered "neither", how would you answer
this request?
Due Friday, January 16
Chapter 1, p. 21: 13,14,18 NOTE: These are
optional, since you may not have the book yet
4) The Public Health Service studied the effects
of smoking on health, in a large sample of representative households.
For men and for women in each age group, those who had never
smoked were on average somewhat healthier than the current smokers,
but the current smokers were on average much healthier than those
who had recently stopped smoking. The lesson seems to be
that you shouldn't start smoking, but once you've started, don't stop.
Comment.
5)* A study of young children found that those
with more body fat tended to have more "controlling" mothers; the
San Francisco Chronicle (Nov. 9, 1994) concluded that "Parents
of Fat Kids Should Lighten Up."
a) Was this an observational
study or a randomized controlled experiment? Explain.
b) Did the study find an association
between mother's behavior and her child's level of body fat?
c) If controlling behavior
by the mother causes children to eat more, would that explain
an association between controlling behavior by the mother and
her child's level of body fat?
d) Suppose there is a gene
which causes obesity. Would that explain the association?
e) Can you think of another
way to explain the association?
f) Do the data support the
Chronicle's advice on child-rearing? Explain.
6) California is evaluating a new program to rehabilitate
prisoners before their release; the object is to reduce the
recidivism rate -- the percentage who will be back in prison
within two years of release. The program involves several
months of "boot camp" -- military-style basic training with very
strict discipline. Admission to the program is voluntary. According
to a prison spokesman, "Those who complete boot camp are less likely
to return to prison than other inmates."
a) What is the treatment group
in the prison spokesman's comparison? The control group?
b) Is the prison spokesman's
comparison based on an observational study or a randomized controlled
experiment? Explain.
c) True or false and explain:
the data show that boot camp worked.
NOTE: Number 7 will be postponed until next
week. So 4-6 are the only problems that should be done for
this week.
7)* The Survey of Study Habits and Attitudes (SSHA)
is a psychological test that evaluates college students'
motivation, study habits, and attitudes towards school. A
selective private college gives the SSHA to a sample of 18 of
its incoming first-year college women. Their scores are:
154, 109, 137, 115, 152, 140,
154, 178, 101, 103, 126, 137, 165, 165, 129, 200
The college also administers the test to a sample
of 20 first-year college men. Their scores are
108, 140, 114, 91, 180, 115,
126, 92, 169, 146, 109, 132, 75, 88, 113, 151, 70, 115, 187, 104
NOTE: We might not have had time to discuss stemplots. Read
about how to make them here. Or read here.
A "back-to-back" stemplot is the same as a stemplot,
except the men's values will go out to the left (or right) and
the women's in the other direction. Another term for "stemplot"
is "stem and leaf plot".
a) Make a back-to-back stemplot of the men's and
women's scores. The overall shapes of the distributions are
indistinct, as often happens when only a few observations are available.
Are there any outliers? Describe the center and spread
of each group's scores. (Usually, you'd be asked to describe
the shape, but there are too few observations here to get a good sense
of this.)
b) Compare the midpoints and ranges of the two
distributions. What is the most noticeable contrast between
the female and male scores?
c) Would the mean score for the men be greater
than or equal to the median for the men? Why? Calculate the
mean and median of the men and women to check.
d) Find the five-number summaries of both groups.
Does the 1.5*IQR criterion flag any potential outliers?
Make side-by-side boxplots.
e) Write a brief comparison of the two groups.
Do women score higher than men? Which of your descriptions
shows this? Which group of scores is more spread out when we ignore
outliers? Which of your descriptions shows this most clearly?