Homework

Homework 9

    Due Wednesday, March 17 (Note the change of day!)

Solutions (pdf):  page 1, page 2, page 3, page 4

1) State the appropriate null and alternative hypotheses:
a) A dual X-ray absorptiometry (DXA) scanner is used to measure bone mineral density for people who may be at risk for osteoporosis. To be sure that the measurements are accurate, an object called a "phantom" that has known mean mineral density of 1.4 grams per square centimeter is measured.  This phantom is scanned 10 times.  After trying this, check the answer here.
b) Feedback from your customers shows that many think it takes too long to fill out the online order form for your products.  You redesign the form and survey a random sample of customers to determine whether or not they think that the new form is actually an improvement.  The response uses a five-point scale: -2 if the new form takes much less time than the old form; -1 if the new form takes a little less time; 0 if the new form takes about the same time; +1 if the new form takes a little more time; and +2 if the new form takes much more time.

2)* In the past, the mean score of the seniors at South High on the American College Testing (ACT) college entrance examination has been 20.   This year a special preparation course is offered, and all 53 seniors planning to take the ACT test enroll in the course.  The mean of their 53 ACT scores is 22.1.  The principal believes that the new course has improved the students' ACT scores.
a) Assume that ACT scores vary normally with standard deviation 6.  Is the outcome xbar=22.1 good evidence that the population mean score is greater than 20?  State the null and alternative hypotheses, compute the test statistic and the p-value, and answer the question by interpreting your result.
b) The results are in any case inconclusive because of the design of the study.  The effects of the course are confounded with any change from past years, such as other new courses or higher standards.  Briefly outline the design of a better study of the effect of the new course on ACT scores.
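
For checking arithmetic in a problem like 2(a), here is a minimal Python sketch of a one-sided z test with known standard deviation. The function name one_sided_z_test is made up for this illustration, and the code assumes the alternative is "mean greater than 20", as the principal's claim suggests.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_test(xbar, mu0, sigma, n):
    """Return (z, p_value) for H0: mu = mu0 against Ha: mu > mu0, sigma known."""
    standard_error = sigma / sqrt(n)
    z = (xbar - mu0) / standard_error
    # Upper-tail area under the standard normal curve.
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value


# Numbers from problem 2: past mean 20, this year's mean 22.1, sigma 6, n = 53.
z, p_value = one_sided_z_test(xbar=22.1, mu0=20, sigma=6, n=53)
print(f"z = {z:.2f}, one-sided p-value = {p_value:.4f}")
```

The code only does the computation; interpreting the p-value (and the design issue in part b) is still up to you.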

3) Radon is a colorless, odorless gas that is naturally released by rocks and soils and may concentrate in tightly closed houses.  Because radon is slightly radioactive, there is some concern that it may be a health hazard.  Radon detectors are sold to homeowners, but the detectors may be inaccurate.  University researchers placed 12 detectors in a chamber where they were exposed to 105 picocuries per liter of radon over 3 days.  Here are the readings given by the detectors:
91.9, 97.8, 111.4, 122.3, 105.4, 95.0, 103.8, 99.6, 96.6, 119.3, 104.8, 101.7

Assume (unrealistically) that you know that the standard deviation of readings for all detectors of this type is sigma=9.
a) Give a 95% confidence interval for the mean reading for this type of detector.
b) Is there significant evidence at the 5% level that the mean reading differs from the true value of 105?  State hypotheses and base a test on your confidence interval from (a).
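
A rough Python sketch of the known-sigma confidence interval used in problem 3 follows. The helper name z_interval is hypothetical; the data and sigma = 9 come from the problem statement.

```python
from math import sqrt
from statistics import NormalDist, mean

# Detector readings from problem 3.
readings = [91.9, 97.8, 111.4, 122.3, 105.4, 95.0,
            103.8, 99.6, 96.6, 119.3, 104.8, 101.7]


def z_interval(data, sigma, level=0.95):
    """Confidence interval for the mean when the population SD is known."""
    z_star = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    center = mean(data)
    margin = z_star * sigma / sqrt(len(data))
    return center - margin, center + margin


low, high = z_interval(readings, sigma=9)
print(f"95% CI for the mean reading: ({low:.2f}, {high:.2f})")
# A two-sided 5% test of mu = 105 rejects exactly when 105 falls outside the interval.
print("105 inside interval:", low <= 105 <= high)
```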

4) A random sample of 10 one-bedroom apartments from your local newspaper has these monthly rents (dollars):
500, 650, 600, 505, 450, 550, 515, 495, 650, 395.  (Note: the data come from Indiana!)
Do these data give good reason to believe that the mean rent of all advertised apartments is greater than $500 per month?  State hypotheses, compute the appropriate statistic, find the p-value, state your conclusion.

5) The scores of four roommates on the Law School Admission Test (LSAT) are 628, 593, 455, 503.  Find the mean, the standard deviation, and the standard error of the mean.  Is it appropriate to calculate a confidence interval for these data?  Explain why or why not.

6)* Repeat (3), but now do NOT assume that the standard deviation is known.  State hypotheses, calculate a test statistic, find the p-value, state your conclusion.

7) I just spun a coin (as described in problem 6 of HW 8) 50 times and got 11 heads.  Is the probability of getting a head 50% when using this method?  Let's find out:
a) State the null and alternative hypotheses.
b) State an appropriate test statistic and find the observed value for my experiment.
c) Calculate the p-value of this observed test statistic.
d) What do you conclude?

8) The English mathematician John Kerrich tossed a coin 10000 times and obtained 5067 heads.  (He was a prisoner of war and had time on his hands).  
a) Is this significant evidence at the 5% level that the probability that Kerrich's coin comes up heads is not 0.5?
b) Use a 95% confidence interval to find the range of probabilities of heads that would not be rejected at the 5% level.
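
Here is a small Python sketch of the large-sample test and interval for a proportion, applied to Kerrich's counts. The function names are invented for this example, and it assumes the usual normal approximation (standard error under the null for the test, standard error at the sample proportion for the interval).

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0):
    """Two-sided z test of H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def proportion_interval(successes, n, level=0.95):
    """Approximate confidence interval for a proportion."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


z, p_value = proportion_z_test(5067, 10000, p0=0.5)
low, high = proportion_interval(5067, 10000)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.3f}")
print(f"95% CI: ({low:.4f}, {high:.4f})")
```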

9) You want to estimate the proportion of students at your college or university who are employed for 10 or more hours per week while classes are in session.  You plan to present your results by a 95% confidence interval.  Using the guessed value p = .4, find the sample size required if the interval is to have an approximate margin of error of 0.06.
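
The sample-size calculation in problem 9 solves margin = z* sqrt(p(1-p)/n) for n. A minimal Python sketch, with a made-up function name and the convention (an assumption here) of rounding up to the next whole person:

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(p_guess, margin, level=0.95):
    """Smallest n whose approximate margin of error is at most `margin`."""
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    n_exact = (z_star / margin) ** 2 * p_guess * (1 - p_guess)
    return ceil(n_exact)


n_needed = sample_size_for_proportion(p_guess=0.4, margin=0.06)
print("required sample size:", n_needed)
```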

10) The latest Pew Poll  shows that 33% of a sample of 1,898 registered voters are "committed to Bush" in the November elections.
a) Find an approximate 95% confidence interval for the proportion of ALL registered voters who are committed to Bush.
b) Kerry had 38% of the sample say they are committed to him.  Assume that 38% of the population is committed to Kerry.  Perform a statistical test at the 5% level to determine whether Bush's support exceeds Kerry's.


Homework 8 Solutions to *

Solutions:  page1 page2
Due Friday, March 5 -- the same day as MT2
1)8-4*
2) repeat 8-4 with a 90% confidence interval.
3) 8-6
4) 8-8
5)* a) First, do problem 8-3 this way:   Notice that the population standard deviation is unknown in this problem; but pretend that the population standard deviation is exactly equal to the sample standard deviation of 6000 students.  Use Table IV to find the 95% CI for the mean enrollment at all 2700 colleges.  
5b)  Now assume that we don't know the population's standard deviation, and are forced to use the sample standard deviation, s=6000, as an estimate.  Re-calculate the 95% CI using Table V.  Note that the largest possible degrees of freedom (other than infinity) is df=120.  With a computer we can calculate the last row of Table V to be:
DF     t_.25   t_.10   t_.05   t_.025   t_.01
224    .68     1.28    1.65    1.97     2.34

6) We all know (maybe!) that flipping a coin leads to a probability of heads of 50%.  But what about spinning a coin?  Do this:  take a quarter and spin it on a hard surface and wait for it to fall. Record whether it's heads or tails.  Do this 25 times.
a) Let X represent the number of heads you could get.  If it's true that the probability of a head using this method is 50%, what's the expected value and standard deviation of X?
b) Let p* = X/n.  What's the expected value and standard error of p*, assuming p = .5? (Where p is the probability of heads.)
c) Find the probability that p* will be more than 1.96 standard errors away from the true probability of heads (assuming the true probability of heads is 50%).
d) Now consider the results from your experiment.  Find an approximate 95% confidence interval for the true probability of heads using your data.  (Hint: Can you assume that the distribution of p* is approximately normal? Why or why not?)

Homework 7  Solution to *

  HW Solutions: page1 page2 page3 page4

Due Friday, February 27
6-3, 6-6, 6-7, 6-8 (a-c only), 6-10*, 6-11
7-1,7-2,7-8,7-9
3)*  We've already discussed the fact that if X is a binomial random variable (with sample size n and probability of success p), then if certain conditions are met, we can approximate the probability distribution of X with the normal distribution using mean np and standard deviation sqrt(n*p*(1-p)).  This fact (which is just an example of the central limit theorem) is useful for  understanding surveys.  

What percent of Americans believe that fighting the Iraq war was the right decision?  If we're honest, we should admit we don't know.  A way of estimating this proportion is to take a survey.  Suppose, then, we take a simple random sample of n people and let X be the number in our sample who would agree that the war was the right decision.   If our sample were taken with replacement, then X would have a binomial probability distribution (why? Make sure you know which condition of a binomial distribution fails if the sample is taken without replacement).   However, if the population size is more than 10 times the sample size, then for all practical purposes we can "pretend" that the sample was taken with replacement and the pdf of X is very close to being a binomial in which n = # of people sampled and p = proportion of the population who would agree with the statement in support of the war. So from now on we'll assume that our sample size is less than 21 million people.

Our goal is to estimate p.  Let p* = X/n.   This is also called the "sample proportion": the proportion in our sample who support the war.  (As opposed to p, which is the proportion in the population who support the war.)
a) Show that p* is an unbiased estimator of p.
b) Find the standard error of p*.
c) Compute the mean squared error for p*.
d) Some people prefer to use p^ = (X+2)/(n+4)  as an estimate of p.  Which estimate, p* or p^, do you think is best and why?  (Hint: you'll need to do some calculations.)
e) Suppose n = 1503.  Then the distribution of X is approximately normal (why?) and so is the distribution of p*.  Assume that the truth is that America is mostly in support of the war and p = .6.  In that case, what's the probability that your survey will come to the wrong conclusion and have 50% or fewer in the sample who support the war?
f) In fact, the Pew foundation did this survey, and found that p* = .65.  They said that their error was "plus or minus 3%".   Suppose that the truth is that p = .65.  What's the probability that p* will be more than .65 + .03 OR less than .65 - .03?
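
To see how a "plus or minus" claim like the one in (f) connects to the normal approximation, here is a hedged Python sketch. It assumes the sample size n = 1503 from part (e) carries over to part (f), and the function name is invented for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def prob_outside_margin(p, n, margin):
    """Approximate P(|p* - p| > margin) using the normal approximation to p*."""
    standard_error = sqrt(p * (1 - p) / n)
    z = margin / standard_error
    # Two tails, each beyond z standard errors from the true proportion.
    return 2 * (1 - NormalDist().cdf(z))


prob = prob_outside_margin(p=0.65, n=1503, margin=0.03)
print(f"P(p* misses by more than 3 points) is about {prob:.4f}")
```

A margin of about two standard errors would give a miss probability near 5%, so comparing this result to 5% shows how conservative the quoted margin is.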

Homework 6  Solutions to *'d

Due Friday, February 20
5-8, 5-9, 5-10, 5-12a,5-14,5-17*
2) * Do taller than average women prefer to marry taller than average husbands?  And, vice versa, do shorter than average women prefer shorter than average husbands?  A study done in England found that (i) wives' heights followed a N(1613.1, 70.3) distribution (measured in mm),
(ii) the husbands' heights followed a N(1733.2, 66.6) distribution and (iii) the correlation between husbands' and wives' heights was 0.463.
a) Approximately what percent of English husbands are shorter than 1550 mm?  
b) What's the covariance between husbands' and wives' heights?
c) Suppose we pick out a husband/wife pair at random.  Let (X,Y) represent (husband's height, wife's height).  Let D = X-Y represent the difference in height between husband and wife.  What's the expected value of this difference?
d) What's the standard deviation of D?
e) What's the probability a husband will be shorter than his wife?
f) Suppose that X and Y were uncorrelated.  What, then, would be the probability that a husband will be shorter than his wife?
g) If the sign of the correlation were reversed and were instead -.463, this would mean that taller than average husbands (or wives) preferred shorter than average wives (or husbands).  If this were the case, then what would be the probability that a husband is shorter than his wife?
NOTE:  For this problem you need to know an important fact: any linear combination of normally distributed random variables is itself normally distributed.  Therefore, since X and Y both follow normal distributions, D also follows a normal distribution.
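
The fact in the NOTE, combined with Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y), can be sketched in Python as below. The function name is made up, and the second number in each N(., .) is read as a standard deviation, which is an assumption about the notation.

```python
from math import sqrt
from statistics import NormalDist


def prob_husband_shorter(mu_x, sd_x, mu_y, sd_y, rho):
    """P(X - Y < 0) when X and Y are jointly normal with correlation rho."""
    mean_d = mu_x - mu_y
    # Cov(X, Y) = rho * sd_x * sd_y, so the covariance term enters with a minus sign.
    var_d = sd_x ** 2 + sd_y ** 2 - 2 * rho * sd_x * sd_y
    return NormalDist(mean_d, sqrt(var_d)).cdf(0.0)


# Husbands: N(1733.2, 66.6); wives: N(1613.1, 70.3); three correlation scenarios.
for rho in (0.463, 0.0, -0.463):
    p = prob_husband_shorter(1733.2, 66.6, 1613.1, 70.3, rho)
    print(f"rho = {rho:+.3f}: P(husband shorter) is about {p:.4f}")
```

Notice the direction of the effect: a positive correlation shrinks the spread of D, which makes a negative difference less likely.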

Homework 5

Solutions to *'d problems.

page1
page2 page3 page4 page5

Due Friday, February 13
1  A soda machine on campus is supposed to deposit 12 ounces of soft drink in a cup.  In fact, the amount it delivers is a random variable, call it X.  (So X is the number of ounces deposited by the machine into the cup.)  Study has shown that X has a N(12.1, .2) distribution, which is a little problematic because it means that some people will get less than the 12 ounces they pay for.
a) Find the probability that someone gets less than 12 ounces.
b) Suppose you've bought 7 sodas from that machine.  How many do you expect to have had less than 12 ounces?  What's the standard deviation (the give-or-take) on this number?
c) What's the probability that 3 or fewer will have less than 12 ounces?
d) The company sells the glass of soda for $1.  The cost to the company is $.06/ounce.  What's the expected profit the company makes off of each soda?

2*  Consider the same soda machine as above.  Over the course of a day, the machine sells 1000 sodas and everyone who buys a soda thinks they are getting 12 ounces.
a) What's the expected number of "defective" sodas sold in a day (assuming 1000 total sold)?
b) What's the standard deviation of this number?
c) Find the approximate probability that more than 350 bad sodas will be sold.  Be sure to demonstrate that the normal approximation will work here.
d) Find the approximate probability that more than 30% of the sodas will be bad.
e) If you found the exact probability, using the binomial distribution (with n = 1000, p = your answer to 1a)  you'd get .7302.  (I used a computer to find this.)  How does your answer compare?
f) Suppose that only 100 sodas were sold.  Now find the approximate probability (using the normal approximation) that more than 25% of the sodas will be bad.  Compare this to the exact answer: .91775
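
If you'd like to see how an exact binomial tail can be computed and compared with the normal approximation, here is one possible Python sketch. The function names are hypothetical, N(12.1, .2) is read as mean 12.1 and standard deviation .2 (an assumption), and the approximation uses a continuity correction. Whether "more than" is treated as strict changes the answer slightly, so the numbers printed here need not match the quoted ones digit for digit.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist


def binom_upper_tail(n, p, k):
    """Exact P(X > k) for X ~ Binomial(n, p), summing terms computed in log space."""
    total = 0.0
    for j in range(k + 1, n + 1):
        log_term = (lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
                    + j * log(p) + (n - j) * log(1 - p))
        total += exp(log_term)
    return total


def normal_upper_tail(n, p, k):
    """Normal approximation to P(X > k), with a continuity correction."""
    mu = n * p
    sd = sqrt(n * p * (1 - p))
    return 1.0 - NormalDist(mu, sd).cdf(k + 0.5)


# Probability a single soda is short, assuming X ~ N(mean 12.1, SD 0.2).
p_short = NormalDist(12.1, 0.2).cdf(12.0)

exact = binom_upper_tail(1000, p_short, 300)
approx = normal_upper_tail(1000, p_short, 300)
print(f"p_short = {p_short:.4f}, exact = {exact:.4f}, normal approx = {approx:.4f}")
```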

4-20,4-25
5-1,5-2*,5-3,5-7

Homework 4

Solutions to *'d problems
Solutions to non-starred problems (pdf format) :  page 1, page2, page 3, page4, page 5, page 6
Due February 6, the same day as the midterm.

1* ) These questions explore a pdf called the geometric distribution.
a) Suppose we flip a fair coin until the first head appears.  Let X represent the number of flips until the first head.  So, for example, if your sequence looks like TTH then X = 3.   Write a formula for the probability distribution function of X.
b) Suppose the coin is not fair but has some unknown probability of landing heads.  Call this probability p.  What's the probability distribution function of X?
c) Suppose we have a fair, six-sided die and we roll it until the first "6" appears.  Use your answer to (b) to find the probability that this happens after the 4th roll.
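
Once you have a candidate formula, a quick numerical check is useful. The Python sketch below uses the standard geometric form P(X = k) = (1-p)^(k-1) p; the function names are made up, and reading "after the 4th roll" as P(X > 4) is an assumption about the wording of (c).

```python
def geometric_pmf(k, p):
    """P(X = k): first success on trial k (k = 1, 2, 3, ...)."""
    return (1 - p) ** (k - 1) * p


def geometric_tail(k, p):
    """P(X > k): no success in the first k trials."""
    return (1 - p) ** k


# Sanity check with the fair coin: the sequence TTH has probability 1/8.
print("P(X = 3), fair coin:", geometric_pmf(3, 0.5))

# Fair die: the first "6" comes after the 4th roll.
print("P(X > 4), fair die:", geometric_tail(4, 1 / 6))
```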

2) Suppose X is a discrete random variable and b is a constant value.  
a) Show that E(bX) = bE(X)
b) Show that Var(bX) = b^2 Var(X)

4-7, 4-11, 4-14-a, 4-17, 4-18, 4-19-(a,b,f)  [Note: the probability in (f) is important -- in what context have we used this 95% figure before and how does it apply here?  No need to answer -- just think about it], 4-22, 4-23, 4-36, 4-37, 4-39, 4-40*

Homework 3  Solutions to *  and others  (note: we're only human, and so if you think there is  a mistake in the solutions, there might be.  Ask!)

NOTE: A previous version of the homework solutions had mistakes for 3-25 and 3-46.  These were fixed Monday, Feb 2, 4:30pm.

Due Friday, Jan 30


*1) In class you were asked to write the fastest you've ever driven a car.  83 people responded.  The average speed was 100 mph and the standard deviation was 19 mph.  The 5-number summary was 30, 85, 100, 110, 165.  
a) Roughly how many people said that the fastest they've driven a car is between 81mph and 119 mph?
b) Would you say that this distribution is roughly symmetric or skewed?  Why?  If "skewed", then to the right or left?
c) How many standard deviations above average is the maximum score?  How many standard deviations below average is the minimum?

2) Suppose we have n data points, represented by the symbols x1, x2, ..., xn.   We now multiply each value by a constant, k, to form a new data set:  y1 = k*x1,  y2 = k*x2, ..., yn = k*xn.
a) Let xbar be the sample mean of the x's, and ybar represent the sample mean of the y's.  Show that ybar = k*xbar.
b) Show that the sample variance of y is equal to k^2*(sample variance of x) and that the SD of the y's is k*(SD of the x's).
c) To convert miles-per-hour to kilometers-per-hour you multiply the miles-per-hour value by 1.609.   The class said that the average fastest speed it has driven a car is 100 mph and the standard deviation is 19 mph.  Show how to use the rules in (a) and (b) to give the mean, variance, and standard deviation in kilometers per hour.
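
A numerical check is not a proof, but it can confirm the scaling rules before you write one up. In this Python sketch the five speeds are invented purely for illustration (they are not the class data); only the conversion factor 1.609 comes from the problem.

```python
from math import isclose
from statistics import mean, stdev, variance

MPH_TO_KMH = 1.609

# Hypothetical speeds in mph, made up for this check.
speeds_mph = [85.0, 100.0, 110.0, 95.0, 120.0]
speeds_kmh = [MPH_TO_KMH * x for x in speeds_mph]

# Rule (a): the mean scales by k.
mean_ok = isclose(mean(speeds_kmh), MPH_TO_KMH * mean(speeds_mph))
# Rule (b): the variance scales by k^2 and the SD by k (k is positive here).
var_ok = isclose(variance(speeds_kmh), MPH_TO_KMH ** 2 * variance(speeds_mph))
sd_ok = isclose(stdev(speeds_kmh), MPH_TO_KMH * stdev(speeds_mph))
print("mean rule holds:", mean_ok)
print("variance rule holds:", var_ok)
print("SD rule holds:", sd_ok)
```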

3) (a) Give an example of two mutually exclusive events that do not involve coins, dice, or anything in this week's homework.
(b) Give an example of two independent events that do not involve coins, dice, or anything in this week's homework.

From the book:
3-15, 3-16, 3-17, 3-20, 3-21, 3-23, 3-25,3-45, *3-46

Homework 2 (Solutions to *.) 

Due Friday, January 23

Chapter 1, p. 21: 13,14,18   (from last week)

 1)* The Survey of Study Habits and Attitudes (SSHA) is a psychological test that evaluates college students' motivation, study habits, and attitudes towards school.  A selective private college gives the SSHA to a sample of 18 of its incoming first-year college women. Their scores are:
    154, 109, 137, 115, 152, 140, 154, 178, 101, 103, 126, 137, 165, 165, 129, 200
The college also administers the test to a sample of 20 first-year college men. Their scores are
    108, 140, 114, 91, 180, 115, 126, 92, 169, 146, 109, 132, 75, 88, 113, 151, 70, 115, 187, 104

NOTE: We might not have had time to discuss stemplots.  Read about how to make them here.  Or read here.
A "back-to-back" stemplot is the same as a stemplot, except the men's values will go out to the left (or right) and the women's in the other direction.  Another term for "stemplot" is "stem and leaf plot".

a) Make a back-to-back stemplot of the men's and women's scores. The overall shapes of the distributions are indistinct, as often happens when only a few observations are available.  Are there any outliers?  Describe the center and spread of each group's scores.  (Usually, you'd be asked to describe the shape, but there are too few observations here to get a good sense of this.)
b) Compare the midpoints and ranges of the two distributions.  What is the most noticeable contrast between the female and male scores?
c) Would the mean score for the men be greater than or equal to the median for the men? Why?  Calculate the mean and median of the men and women to check.
d) Find the five-number summaries of both groups.  Does the 1.5*IQR criterion flag any potential outliers?  Make side-by-side boxplots.
e) Write a brief comparison of the two groups. Do women score higher than men?  Which of your descriptions show this? Which group of scores is more spread out when we ignore outliers? Which of your descriptions shows this most clearly?

Chapter 2: 2-15 (p. 39), 2-21 (p. 51), 2-34(p.65), 2-38, 2-39 (p. 66)

2)  Suppose X1, ...Xn are measurements taken on some variable, and that the standard deviation of these measurements is equal to the number s.  Now let Yi = Xi + a.  Show that the standard deviation of Y is also equal to s.  (Hint:  what's the average of Y?)

3)* True or false and explain:  suppose we looked at incomes from a large sample of people.  More than  half of the people would have incomes above average.

4) Consider the histogram of heights collected for this class:

[Figure: histogram of heights]


a) Would a boxplot be a good summary of these data?  Why or why not?
b) There are 87 heights in this data set.  Approximately what is the median height?
c) You are asked to report the typical height for this set of data.  Would you report the mean, the median, or neither?  Why? If you answered "neither", how would you answer this request?



Homework 1  (Solution to * )

  Solutions to others (pdf file)

Due Friday, January 16

Chapter 1, p. 21: 13,14,18  NOTE: These are optional, since you may not have the book yet

4) The Public Health Service studied the effects of smoking on health, in a large sample of representative households.  For men and for women in each age group, those who had never smoked were on average somewhat healthier than the current smokers, but the current smokers were on average much healthier than those who had recently stopped smoking.  The lesson seems to be that you shouldn't start smoking, but once you've started, don't stop.  Comment.

5)* A study of young children found that those with more body fat tended to have more "controlling" mothers; the San Francisco Chronicle (Nov. 9, 1994) concluded that "Parents of Fat Kids Should Lighten Up."
    a) Was this an observational study or a randomized controlled experiment?  Explain.
    b) Did the study find an association between mother's behavior and her child's level of body fat?
    c) If controlling behavior by the mother causes children to eat more, would that explain an association between controlling behavior by the mother and her child's level of body fat?
    d) Suppose there is a gene which causes obesity. Would that explain the association?
    e) Can you think of another way to explain the association?
    f) Do the data support the Chronicle's advice on child-rearing? Explain.

6) California is evaluating a new program to rehabilitate prisoners before their release; the object is to reduce the recidivism rate -- the percentage who will be back in prison within two years of release.  The program involves several months of "boot camp" -- military-style basic training with very strict discipline.  Admission to the program is voluntary.  According to a prison spokesman, "Those who complete boot camp are less likely to return to prison than other inmates."
    a) What is the treatment group in the prison spokesman's comparison? The control group?
    b) Is the prison spokesman's comparison based on an observational study or a randomized controlled experiment? Explain.
    c) True or false and explain: the data show that boot camp worked.


NOTE:  Number 7 will be postponed until next week.  So 4-6 are the only problems that should be done for this week.

7)* The Survey of Study Habits and Attitudes (SSHA) is a psychological test that evaluates college students' motivation, study habits, and attitudes towards school.  A selective private college gives the SSHA to a sample of 18 of its incoming first-year college women. Their scores are:
    154, 109, 137, 115, 152, 140, 154, 178, 101, 103, 126, 137, 165, 165, 129, 200
The college also administers the test to a sample of 20 first-year college men. Their scores are
    108, 140, 114, 91, 180, 115, 126, 92, 169, 146, 109, 132, 75, 88, 113, 151, 70, 115, 187, 104

NOTE: We might not have had time to discuss stemplots.  Read about how to make them here.  Or read here.
A "back-to-back" stemplot is the same as a stemplot, except the men's values will go out to the left (or right) and the women's in the other direction.  Another term for "stemplot" is "stem and leaf plot".

a) Make a back-to-back stemplot of the men's and women's scores. The overall shapes of the distributions are indistinct, as often happens when only a few observations are available.  Are there any outliers?  Describe the center and spread of each group's scores.  (Usually, you'd be asked to describe the shape, but there are too few observations here to get a good sense of this.)
b) Compare the midpoints and ranges of the two distributions.  What is the most noticeable contrast between the female and male scores?
c) Would the mean score for the men be greater than or equal to the median for the men? Why?  Calculate the mean and median of the men and women to check.
d) Find the five-number summaries of both groups.  Does the 1.5*IQR criterion flag any potential outliers?  Make side-by-side boxplots.
e) Write a brief comparison of the two groups. Do women score higher than men?  Which of your descriptions show this? Which group of scores is more spread out when we ignore outliers? Which of your descriptions shows this most clearly?