Homework 2
Due Oct. 15

1.  A hypothetical pharmaceutical company claims that a new drug lowers blood pressure.  The drug was given to a small number of subjects, whose blood pressure was measured 15 minutes before and 15 minutes after taking the drug.  The data are in two columns; the first is the "after" systolic blood pressure, and the second is the "before" systolic blood pressure.
a) Make back-to-back stem and leaf plots (by hand).  Does one group seem to be higher than the other?  Are all observations higher or only a few? Are there outliers?  Are the distributions symmetric?  Are they unimodal?
b) Using Data Desk, make histograms of both the before and the after variables.  When you highlight one bin with your cursor, the corresponding data are highlighted in the other histogram.  Use this feature to determine whether people who have high blood pressure before also tend to have high blood pressure after.
c) Using Data Desk, make a scatterplot with the "Before" values on the x axis and the "After" values on the y axis.  Do people with high blood pressure before tend to have high blood pressure after?  (NOTE:  Click on "After" to make it the y variable; a "Y" will be superimposed over the icon representing this variable.  Then hold down the shift key and click on the "Before" variable to make it the x variable.)
d) Based on what you have seen so far, do you think the drug is effective?  Explain.
e) Because each pair of "before" and "after" observations comes from the same person, these data are called "paired."  For this reason, it is better to look at the differences between the before and after values than to treat the variables as two separate groups.  Make a variable called "Difference", where "Difference" = "Before" - "After".  To do this, select Manipulate > Transform > New Derived Variable from the menu.  Name the variable "Difference" when prompted.  In the next window that appears, type before - after

A new icon will appear called "Difference".  Click on this once to make it the "Y" variable.  Then, from the menu bar, choose Data > Evaluate Derived Variable.
Yet another icon will appear, also named "Difference", but if you look closely, it has a slightly different appearance.  This one contains the computed values.  Click on it, and make a histogram of the Difference variable.

Does the drug look effective?  What percentage of the people showed an improvement (a lower blood pressure)?
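
If you want to check your Data Desk results outside the program, the paired-difference calculation can be sketched in a few lines of Python.  This is only an illustration: the readings below are invented and are NOT the homework data, and the variable names are hypothetical.

```python
# Invented readings for illustration only (NOT the homework data).
before = [150, 142, 160, 138, 155]
after = [144, 143, 151, 138, 149]

# Paired differences, Before - After: a positive value means
# that subject's blood pressure was lower after the drug.
difference = [b - a for b, a in zip(before, after)]

# Count the subjects who improved and convert to a percentage.
improved = sum(1 for d in difference if d > 0)
percent_improved = 100 * improved / len(difference)

print(difference)
print(percent_improved)
```

Note that a difference of exactly 0 is counted here as "no improvement"; that is a choice you should state if it comes up in your own data.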

2. Download the ClassData.  You should read about it first, and maybe download a copy of the survey itself.
a) Calculate the average, SD, median, and interquartile range for height and weight.  Which measure of center, average or median, do you think best describes each variable?  (Consult a histogram.)
b) Make a boxplot of height and weight.  Are there outliers?  Are the distributions symmetric?  Between what two values do the middle 50% of the class lie?
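
As a cross-check on the summaries in Number 2, here is a minimal Python sketch using the standard statistics module.  The heights are invented (not the ClassData), and summarize is a hypothetical helper name.  Two assumptions to watch: statistics.stdev divides by n - 1 (use statistics.pstdev if your SD divides by n), and software packages differ slightly in how they compute quartiles, so the IQR may not match Data Desk exactly.

```python
import statistics

def summarize(values):
    """Return mean, SD, median, quartiles, and IQR for a list of numbers."""
    # quantiles(n=4) returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),      # sample SD (divides by n - 1)
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }

# Invented heights in inches, for illustration only.
heights = [62, 64, 65, 66, 67, 68, 70, 72, 78]
summary = summarize(heights)
print(summary)
```

In this made-up example the one large value pulls the mean above the median, which is the kind of thing to look for when deciding which measure of center describes a variable best.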

3. Again, make a histogram of the Interval variable of the Old Faithful data.  Remember, the interval variable is the number of minutes between eruptions of the Old Faithful geyser.  (Look up Old Faithful in ActivStats to see a video of the geyser.)  What numerical summary statistic is an appropriate summary of the time until eruptions?  Why?  For whichever choice you make, answer these questions:
a) What's the most you'll be wrong?  (In other words, how far off might your answer be?)
b)  Roughly speaking, what is a "typical" error?
c) Describe the distribution in words.  (Refer to Number 1a for some things to look for in your description.)
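
One possible way to put numbers on parts (a) and (b), sketched in Python: treat your chosen summary as a guess for every interval, then look at how far the data fall from it.  This is one reading of the questions, not the only one.  The intervals below are invented (not the Old Faithful data), prediction_errors is a hypothetical helper, and the average is used as the guess purely as an example.

```python
import math

def prediction_errors(values, guess):
    """Return (largest absolute error, root-mean-square error) for a guess."""
    errors = [v - guess for v in values]
    largest = max(abs(e) for e in errors)
    # Root-mean-square error: when the guess is the average,
    # this equals the SD computed with divisor n.
    rms = math.sqrt(sum(e * e for e in errors) / len(errors))
    return largest, rms

# Invented intervals in minutes, for illustration only.
intervals = [50, 55, 60, 75, 80, 85]
guess = sum(intervals) / len(intervals)   # the average, as an example guess
largest, rms = prediction_errors(intervals, guess)
print(guess, largest, rms)
```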

4. From SMT: p. 52, number 3
5.-7. p. 75, numbers 11, 12, 13
8. Here's a standard deviation contest.  From the digits 0, 1, ..., 9, you must pick four (repeats allowed) such that
a) they have the smallest possible SD
b) they have the largest possible SD
c) Is your answer to (a) unique?
d) Is your answer to (b) unique?
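
To test candidate picks for Number 8, a small Python helper like the one below may be handy.  It is only a sketch: sd_of_pick is a hypothetical name, and it uses the SD with divisor n (statistics.pstdev).  Using divisor n - 1 instead rescales every SD by the same factor, so it does not change which picks win.

```python
import statistics

def sd_of_pick(digits):
    """Return the SD (divisor n) of a pick of four digits from 0 to 9."""
    if len(digits) != 4 or any(d not in range(10) for d in digits):
        raise ValueError("pick exactly four digits, each from 0 to 9")
    return statistics.pstdev(digits)

# An arbitrary example pick (not meant as an answer to the contest).
print(sd_of_pick([1, 3, 5, 7]))
```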