1. A hypothetical pharmaceutical company claims a new drug
lowers blood pressure. The drug was given to a small number of subjects,
whose blood pressure was measured 15 minutes before and 15 minutes after
taking the drug. The data
are in two columns: the first is the "after" systolic blood pressure, and
the second is the "before" systolic blood pressure.
a) Make back-to-back stem and leaf plots (by hand). Does one
group seem to be higher than the other? Are all observations higher
or only a few? Are there outliers? Are the distributions symmetric?
Are they unimodal?
b) Using Data Desk, make histograms of both the before and the after
variables. When you highlight one bin with your cursor, the corresponding
data are highlighted in the other variable's histogram. Use this feature
to determine whether people who have high blood pressure before also tend
to have high blood pressure after.
c) Using Data Desk, make a scatterplot with the "Before" values on
the x axis and the "After" values on the y axis. Do people with high blood
pressure before tend to have high blood pressure after? (NOTE:
click on "After" to make it the y variable; a "Y" will be superimposed
over the icon representing this variable. Then hold down the Shift
key and click on the "Before" variable to make it the x variable.)
d) Based on what you have seen so far, do you think the drug is effective?
Explain.
e) Because each "before" and "after" observation comes from the same
person, these data are called "paired." For this reason, it is better
to look at the differences between the before and after variables rather than treat
the variables as two separate groups. Make a variable called "Difference"
where "Difference" = "Before" - "After". To do this, select
Manipulate > Transform > New Derived Variable from the menu. Name the variable
"Difference" when prompted. In the next window that appears,
type Before - After
A new icon called "Difference" will appear. Click on it once
to make it the "Y" variable. Then, from the menu bar, choose
Data > Evaluate Derived Variable.
Yet another icon will appear, also named "Difference", but
if you look closely you will see that it looks slightly different. This
icon contains the computed values. Click on it and make a histogram of the Difference
variable.
Does the drug look effective? What percentage of the people showed an improvement (a lower blood pressure)?
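If you want to check your hand-drawn plot from part (a) and the arithmetic in part (e) outside Data Desk, here is a minimal Python sketch. The function names and the readings in the example are hypothetical, not the assignment's data. It assumes integer readings stored in subject order, so before[i] and after[i] belong to the same person, and it splits each reading into a tens-digit stem and a units-digit leaf.

```python
from collections import defaultdict


def back_to_back_stem_leaf(left, right):
    """Rows of a back-to-back stem-and-leaf plot, as strings.

    Assumes non-negative integer readings: stem = value // 10 and
    leaf = value % 10 (the units digit). Left-side leaves are printed in
    descending order so they grow outward, away from the stem.
    """
    left_leaves, right_leaves = defaultdict(list), defaultdict(list)
    for v in left:
        left_leaves[v // 10].append(v % 10)
    for v in right:
        right_leaves[v // 10].append(v % 10)

    lo = min(min(left), min(right)) // 10
    hi = max(max(left), max(right)) // 10
    left_width = max(len(leaves) for leaves in left_leaves.values())
    stem_width = len(str(hi))

    rows = []
    for stem in range(lo, hi + 1):  # keep empty stems so gaps stay visible
        left_str = "".join(str(d) for d in sorted(left_leaves[stem], reverse=True))
        right_str = "".join(str(d) for d in sorted(right_leaves[stem]))
        rows.append(f"{left_str:>{left_width}} | {stem:>{stem_width}} | {right_str}".rstrip())
    return rows


def paired_differences(before, after):
    """Difference = Before - After per subject; positive means pressure fell."""
    if len(before) != len(after):
        raise ValueError("need one before and one after reading per subject")
    return [b - a for b, a in zip(before, after)]


def percent_improved(differences):
    """Percentage of subjects whose blood pressure dropped (Difference > 0)."""
    return 100 * sum(d > 0 for d in differences) / len(differences)


if __name__ == "__main__":
    # Hypothetical readings, NOT the assignment's data set.
    before = [150, 142, 160, 138]
    after = [141, 145, 150, 138]
    for row in back_to_back_stem_leaf(after, before):
        print(row)
    diffs = paired_differences(before, after)
    print(diffs, percent_improved(diffs))
```

A subject whose Difference is exactly 0 is counted as not improved here; decide for yourself whether that is the right reading of "showed an improvement."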
2. Download the ClassData.
You should read
about it first, and maybe download a copy of the survey
itself.
a) Calculate the average, SD, median, and interquartile range for height
and weight. Which measure of center, average or median, do you think
best describes each variable? (Consult a histogram.)
b) Make a boxplot of height and weight. Are there outliers?
Are the distributions symmetric? Between what two values do the middle
50% of the class lie?
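The summaries in parts (a) and (b) can be cross-checked with Python's standard statistics module. This is a sketch under stated assumptions: the function name is hypothetical, the SD is the sample SD (dividing by n - 1), and outliers are flagged with the common 1.5 × IQR rule. Quartile conventions differ between programs, and Data Desk's boxplot rule may not match this one exactly, so small discrepancies are expected.

```python
import statistics


def summarize(values):
    """Average, SD, median, IQR, middle-50% range, and 1.5 * IQR outliers.

    Quartiles come from statistics.quantiles with its default 'exclusive'
    method; other software may use a slightly different quartile rule.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "average": statistics.mean(values),
        "SD": statistics.stdev(values),  # sample SD, divides by n - 1
        "median": statistics.median(values),
        "IQR": iqr,
        "middle 50%": (q1, q3),  # the box spans Q1 to Q3
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


if __name__ == "__main__":
    # Toy values for illustration only, NOT the ClassData heights or weights.
    print(summarize([1, 2, 3, 4, 5, 6, 7, 8, 30]))
```

Once the ClassData height and weight columns are loaded into lists, call summarize on each; the loading step is not shown because the file's format is not described here.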
3. Again, make a histogram of the Interval variable of the Old Faithful
data. Remember, the Interval variable is the number of minutes between
eruptions of the Old Faithful geyser. (Look up Old Faithful in ActivStats
to see a video of the geyser.) What numerical summary statistic
is an appropriate summary of the time between eruptions? Why?
Whichever choice you make, answer these questions:
a) What's the most you'll be wrong? (In other words, how far
off might your answer be?)
b) Roughly speaking, what is a "typical" error?
c) Describe the distribution in words. (Refer to Number 1a for
some things to look for in your description.)
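Parts (a) and (b) ask how far off a single summary number can be. A minimal sketch, assuming "error" means the gap between each observed interval and your chosen summary value: the largest error is the biggest absolute gap, and one common way to make "typical error" concrete is the root-mean-square (r.m.s.) of the gaps. When the guess is the average, the r.m.s. error equals the SD computed with n (not n - 1) in the denominator. The function name and the toy values are hypothetical, not the Old Faithful data.

```python
import math
import statistics


def prediction_errors(values, guess):
    """Largest absolute error and r.m.s. error when `guess` summarizes `values`.

    Each error is an observation minus the guess.
    """
    errors = [v - guess for v in values]
    largest = max(abs(e) for e in errors)
    rms = math.sqrt(sum(e * e for e in errors) / len(errors))
    return largest, rms


if __name__ == "__main__":
    # Toy numbers for illustration only, NOT the Old Faithful intervals.
    data = [2, 4, 4, 4, 5, 5, 7, 9]
    print("using the average:", prediction_errors(data, statistics.mean(data)))
    print("using the median: ", prediction_errors(data, statistics.median(data)))
```

Running it on the Interval values with both the average and the median as the guess is one way to see how the choice of center changes the size of the errors.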
4. From SMT: p. 52, number 3
5.-7. p. 75, numbers 11, 12, and 13
8. Here's a standard deviation contest. From the digits 0, 1, ..., 9,
you must pick four (repeats allowed) such that
a) they have the smallest possible SD
b) they have the largest possible SD
c) Is your answer to (a) unique?
d) Is your answer to (b) unique?
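Because order does not affect an SD, there are only 715 distinct ways to pick four digits with repeats allowed, so you can check your answers to (a) through (d) by brute force. This sketch is one way to do it, with a hypothetical function name; it uses the population SD (dividing by n), but the sample SD is a constant multiple of it and would rank the picks the same way.

```python
import math
from itertools import combinations_with_replacement
from statistics import pstdev


def sd_contest(digits=range(10), k=4):
    """Brute-force every multiset of k digits (repeats allowed).

    Returns (smallest_sd, picks_with_smallest, largest_sd, picks_with_largest).
    Each multiset appears once, as a sorted tuple, so several picks tied for
    an extreme means that answer is not unique.
    """
    scored = [(pstdev(pick), pick) for pick in combinations_with_replacement(digits, k)]
    smallest = min(sd for sd, _ in scored)
    largest = max(sd for sd, _ in scored)
    smallest_picks = [p for sd, p in scored if math.isclose(sd, smallest, abs_tol=1e-12)]
    largest_picks = [p for sd, p in scored if math.isclose(sd, largest, abs_tol=1e-12)]
    return smallest, smallest_picks, largest, largest_picks


if __name__ == "__main__":
    low, low_picks, high, high_picks = sd_contest()
    print("smallest SD:", low, "achieved by", len(low_picks), "pick(s)")
    print("largest SD:", high, "achieved by", high_picks)
```

Try your hand answers first; the script is only a check.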