Introduction to Statistical Methods for the Life and Health Sciences
GRAPHICAL AND NUMERICAL DESCRIPTIONS OF DATA
Objectives:
(1) summarize datasets graphically and numerically
(2) learn some new STATA commands
STATA commands that you will find helpful today and in the future:
graph varname1, box
dotplot varname1
tabulate varname1
chist varname1
graph varname1, bin(#)
stem varname1
summarize varname1 varname2, detail
infile varname1 varname2 using filename.raw
use filename.dta
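The `graph varname1, box` and `graph varname1, bin(#)` forms above follow the older STATA graphics syntax. As a rough sketch, assuming STATA 8 or later (where box plots and histograms are drawn with `graph box` and `histogram`) and using the `auto` example dataset that ships with STATA rather than the lab data, the same kinds of displays can be tried out like this:

```stata
* Sketch only: assumes STATA 8 or later and the built-in auto example data
sysuse auto, clear

* Box plot (older syntax: graph mpg, box)
graph box mpg

* Histogram with 8 intervals (older syntax: graph mpg, bin(8))
histogram mpg, bin(8)

* Dot plot and stem-and-leaf display
dotplot mpg
stem mpg

* Frequency table and detailed numeric summaries
tabulate rep78
summarize mpg weight, detail
```

Each graph command replaces the previous graph window, so it is easiest to run these one line at a time.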
Let’s look again at the distribution of blood pressures before and after taking the drug captopril as a blood-pressure treatment (15 patients were given the same dose of the drug). Use the data from http://www.stat.ucla.edu/~rgould/datasets/bloodpressure.raw
Recall that there are two variables in this dataset. The first variable is the systolic blood pressure before the drug (varname = before); the second is the systolic blood pressure after the drug (varname = after).
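One possible way to read the file, sketched under the assumptions that the .raw file is plain text with two numeric columns in the order before, after, that it has no header row, and that the URL is still reachable:

```stata
* Sketch only: assumes two whitespace-separated numeric columns, no header row
clear
infile before after using "http://www.stat.ucla.edu/~rgould/datasets/bloodpressure.raw"

* Quick checks that the data were read as expected
describe
list in 1/5
count
```

If all 15 patients were read, `count` should report 15 observations.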
1. Would you classify these variables as quantitative or qualitative?
Let’s look at the distribution of each variable in graphical terms using dotplots. Type
dotplot before after
The two groups appear side by side. If two observations have the same value, STATA places the dots beside each other.
2. What are the smallest and largest values in each distribution? What would you guesstimate for the sample average and median of each distribution?
3. Does the dotplot provide evidence that the drug is effective? Why?
Now try stem-and-leaf plots. Type
stem before
Let’s describe the distribution of each variable in numeric terms. To compute measures of center and spread, type
summarize before after, detail
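After `summarize` runs with the `detail` option, STATA keeps the results in memory, so individual values can be displayed without reading them off the full table. A minimal sketch, assuming the variables before and after are already loaded:

```stata
* Sketch only: assumes the variables before and after are in memory
summarize before, detail
display "before: mean = " r(mean) "  median = " r(p50)
display "before: min = " r(min) "  max = " r(max) "  sd = " r(sd)

summarize after, detail
display "after:  mean = " r(mean) "  median = " r(p50)
```

The stored results refer to the most recent `summarize` call only, which is why each variable is summarized separately here.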
4. What are the median and the sample average of each distribution? Were your guesstimates close?
Now use graphical and numerical techniques for describing the distribution of the quantitative variables in http://www.stat.ucla.edu/~rgould/datasets/m12s00.dta
5. Identify each variable as either quantitative or qualitative.
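A quick overview of the variables can help with this classification. The following is a sketch, assuming the URL is still reachable; it prints each variable's storage type and its number of distinct values, but deciding whether a variable is quantitative or qualitative still depends on what it measures.

```stata
* Sketch only: assumes the URL is still reachable
use "http://www.stat.ucla.edu/~rgould/datasets/m12s00.dta", clear

* Variable names, storage types, and labels
describe

* Number of distinct values and range for each variable
codebook, compact
```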
For each quantitative variable, create a histogram, a dotplot, and a stem-and-leaf plot. See what happens when you change the number of intervals in the histogram. Type
graph varname1, bin(5)
then change 5 to 10, 11, 12, 13, 14, and 15.
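Rather than retyping the command for each number of intervals, a loop can draw all of the histograms in one pass. This is only a sketch: it assumes STATA 8 or later (where histograms use the `histogram` command), that m12s00.dta is in memory, and that it contains a numeric variable named height. The graph names `bins5`, `bins10`, and so on are made up for illustration.

```stata
* Sketch only: assumes STATA 8 or later and a numeric variable named height
foreach b of numlist 5 10/15 {
    histogram height, bin(`b') name(bins`b', replace) title("height, `b' intervals")
}
```

Each histogram is kept under its own name, so the windows can be compared side by side.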
6. Describe each distribution in terms of symmetry, number of modes, and whether or not you think there are any outliers.
7. What are the mode, median, and sample average for the distribution of each variable? On the basis of this numeric information, describe the shape of the distribution. To find the mode (the value with the highest frequency), type
tabulate varname1
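With many distinct values, scanning the frequency table for the largest count can be tedious. A small sketch of two shortcuts, assuming m12s00.dta is in memory and contains a numeric variable named height (the new variable name `height_mode` is made up for illustration):

```stata
* Sketch only: assumes a numeric variable named height is in memory
* List values from most to least frequent; the mode is in the first row
tabulate height, sort

* Store the mode in a new variable; with ties, minmode keeps the smallest mode
egen height_mode = mode(height), minmode
display "mode of height = " height_mode[1]
```

If several values tie for the highest frequency, the sorted table shows all of them, while `minmode` reports only the smallest.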
8. The distribution of “height” looks a little unusual. What might explain the shape of this distribution?