Introduction to Statistical Methods for the Life and Health Sciences
GRAPHICAL AND NUMERICAL DESCRIPTIONS OF DATA
(1) summarize datasets graphically and numerically
(2) learn some new STATA commands
STATA commands that you will find helpful today and in the future:
graph varname1, box
graph varname1, bin(#)
summarize varname1 varname2, detail
infile varname1 varname2 using filename.raw
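The two graph commands above use the syntax of older STATA releases. If you are running version 8 or later (an assumption about your setup, not something this handout states), the same plots are usually requested as sketched below. varname1 is the same placeholder used above, and the bin count of 10 is only an example.

```stata
* Boxplot of one variable (newer syntax)
graph box varname1

* Histogram with a chosen number of bins, here 10
histogram varname1, bin(10)

* In newer releases the older syntax remains available through graph7
graph7 varname1, box
graph7 varname1, bin(10)
```

If the commands in the list run without error on your machine, you can ignore this sketch and use them exactly as written.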
Let’s look again at the distribution of blood pressures before and after taking the drug captopril as a blood-pressure treatment (15 patients were given the same dose of the drug). Use the data from http://www.stat.ucla.edu/~rgould/datasets/bloodpressure.raw
Recall that there are two variables in this dataset. The first variable is the systolic blood pressure before the drug (varname = before); the second is the systolic blood pressure after the drug (varname = after).
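If the data are not already in memory, the infile command from the list above can read them. The sketch below assumes the file holds two columns of numbers with no header row, and that your copy of STATA can read directly from a web address. If it cannot, save the file to your working directory and give its local name instead.

```stata
* Read the two columns into variables named before and after
* (assumes free-format numeric data with no header row)
infile before after using "http://www.stat.ucla.edu/~rgould/datasets/bloodpressure.raw", clear

* Check what was read: there should be 15 observations and 2 variables
describe
list
```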
1. Would you classify these variables as quantitative or qualitative?
Let’s look at the distribution of each variable in graphical terms using dotplots.
dotplot before after
The two groups appear side-by-side. If two observations have the same value, then STATA places the dots beside each other.
2. What are the smallest and largest values in each distribution? What would you guesstimate for the sample average and median of each distribution?
3. Does the dotplot provide evidence that the drug is effective? Why?
Now try stem and leaf plots.
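The command for a stem and leaf plot is not in the list at the top of this handout. A minimal sketch, assuming the variables are named before and after as described earlier, uses stem with one variable at a time:

```stata
* Stem and leaf plot of the blood pressures before the drug
stem before

* Stem and leaf plot of the blood pressures after the drug
stem after
```

Compare the two displays with the dotplot: each leaf is one observation, so both plots should show the same smallest and largest values.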
Let’s describe the distribution of each variable in numeric terms. To compute measures of center and spread, type
summarize before after, detail
4. What are the median and the sample average of each distribution? Were your guesstimates close?
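In the detailed output, the row labelled 50% in the percentile column is the median, and the entry labelled Mean is the sample average. As a sketch, you can also display the two numbers directly, since summarize leaves them in the stored results r(mean) and r(p50). The stored results refer only to the last variable summarized, so run the command once per variable.

```stata
* Summarize one variable, then display its mean and median
summarize before, detail
display "before: mean = " r(mean) ", median = " r(p50)

* Repeat for the second variable
summarize after, detail
display "after:  mean = " r(mean) ", median = " r(p50)
```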
Now use graphical and numerical techniques for describing the distribution of the quantitative variables in http://www.stat.ucla.edu/~rgould/datasets/m12s00.dta
5. Identify each variable as either quantitative or qualitative.
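This file is in STATA's own format, so it is opened with use rather than infile. The sketch below assumes your copy of STATA can open a file from a web address. Otherwise download the file first and give its local name. The describe command lists each variable with its storage type and label, which is a reasonable starting point for question 5.

```stata
* Open the dataset, replacing whatever is currently in memory
use "http://www.stat.ucla.edu/~rgould/datasets/m12s00.dta", clear

* List the variables, their storage types, and any labels
describe

* Quick numeric overview of every variable
summarize
```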
For each quantitative variable, create a histogram, a dotplot, and a stem and leaf plot. See what happens when you change the number of intervals in the histogram.
graph varname1, bin(5)
Then change 5 to 10, 11, 12, 13, 14, 15.
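Rather than retyping the command for each number of intervals, you can loop over the values. This is only a sketch: it assumes a release of STATA with the newer histogram command (version 8 or later), and the graph names h5, h10, and so on are hypothetical labels chosen here so that each histogram stays available for comparison.

```stata
* Draw one histogram per bin count: 5, then 10 through 15
foreach b of numlist 5 10/15 {
    * name() keeps each graph in memory under its own name
    histogram varname1, bin(`b') name(h`b', replace)
}
```

Watch how the apparent number of modes and the visibility of gaps change as the intervals get narrower.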
6. Describe each distribution in terms of symmetry, number of modes, and whether or not you think there are any outliers.
7. What are the mode, median, and sample average for the distribution of each variable? On the basis of this numeric information, describe the shape of the distribution. To compute the mode, type
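No command follows the instruction above in this copy of the handout. One possibility, offered as an assumption and not as the command the handout intended, is a frequency table sorted by frequency, so that the most common value appears in the first row:

```stata
* One-way frequency table, most frequent value first
* (an assumed approach; the handout's own command is not shown)
tabulate varname1, sort
```

If two or more values share the highest frequency, the distribution has more than one mode, and all of them appear at the top of the table.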
8. The distribution of “height” looks a little unusual. What might explain the shape of this distribution?