*Introduction to Statistical Methods for the Life and Health Sciences*

http://www.stat.ucla.edu/~dinov/

**GRAPHICAL AND NUMERICAL DESCRIPTIONS OF DATA**

**Objectives:**

(1) Summarize datasets graphically and numerically

(2) Learn some new STATA commands

**STATA commands that you will find helpful today and in the future:**

**graph** *varname1*

**dotplot** *varname1*

**tabulate** *varname1*

**chist** *varname1*

**graph** *varname1*, **bin(#)**

**stem** *varname1*

**summarize** *varname1* *varname2*, **detail**

**infile** *varname1 varname2* **using** *filename.raw*

**use** *filename.dta*

Let’s look again at the
distribution of blood pressures before and after taking the drug captopril as a
blood-pressure treatment (15 patients were given the same dose of the
drug). Use the data from http://www.stat.ucla.edu/~rgould/datasets/bloodpressure.raw

Recall that there are two
variables in this dataset. The first
variable represents the systolic blood pressure before the drug (*varname*
= before), the second variable is the systolic blood pressure after the drug (*varname*
= after).
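
As a minimal sketch of getting the data into STATA, the commands below assume that bloodpressure.raw has already been downloaded into the current working directory and that it holds two whitespace-separated numeric columns in the order before, after. Both points are assumptions rather than something this handout states, so adjust the file name and path as needed.

```stata
* Assumes bloodpressure.raw is in the current working directory
* and holds two numeric columns: before, then after.
clear
infile before after using bloodpressure.raw

* Check what was read: expect 2 variables and 15 observations.
describe
list in 1/5
```

If **describe** does not report 15 observations, the file layout probably differs from what is assumed here.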

1. Would you classify these variables as quantitative or
qualitative?

Let’s look at the
distribution of each variable in graphical terms using dotplots.

Type

**dotplot** before after

The two distributions appear side by side. If two observations have
the same value, STATA places their dots beside each other.

2. What are the smallest and
largest values in each distribution?
What would you guesstimate for the sample average and median of each
distribution?

3. Does the dotplot provide evidence that the drug is
effective? Why?

Now try stem-and-leaf plots.

Type

**stem** before

Let’s describe the
distribution of each variable in numeric terms. To compute measures of center and spread, type

**summarize** before after, **detail**
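
The detailed summary also leaves its results in memory, which can be handy for checking a guess against the computed values. The sketch below assumes that before and after are already loaded; r(mean), r(p50), r(min), and r(max) are the stored results of the most recent **summarize** with the **detail** option.

```stata
* summarize keeps stored results only for the last variable in the list,
* so each variable is summarized separately here.
summarize before, detail
display "before: mean = " r(mean) "  median = " r(p50)
display "before: min = " r(min) "  max = " r(max)

summarize after, detail
display "after: mean = " r(mean) "  median = " r(p50)
display "after: min = " r(min) "  max = " r(max)
```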

4. What are the median and the sample average of each
distribution? Were your guesstimates
close?

Now use graphical and
numerical techniques for describing the distribution of the quantitative
variables in http://www.stat.ucla.edu/~rgould/datasets/m12s00.dta

5. Identify each variable as either quantitative or qualitative.

For each quantitative
variable, create a histogram, a dotplot, and a stem-and-leaf plot. See what happens when you change the number
of intervals in the histogram.

Type

**graph** *varname1*, **bin(5)**

Then change 5 to 10, 11, 12, 13, 14, and 15 in turn.
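
In STATA 8 and later, the **graph** *varname1*, **bin(#)** form is the older syntax (still available there as **graph7**), and histograms are drawn with the **histogram** command instead. The following is only a sketch for readers on a newer STATA. It assumes a numeric variable named height, as mentioned in question 8; the exact variable name in the dataset is an assumption.

```stata
* Assumes STATA 8 or later and a numeric variable called height.
* Draw one histogram per bin count and keep each graph in memory.
foreach b of numlist 5 10 11 12 13 14 15 {
    histogram height, bin(`b') name(hist`b', replace) title("bin(`b')")
}
```

Keeping the graphs under separate names makes it easier to flip between them and see how the apparent shape changes with the number of intervals.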

6. Describe each distribution in terms of symmetry, number of modes,
and whether or not you think there are any outliers.

7. What are the mode, median, and sample average of the
distribution of each variable? On the
basis of this numeric information, describe the shape of the distribution. To find the mode, type the following and look for the value with the highest frequency

**tabulate** *varname1*
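
In recent versions of STATA, **tabulate** accepts a **sort** option that lists values from most to least frequent, which makes the mode easier to spot. The sketch below again uses height as an assumed variable name. The **egen** alternative and its **minmode** option may not be available in older releases.

```stata
* Assumes a numeric variable called height (the name is an assumption).
* sort orders the table by descending frequency, so the mode is the first row.
tabulate height, sort

* Alternative: store the mode in a new variable.
* If several values tie, minmode keeps the smallest of them.
egen height_mode = mode(height), minmode
display height_mode[1]
```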

8. The distribution of “height” looks a little unusual. What might explain the shape of this
distribution?

You may want to try some additional examples.

Compare your work to the template solution (it is one possible solution, not the only correct one).