Data are at the heart of this case study.
Before examining the data for information they may reveal,
first think about what the data might
look like and record your answers to these questions:
1. Is the level of lead a continuous, discrete,
or categorical variable?
2. Describe what you think the histogram of
lead levels for the Exposed group
will look like (for example, symmetric, right-skewed, or left-skewed).
Now download the data. Stata should start up automatically. Answer these questions and provide supporting evidence (graphs, summaries, etc.) as needed.
1. Describe the sample distributions for the Exposed, Control, and Dif
variables.
Are there outliers or gaps?
2. Would the mean or the median be a better choice for describing
the central
tendency of each of the distributions? Why?
3. Does there appear to be a relation between lead levels
in the Exposed and Control
groups?
4. Which group has the higher levels of lead? Which
graphic do you think best
displays this?
5. Typically, how much higher or lower would you say the
Exposed group's blood
lead level is than the Control group's?
6. What percentage of the observations in the Exposed group are higher
than any
observations in the Control group?
7. What percentage of the 33 pairs have higher lead levels in the Exposed
group than in the
Control group?
8. What's a typical difference between the lead levels of the
Exposed group and the
Control group?
9. Based on these numerical summaries and the graphs, what would
you conclude
about the effects of lead in the parents' workplace on the child's
blood lead level?
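
Questions 6 and 7 ask for two different percentages that are easy to confuse: question 6 compares each Exposed value with the largest Control value, while question 7 compares values within each matched pair. The sketch below illustrates the distinction in Python rather than Stata. The function names and the toy values are hypothetical and are not the case study data; the same counts can be obtained in Stata or by hand.

```python
def pct_exposed_above_all_controls(exposed, control):
    """Percentage of exposed values strictly greater than the largest control value."""
    control_max = max(control)
    above = sum(1 for x in exposed if x > control_max)
    return 100.0 * above / len(exposed)


def pct_pairs_exposed_higher(exposed, control):
    """Percentage of matched pairs in which the exposed value exceeds the control value."""
    if len(exposed) != len(control):
        raise ValueError("exposed and control must be paired (same length)")
    higher = sum(1 for e, c in zip(exposed, control) if e > c)
    return 100.0 * higher / len(exposed)


# Made-up placeholder values, NOT the case study data.
# Position i in each list is assumed to belong to the same matched pair.
toy_exposed = [10, 20, 30, 40]
toy_control = [15, 12, 25, 18]

print(pct_exposed_above_all_controls(toy_exposed, toy_control))  # question 6 style
print(pct_pairs_exposed_higher(toy_exposed, toy_control))        # question 7 style
```

With the toy values, the two percentages differ, which shows why the pairing matters: the first calculation ignores which Control child goes with which Exposed child, and the second depends on it entirely.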