The DISTRIBUTION of a variable is its "pattern of variation." A graphical
representation of a distribution can answer questions like: how many
observations are large, how many are small, how many fall between two numbers,
and what is the most common value?
A histogram is a way of examining distributions. It can quickly summarize an
enormous amount of information on a single variable, and it makes use of your
natural ability to recognize patterns. Examples are on pages 30 & 31.
4. The Histogram's Properties
· A histogram represents observations by AREA, not height, over the different
class intervals. The area of each "block" is proportional to the number of
observations in the class interval. The total area MUST be 100%.
· It has class intervals on its horizontal axis. This axis must be scaled.
· It does not require a marked scale on its vertical axis. The vertical axis is
measured in percent per unit of the horizontal axis, and its scale is fixed
automatically by the fact that the area of the histogram must be 100%.
· We also need an "endpoint convention" to be able to draw a histogram: if an
observation falls on the boundary between two class intervals, to which one
should we assign it? The two standard choices are always to include the left
boundary and exclude the right (except for the rightmost interval, which
includes both), or always to include the right boundary and exclude the left
(except for the leftmost interval, which includes both).
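The two endpoint conventions above can be sketched in a few lines of Python.
This is only an illustration: the function name assign_interval, its closed
parameter, and the example boundaries are assumptions made for the sketch, not
part of these notes.

```python
from bisect import bisect_left, bisect_right

def assign_interval(x, edges, closed="left"):
    """Return the index of the class interval that x falls in (hypothetical helper).

    edges is a sorted list of boundaries; [0, 10, 20, 30] defines 3 intervals.
    closed="left":  include the left boundary, exclude the right,
                    except the rightmost interval, which includes both.
    closed="right": include the right boundary, exclude the left,
                    except the leftmost interval, which includes both.
    Returns None if x lies outside all intervals.
    """
    n_intervals = len(edges) - 1
    if x < edges[0] or x > edges[-1]:
        return None
    if closed == "left":
        # bisect_right sends a boundary value to the interval on its right
        return min(bisect_right(edges, x) - 1, n_intervals - 1)
    # bisect_left sends a boundary value to the interval on its left
    return max(bisect_left(edges, x) - 1, 0)

edges = [0, 10, 20, 30]                        # made-up boundaries
print(assign_interval(10, edges, "left"))      # 10 goes to the interval 10-20
print(assign_interval(10, edges, "right"))     # 10 goes to the interval 0-10
```

Only observations that sit exactly on a boundary are affected by the choice;
every other observation lands in the same interval under both conventions.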
i. Divide the data into CLASS INTERVALS.
ii. Determine what percentage of observations fall within each class interval.
iii. Determine the appropriate height by dividing the percentage in the class
interval by the width of the class interval.
iv. Draw the histogram.
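Steps i to iii can be sketched in Python as follows. The data values, the
class boundaries, and the function name block_heights are made up for
illustration, and the sketch assumes the "include the left boundary"
convention; step iv (the drawing itself) is left out.

```python
def block_heights(data, edges):
    """Return (percents, heights) for the class intervals defined by edges.

    Hypothetical helper. Assumes the left boundary is included and the right
    excluded, except for the rightmost interval, which includes both.
    """
    n_intervals = len(edges) - 1
    counts = [0] * n_intervals
    for x in data:
        for i in range(n_intervals):
            is_last = (i == n_intervals - 1)
            if edges[i] <= x < edges[i + 1] or (is_last and x == edges[-1]):
                counts[i] += 1
                break
    # Step ii: percentage of observations in each class interval
    percents = [100.0 * c / len(data) for c in counts]
    # Step iii: height = percentage / width (percent per horizontal unit)
    widths = [edges[i + 1] - edges[i] for i in range(n_intervals)]
    heights = [p / w for p, w in zip(percents, widths)]
    return percents, heights

data = [1, 2, 3, 4, 12, 15, 25, 40, 45, 50]   # made-up observations
edges = [0, 5, 20, 50]                        # Step i: class intervals

percents, heights = block_heights(data, edges)
widths = [edges[i + 1] - edges[i] for i in range(len(edges) - 1)]
total_area = sum(h * w for h, w in zip(heights, widths))
print(percents, heights, total_area)
```

In this made-up example the first and last intervals each hold 40% of the
observations, yet the first block is six times as tall because it is six times
narrower. That is what representing data by area rather than height means, and
the block areas add up to 100%.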
Center: what is the "typical" value?
Symmetry or skewness: are the data evenly divided, or is there a tail? Are
there bumps?
Spread: are the data near to each other or far apart? Are there gaps?
Exceptions ("outliers"): are there points that don't fit the general
distribution?
Remark: a histogram may be bell-shaped, but it need not be. Histograms come in
all kinds of shapes.