Statistics 10              Lecture 3          Graphical Summaries

1.      Some Data Vocabulary
Variables -- characteristics of a person, place, or thing.  Qualitative/Quantitative -- is the characteristic a verbal descriptor or a number?  Discrete/Continuous -- does the numerical value come in separate, countable units like 1, 2, 3, or can it take any value in a range, like 2.7111174?

2.      Summaries: Finding Patterns in Data

The DISTRIBUTION of a variable is its "pattern of variation." A graphical representation of a distribution can answer questions like: how many values are large, how many are small, how many fall between two numbers, and what is the most common number?
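The same questions can be answered directly from a small data set. A minimal Python sketch follows; the list `siblings` is made-up illustration data, not from the text.

```python
from collections import Counter

# Hypothetical data: number of siblings reported by 12 students.
siblings = [0, 1, 1, 2, 1, 3, 0, 2, 1, 4, 1, 2]

# How many observations fall between two numbers (here 1 and 2, inclusive)?
between_1_and_2 = sum(1 for x in siblings if 1 <= x <= 2)

# What is the most common number, and how often does it occur?
most_common_value, most_common_count = Counter(siblings).most_common(1)[0]

print(between_1_and_2)                       # 8
print(most_common_value, most_common_count)  # 1 5
```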

3.      What is a histogram supposed to do for you?

A histogram is a way of examining distributions.  It can quickly summarize an enormous amount of information on a single variable, and it makes use of your natural ability to recognize patterns.  Examples are on pages 30 & 31.

 

4.       The Histogram's Properties

·         A histogram represents the observations in each class interval by AREA, not height.  The area of each "block" is proportional to the number of observations in the class interval.  The total area MUST be 100%.

·         It has class intervals on its horizontal axis.  This axis must be scaled.

·         It does not require a labeled scale on its vertical axis.  The vertical axis is measured in percent per unit of the horizontal axis, so its scale is fixed automatically by the fact that the area of the histogram must be 100%.

·        We also need an "endpoint convention" to be able to draw a histogram: if an observation falls on the boundary between two class intervals, to which one should we assign it? The two standard choices are: always include the left boundary and exclude the right (except for the rightmost interval, which includes both), or always include the right boundary and exclude the left (except for the leftmost interval, which includes both).
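To make the endpoint convention concrete, here is a minimal Python sketch of the first choice (left boundary included, right boundary excluded, rightmost interval closed). The function name `assign_bin` and the interval boundaries are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right

def assign_bin(x, edges):
    """Return the index of the class interval that contains x.

    Convention: each interval includes its left endpoint and excludes
    its right endpoint, except the last interval, which includes both.
    """
    if x < edges[0] or x > edges[-1]:
        raise ValueError("observation lies outside the class intervals")
    if x == edges[-1]:
        return len(edges) - 2  # rightmost interval is closed on the right
    return bisect_right(edges, x) - 1

edges = [0, 10, 20, 30]       # class intervals 0-10, 10-20, 20-30
print(assign_bin(10, edges))  # 1: the boundary value 10 goes to 10-20
print(assign_bin(30, edges))  # 2: the last interval keeps its right endpoint
```

Under the second convention the value 10 would instead be counted in the interval 0-10, so the two choices can give slightly different histograms for the same data.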

5.       Building the Histogram

 

i. Divide the data into CLASS INTERVALS.
ii. Determine what percentage of observations fall within each class interval.
iii. Determine the appropriate height by dividing the percentage in the class interval by the width of the class interval.
iv. Draw the histogram.
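
Steps i through iii can be sketched in Python as follows. The data and class intervals are made up for illustration, and the intervals are deliberately of unequal width so that height and percentage differ. The sketch assumes the "include left, exclude right" endpoint convention, with the last interval closed. Step iv is left to graph paper or a plotting tool.

```python
# Hypothetical data and class intervals (step i).
data = [1, 2, 2, 3, 4, 5, 6, 8, 12, 18]
edges = [0, 5, 10, 20]   # intervals 0-5, 5-10, 10-20

# Step ii: percentage of observations in each class interval.
counts = [0] * (len(edges) - 1)
for x in data:
    for i in range(len(counts)):
        is_last = i == len(counts) - 1
        if edges[i] <= x < edges[i + 1] or (is_last and x == edges[-1]):
            counts[i] += 1
            break
percents = [100 * c / len(data) for c in counts]

# Step iii: height = percentage / width, in percent per unit.
widths = [edges[i + 1] - edges[i] for i in range(len(counts))]
heights = [p / w for p, w in zip(percents, widths)]

# Check: the areas (height x width) add up to 100%.
total_area = sum(h * w for h, w in zip(heights, widths))

print(percents)    # [50.0, 30.0, 20.0]
print(heights)     # [10.0, 6.0, 2.0]
print(total_area)  # 100.0
```

Note that the last interval holds 20% of the data but has the lowest block, because that 20% is spread over a width of 10 units.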

6.      Be aware of a histogram's characteristics

Center: what is the "typical" value?
Symmetry or skewness: are the data evenly divided, or is there a tail? Are there bumps?
Spread: are the data near to each other or far apart? Are there gaps?
Exceptions ("outliers"): are there points that don't fit the general distribution?
Remark: histograms may be bell-shaped, but not necessarily. They can be all kinds of shapes.
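
Some of these questions can also be checked numerically. The following is a rough Python sketch on made-up data; using the median for center, the range for spread, and the mean-versus-median comparison as a hint of a tail are illustrative choices here, not definitions from this lecture.

```python
import statistics

# Hypothetical data with one unusually large value.
data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 20]

center = statistics.median(data)      # a "typical" value
data_range = max(data) - min(data)    # a crude measure of spread
mean = statistics.mean(data)

# A mean well above the median hints at a tail (or outlier) on the right.
print(center)         # 4.0
print(data_range)     # 18
print(mean > center)  # True
```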