Statistics 10/50
Lecture 4


DESCRIBING DATA NUMERICALLY (4.1-4.3)

A. Overview

The two numbers most often used to summarize a distribution are the "center" [the "typical" value] and the "spread" [how close to or far from each other the data are].

B. "Center"

  1. The Mean

    a. The most commonly used measure of "center" is the MEAN, often called the AVERAGE.

    b. The mean is denoted as

      _
      x  (read as "x-bar").
    
    c. The mean is computed as follows: given a list of n numbers
       x1, x2, ... , xn, the mean is

        _     x1 + x2 + ... + xn     sum xi
        x  =  ------------------  =  ------
                      n                n

    d. Example: A first-year law student enrolls in 5 courses. These are her grades at the end of the year:

    93, 90, 81, 80, 77

    The sum of these is 421 (sum xi from above)

    n = 5 (she took 5 courses)

    the mean (x-bar) is 421/5 = 84.2
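
The arithmetic in this example can be checked with a short Python sketch. The names `mean`, `grades`, `total`, and `x_bar` are illustrative choices and do not come from the lecture notes.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / len(values)


grades = [93, 90, 81, 80, 77]  # the law student's five grades
total = sum(grades)            # "sum xi" in the formula
n = len(grades)                # number of courses
x_bar = mean(grades)           # total divided by n

print(total, n, x_bar)
```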

  2. The Median
    a. The median is the "middle point" of a list: half of the data are larger than (or equal to) the median, and half of the data are smaller than (or equal to) the median.

    b. The median is computed as follows:

    given a list of n numbers x1, x2, ... , xn,

    sort all the numbers

    and pick the middle number from the list.

    If the list has an even number of elements, take the average of the two middle numbers.

    c. Example:

    The sorted law school grades: 77, 80, 81, 90, 93
    The median (M) of this list is 81
    If she had taken SIX classes instead of FIVE:

    77, 80, 81, 90, 93, 97

    Take the average of the middle two numbers (81 and 90): (81 + 90)/2 = 85.5
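
The sort-and-pick-the-middle recipe can be sketched in Python as follows. This is a minimal illustration; the function name `median` and the variable names are assumptions made for the example.

```python
def median(values):
    """Sort the values and return the middle one.

    For an even number of values, return the average of the two middle ones.
    """
    if not values:
        raise ValueError("the median of an empty list is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: exactly one middle value.
        return ordered[mid]
    # Even count: average the two values on either side of the middle.
    return (ordered[mid - 1] + ordered[mid]) / 2


five_grades = [93, 90, 81, 80, 77]
six_grades = five_grades + [97]  # the six-class variant of the example

median_five = median(five_grades)
median_six = median(six_grades)

print(median_five, median_six)
```

The input does not need to be sorted beforehand, because the function sorts a copy of the list itself.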

  3. Remarks
    a. The mean is the "balancing point" of a histogram; the median simply divides the data in half.

    b. For a symmetric distribution, the mean equals the median.

    c. The mean is sensitive to outliers and long tails! The median is not:

    e.g., the list "77, 80, 81, 90, 93" has mean 84.2 and median 81;

    if the list were changed to "17, 80, 81, 90, 93", the mean would be 72.2, but the median would still be 81.

    d. To find the mean, it is not necessary to know HOW MANY numbers are in a list, only the RELATIVE FREQUENCY of the values; e.g., if she had 10 classes with scores "77,77,80,80,81,81,90,90,93,93", the mean is still 84.2. As long as the scores in the list keep the same relative frequencies (in this example: 20% x1's, 20% x2's, 20% x3's and so forth), the mean will be unchanged.
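
Remarks c and d can be illustrated with a short Python sketch. It uses the standard-library `statistics` module; the helper `mean_from_relative_frequencies` is a hypothetical name introduced here to show that the mean can be rebuilt from the relative frequencies alone.

```python
from collections import Counter
from fractions import Fraction
from statistics import mean, median


def mean_from_relative_frequencies(values):
    """Compute the mean as the sum of (value * relative frequency)."""
    counts = Counter(values)
    n = len(values)
    # Fraction keeps the relative frequencies exact (e.g. 1/5 rather than 0.2).
    return float(sum(value * Fraction(count, n) for value, count in counts.items()))


original = [77, 80, 81, 90, 93]
with_outlier = [17, 80, 81, 90, 93]  # the 77 replaced by an outlier, 17
ten_classes = [77, 77, 80, 80, 81, 81, 90, 90, 93, 93]

# Remark c: the outlier pulls the mean down but leaves the median alone.
mean_original, median_original = mean(original), median(original)
mean_outlier, median_outlier = mean(with_outlier), median(with_outlier)

# Remark d: same relative frequencies, same mean, regardless of list length.
mean_five = mean_from_relative_frequencies(original)
mean_ten = mean_from_relative_frequencies(ten_classes)

print(mean_original, median_original)
print(mean_outlier, median_outlier)
print(mean_five, mean_ten)
```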

C. More practice with the mean & median

  1. Numerical Examples
        1.  Given the list 1,2,3,4,5,6,7,8,9,10,11,12:
    
            a.  n=12, sum=78
            b.  median = 6.5
            c.  mean = 78/12 = 6.5
    
    
        2.  Given the list 1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,
                           10,10,11,11,12,12
    
            a.  n=24, sum=156
            b.  median = 6.5
            c.  mean = 156/24 = 6.5
    
        NOTE: the mean and median stayed the same.  What matters
              here is the relative frequency of each value, not
              the length of the list.
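
Both numerical examples can be reproduced with the standard-library `statistics` module. The variable names `once`, `twice`, and `summary` are illustrative assumptions.

```python
from statistics import mean, median

once = list(range(1, 13))    # 1, 2, ..., 12
twice = sorted(once + once)  # every value appears twice

# Collect n, sum, median, and mean for each list.
summary = {
    label: (len(data), sum(data), median(data), mean(data))
    for label, data in (("once", once), ("twice", twice))
}

for label, (n, total, med, avg) in summary.items():
    print(label, n, total, med, avg)
```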
         
  2. Properties of center
    1. Suppose we are given a list of numbers x1, x2, ... , xn

    2. Suppose we construct a new list by adding a constant "a" to each number in the old list.

    a. Picture: shifted histogram.
    b. The median and the mean both go up by a.

    3. Suppose we construct a new list by multiplying each number
    xi by some constant "b".

    a. Picture: stretched histogram
    b. The median and the mean are both multiplied by b.
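
The two properties above can be checked numerically. The sketch below reuses the law school grades; the constants `a = 5` and `b = 2` are arbitrary values chosen for illustration and are not from the lecture.

```python
from statistics import mean, median

grades = [77, 80, 81, 90, 93]
a = 5  # hypothetical constant added to every grade
b = 2  # hypothetical constant multiplying every grade

shifted = [x + a for x in grades]  # histogram slides right by a
scaled = [b * x for x in grades]   # histogram stretches by a factor of b

# Adding a constant moves both measures of center by that constant.
shift_in_mean = mean(shifted) - mean(grades)
shift_in_median = median(shifted) - median(grades)

# Multiplying by a constant multiplies both measures of center by it.
ratio_of_means = mean(scaled) / mean(grades)
ratio_of_medians = median(scaled) / median(grades)

print(shift_in_mean, shift_in_median)
print(ratio_of_means, ratio_of_medians)
```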

  3. Summary: definitions for "center"
    1. Median: the median is the "middle number" of a list

    e.g. 77, 80, 81, 90, 93 the median is 81

    2. Mean: the mean is x-bar = (sigma xi)/n <--- recall what this means

    e.g. 77, 80, 81, 90, 93 the mean is 84.2

    3. Outliers: suppose these are the grades instead

    e.g. 17, 80, 81, 90, 93

    the median is still 81 but the mean has changed to 72.2

D. Homework for Chapter 4

Exercise Set B: 1, 2 (p. 65)
Review Exercises: 1, 4, 6a, 6b, 7 (pp. 74-75)



Last Update: 7 October 1998 by VXL