Statistics 10
Lecture 7


Percentiles, Quartiles, and the Normal Curve (Chapter 5.4 - 5.6)

A. Background

While many variables have distributions that are approximately normal, there are just as many situations in which the distribution is not well represented by the normal curve. What follows is a discussion of additional statistics that are used to describe CENTER and SPREAD.

B. Minimum, Maximum, Range, Percentiles and the IQR

1. The lowest and highest values of a variable are the minimum and maximum.
2. The range is maximum - minimum.
3. Definition: a number y is the nth PERCENTILE for the data if n% of the data are less than or equal to y.
4. The QUARTILES are the 25th, 50th, and 75th percentiles and are denoted Q1, Q2, and Q3. (Another name for Q2 is the median.)
5. The INTER-QUARTILE RANGE, or IQR, is defined as IQR = Q3 - Q1; the IQR measures the spread of the middle 50% of the data. It is used in some cases as a replacement for the SD.

The quartiles and the IQR are ROBUST measures (i.e., relatively resistant to extreme observations; the minimum, maximum, and range are not). They are good for getting an idea of what the distribution looks like (i.e., its center and spread), especially when the normal approximation does not accurately represent a variable (e.g., income, education, housing prices).
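As a rough illustration of resistance to extreme observations, the sketch below compares the mean and SD with the median and IQR before and after one value is replaced by an extreme one. The income figures are made up for illustration, and the quartiles come from Python's `statistics.quantiles` with its default method, which is only one of several quartile conventions.

```python
import statistics

def summarize(data):
    """Return (mean, SD, median, IQR) for a list of numbers."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(data, n=4)
    return (statistics.mean(data), statistics.pstdev(data),
            statistics.median(data), q3 - q1)

# Hypothetical incomes in thousands of dollars (not real data).
incomes = [20, 25, 30, 35, 40, 45, 50, 55, 60]
# Same list, but the largest value is replaced by an extreme observation.
incomes_extreme = incomes[:-1] + [6000]

mean_a, sd_a, median_a, iqr_a = summarize(incomes)
mean_b, sd_b, median_b, iqr_b = summarize(incomes_extreme)

print("mean:  ", mean_a, "->", mean_b)
print("SD:    ", round(sd_a, 1), "->", round(sd_b, 1))
print("median:", median_a, "->", median_b)
print("IQR:   ", iqr_a, "->", iqr_b)
```

In this example the mean and SD move a great deal, while the median and IQR do not move at all.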

C. Recap of "CENTER" and "SPREAD"

1. Mean or Average (denoted "x-bar")
2. Median (denoted "M")
3. Standard Deviation (denoted "s" or "SD")
4. Range: the largest value less the smallest value
5. Inter-Quartile Range (IQR): IQR = Q3 - Q1, where Q3 is the third quartile (the 75th percentile) and Q1 is the first quartile (the 25th percentile). One way of finding Q1 and Q3 is to sort the list, split it at the median into a lower and an upper sublist, and take the median of each sublist.
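The median-of-the-sublists method for Q1 and Q3 can be sketched as follows. The function name `quartiles_by_halves` is hypothetical, and the handling of an odd-length list (the middle value is left out of both halves) is an assumption, since textbooks differ on that point.

```python
import statistics

def quartiles_by_halves(data):
    """Return (Q1, median, Q3) using the median of each half of the sorted list.

    Assumption: when n is odd, the middle value belongs to neither half.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]        # values below the median position
    upper = xs[n - half:]    # values above the median position
    return (statistics.median(lower),
            statistics.median(xs),
            statistics.median(upper))

data = list(range(1, 13))            # the list 1, 2, ..., 12
q1, med, q3 = quartiles_by_halves(data)
iqr = q3 - q1                        # spread of the middle 50%
data_range = max(data) - min(data)   # largest value less the smallest
print(q1, med, q3, iqr, data_range)
```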

D. Percentiles and the Normal Curve

Standard scores vs. percentiles: If all one does with standard scores is convert them to percentiles, then why have both?

Percentiles and standard scores carry slightly different information. The move from standard scores to their normal curve percentile equivalents is not a linear transformation: equal steps in standard score do not correspond to equal steps in percentile. Very large differences between standard scores far away from the mean (in either direction) correspond to small differences in percentiles; likewise, very small differences in standard scores near the mean correspond to large differences in percentiles.
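To see the uneven spacing concretely, the sketch below compares the percentile gap for the same half-SD step taken near the mean and far out in the tail. It assumes an exactly normal distribution and uses `statistics.NormalDist` from the Python standard library; the helper name `percentile_of` is made up for this example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal curve: mean 0, SD 1

def percentile_of(z):
    """Percent of the normal curve lying at or below standard score z."""
    return 100 * std_normal.cdf(z)

# The same step of 0.5 standard units, taken in two different places.
near_mean_gap = percentile_of(0.5) - percentile_of(0.0)
far_tail_gap = percentile_of(3.0) - percentile_of(2.5)

print("z from 0.0 to 0.5:", round(near_mean_gap, 2), "percentile points")
print("z from 2.5 to 3.0:", round(far_tail_gap, 2), "percentile points")
```

The step near the mean covers roughly 19 percentile points, while the same step in the tail covers less than one.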

Consider two groups of three persons each whose heights are measured both in inches and in percentiles among adult males:

                       Heights-inches      Heights-percentiles
                       ______________      ___________________
               Group A: 70", 72", 84"       80, 92, 99.999
               Group B: 70", 74", 76"       80, 95, 99.9
                                        

                                Group A     Group B
                                _______     _______
            Mean in inches      75.333      73.333
            Mean Percentiles    90.67       91.63

Notice that Group A is taller than Group B when heights are expressed in inches, but Group B is "taller" when heights are expressed in percentiles.
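The two sets of group means can be checked with a few lines of Python. The numbers are the ones in the tables above; the dictionary names are just labels chosen for this sketch.

```python
from statistics import mean

# Heights from the tables above, in inches and in percentiles.
heights_in = {"A": [70, 72, 84], "B": [70, 74, 76]}
heights_pct = {"A": [80, 92, 99.999], "B": [80, 95, 99.9]}

for group in ("A", "B"):
    print("Group", group,
          "mean inches:", round(mean(heights_in[group]), 3),
          "mean percentile:", round(mean(heights_pct[group]), 2))
```

The ordering of the groups flips because averaging percentiles compresses the large 84-inch height into a value barely above the others.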

A quick example: What is the percentile of an SAT score of 720? Is it "better" (i.e., relatively higher) than an ACT score of 28? (The mean SAT score is 500 with an SD of 100; the mean ACT score is 20 with an SD of 5.)
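One way to work this example is to convert each score to standard units and then to a percentile. The sketch below assumes both score distributions are approximately normal, which the notes do not state explicitly; `z_score` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Convert a raw score to standard units."""
    return (x - mean) / sd

z_sat = z_score(720, 500, 100)   # (720 - 500) / 100 = 2.2
z_act = z_score(28, 20, 5)       # (28 - 20) / 5 = 1.6

# Percent of the normal curve at or below each standard score.
pct_sat = 100 * NormalDist().cdf(z_sat)
pct_act = 100 * NormalDist().cdf(z_act)

print("SAT 720: z =", round(z_sat, 2), "percentile =", round(pct_sat, 1))
print("ACT 28:  z =", round(z_act, 2), "percentile =", round(pct_act, 1))
```

Under the normality assumption, the SAT score sits at roughly the 98.6th percentile and the ACT score at roughly the 94.5th, so the SAT score is relatively higher.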

E. Changing Scales

  • Numerical Examples
        A.  Given the list 1,2,3,4,5,6,7,8,9,10,11,12:
    
            a.  n=12
            b.  median = 6.5; Q1 = 3.5, Q3 = 9.5
            c.  mean = 78/12 = 6.5
    
            d.  range = 12-1 = 11
            e.  sum (x-xbar)^2 = 143; sum x = 78, sum x^2 = 650;
                SD = sqrt(143/12) = sqrt((650 - 78^2/12)/12) = 3.452
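The arithmetic in example A can be reproduced with the short sketch below. It divides by n (not n - 1), matching the SD formula used in these notes; the variable names are chosen for this example only.

```python
import math

xs = list(range(1, 13))   # the list 1, 2, ..., 12
n = len(xs)

sum_x = sum(xs)                       # sum x
sum_x2 = sum(x * x for x in xs)       # sum x^2
xbar = sum_x / n                      # mean

# Sum of squared deviations from the mean.
ss_dev = sum((x - xbar) ** 2 for x in xs)

# SD from the definition, and from the computational shortcut.
sd_definition = math.sqrt(ss_dev / n)
sd_shortcut = math.sqrt((sum_x2 - sum_x ** 2 / n) / n)

print(n, sum_x, sum_x2, xbar, ss_dev)
print(round(sd_definition, 3), round(sd_shortcut, 3))
```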
    
        B.  Suppose we construct a new list by adding a constant "a" to
            each number in the old list.

            a.  Picture: shifted histogram.
            b.  The median and the mean both go up by a.
            c.  The range and SD are unchanged!

        C.  Suppose we construct a new list by multiplying each number
            xi by some constant "b".

            a.  Picture: stretched histogram.
            b.  The median and the mean are both multiplied by b.
            c.  The range and SD are also multiplied by b (by |b| if b
                is negative, since neither can be negative).
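The effect of adding or multiplying by a constant can be checked numerically. In the sketch below the constants a = 10 and b = 3 are arbitrary choices for illustration, and `summary` is a hypothetical helper.

```python
import statistics

def summary(xs):
    """Return the mean, median, range, and SD (dividing by n) of a list."""
    return {
        "mean": statistics.mean(xs),
        "median": statistics.median(xs),
        "range": max(xs) - min(xs),
        "sd": statistics.pstdev(xs),
    }

xs = list(range(1, 13))   # the list 1, 2, ..., 12
a, b = 10, 3              # arbitrary constants chosen for this example

original = summary(xs)
shifted = summary([x + a for x in xs])   # add a to every value
scaled = summary([b * x for x in xs])    # multiply every value by b

for name, stats in (("original", original), ("shifted", shifted), ("scaled", scaled)):
    print(name, stats)
```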

        D.  Given the list 1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12:

            a.  n=24
            b.  median = 6.5; Q1 = 3.5, Q3 = 9.5
            c.  mean = 156/24 = 6.5
            d.  range = 12-1 = 11
            e.  sum (x-xbar)^2 = 286; sum x = 156, sum x^2 = 1300;
                SD = sqrt(286/24) = sqrt((1300 - 156^2/24)/24) = 3.452

            NOTE: every summary statistic (median, quartiles, mean, range, SD) stayed the same as in list A, even though n and the sums doubled. What matters here is the relative frequency of each value.
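A quick check of the point about relative frequency: repeating every value in a list the same number of times leaves the summary statistics alone. The sketch below uses the two lists from examples A and D and divides by n when computing the SD.

```python
import statistics

xs = list(range(1, 13))    # 1, 2, ..., 12
doubled = sorted(xs + xs)  # each value appears twice, so n = 24

for label, data in (("n = 12", xs), ("n = 24", doubled)):
    print(label,
          "mean:", statistics.mean(data),
          "median:", statistics.median(data),
          "range:", max(data) - min(data),
          "SD:", round(statistics.pstdev(data), 3))
```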

    Z-scores are useful, but their properties (a mean of 0 and an SD of 1, so many scores are negative or fractional) are sometimes viewed as a disadvantage for particular applications. In these cases, one transforms them to scales that have more convenient means and standard deviations. For example, if one multiplies each z-score by 200 and then adds 1000 to the product, the resulting new standard scores have a mean of 1000 and a standard deviation of 200. There are several particular standard score scales in such common use that it is useful to look more closely at them. In general, if a z-score is transformed via the following formula (with B positive):

    Z = Bz + A,

    then the Z-score has a mean of A and a standard deviation of B.

    Some well-known transformed scores:

    Mean    SD     Scale Name
    ____    ___    __________
    500     100    SAT; GRE; GMAT
    20      5      ACT
    100     15     Wechsler IQ Test
    100     16     Stanford-Binet IQ Test
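The transformation Z = Bz + A can be tried out on a small list. In this sketch the raw data are simply 1 through 12, the function names `standardize` and `rescale` are hypothetical, and the SD is computed by dividing by n.

```python
import statistics

def standardize(xs):
    """Convert raw values to z-scores (mean 0, SD 1)."""
    m = statistics.mean(xs)
    sd = statistics.pstdev(xs)
    return [(x - m) / sd for x in xs]

def rescale(zs, A, B):
    """Apply Z = B*z + A to each z-score."""
    return [B * z + A for z in zs]

raw = list(range(1, 13))
zs = standardize(raw)

# The example from the text: multiply by 200, then add 1000.
new_scores = rescale(zs, A=1000, B=200)
print(round(statistics.mean(new_scores), 6), round(statistics.pstdev(new_scores), 6))

# A single z-score of 2.2 expressed on two of the scales in the table.
sat_like = rescale([2.2], A=500, B=100)[0]   # SAT-style scale
iq_like = rescale([2.2], A=100, B=15)[0]     # Wechsler-style scale
print(round(sat_like, 1), round(iq_like, 1))
```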

    E. Assessing Normality

    1. Common sense: if the normal curve implies nonsense results (for example, that people have negative incomes, or that some women have a negative number of children), the normal curve doesn't apply.

    Example: in 1980, the average number of children born per woman was 1.95, with an SD of 1.91. Does the normal curve apply? Try calculating how many children a woman would have if she is 2 standard deviations BELOW the mean.

    (No; the data have a long right-hand tail, so the distribution is skewed to the right. A woman who is 2 SDs below the mean would have 1.95 - 2(1.91) = -1.87 children.)
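The common-sense check in this example can be written out as a short calculation. The second part goes one step beyond the notes: it asks what share of women a normal curve with this mean and SD would place below zero children, using `statistics.NormalDist` as a stand-in for the normal table.

```python
from statistics import NormalDist

mean_children = 1.95   # average number of children per woman, 1980
sd_children = 1.91     # SD from the example

# Value that lies 2 SDs below the mean.
two_sd_below = mean_children - 2 * sd_children
print("2 SDs below the mean:", round(two_sd_below, 2), "children")

# Share of the fitted normal curve that falls below zero children.
implied_negative_share = NormalDist(mean_children, sd_children).cdf(0)
print("Share below zero under the normal curve:", round(implied_negative_share, 3))
```

A normal curve with these parameters would put roughly 15% of women below zero children, which is nonsense, so the normal curve does not apply here.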

    2. Do a histogram: if the data look like a normal curve, the normal curve applies; otherwise, it does not.

    3. There are more advanced methods, but they are outside of the scope of this course.

    F. More on Using the Normal Curve

    1. IF the data are normally distributed, then raw scores can be converted into standard units to find percentages; also, percentages can be converted into standard units and then converted into raw scores.

    2. If the data are NOT normally distributed, then using the normal curve will give the wrong answer!
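Both directions of the conversion in item 1 can be sketched in Python, assuming the data really are normally distributed. The function names are hypothetical, the SAT-style mean of 500 and SD of 100 are reused from the table of scales, and `NormalDist` plays the role of the normal table.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal curve: mean 0, SD 1

def percent_below(raw, mean, sd):
    """Raw score -> standard units -> percent of the curve at or below."""
    z = (raw - mean) / sd
    return 100 * std_normal.cdf(z)

def raw_at_percentile(pct, mean, sd):
    """Percentage -> standard units -> raw score."""
    z = std_normal.inv_cdf(pct / 100)
    return mean + z * sd

# Example on an SAT-style scale (mean 500, SD 100).
pct_650 = percent_below(650, 500, 100)       # z = 1.5
score_90th = raw_at_percentile(90, 500, 100)

print("Percent scoring at or below 650:", round(pct_650, 1))
print("Score at the 90th percentile:", round(score_90th))
```

If the data are not close to normal, these functions still return numbers, but the numbers will not match the actual percentages in the data.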

    G. HOMEWORK (DUE 2/11/00)

    Exercise Set E: 1, 3
    Review Exercises: 8, 9

    Last Update: 25 January 2000 by VXL