Statistics 10/50

Lecture 7

Percentiles, Quartiles, and Other Measures (Chapter 5.4 - 5.6)

A. Background

While many variables have distributions that are approximately normal, there are just as many situations in which the distribution is not well represented by the normal curve. What follows is a discussion of additional statistics that are used to describe CENTER and SPREAD.

B. Minimum, Maximum, Range, Percentiles and the IQR

1. The lowest and highest values of a variable are the minimum and maximum.
2. The range is maximum - minimum.
3. Definition: a number y is the nth PERCENTILE for the data if n% of the data are less than or equal to y.
4. The QUARTILES are the 25th, 50th, and 75th percentiles and are denoted Q1, Q2, and Q3. (Another name for Q2 is the median.)
5. The INTER-QUARTILE RANGE, or IQR, is defined as IQR = Q3 - Q1; the IQR measures the spread of the middle 50% of the data. It is used in some cases as a replacement for the SD.

The median, the quartiles, and the IQR are ROBUST measures (i.e., relatively resistant to extreme observations), unlike the minimum, maximum, and range, which are determined entirely by the extremes. They are good for getting an idea of what the distribution looks like (e.g., center, spread), especially when the normal approximation does not accurately represent a variable (e.g., income, education, housing prices).
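The definitions in this section can be tried out on a small list. The sketch below is only an illustration: the data values and the helper name percent_at_or_below are made up for this example and are not part of the lecture.

```python
from statistics import median

def percent_at_or_below(data, y):
    """Percentage of the data values that are less than or equal to y."""
    count = sum(1 for x in data if x <= y)
    return 100.0 * count / len(data)

# Hypothetical data, chosen only for illustration.
scores = [55, 60, 62, 68, 70, 71, 75, 80, 88, 95]

print("min:", min(scores), "max:", max(scores),
      "range:", max(scores) - min(scores))

# 5 of the 10 values are <= 70, so 70 satisfies the definition
# of a 50th percentile for this list.
print("percent at or below 70:", percent_at_or_below(scores, 70))

# Replace the largest value with an extreme observation: the range
# changes a great deal, while the median does not move.
with_outlier = scores[:-1] + [950]
print("range with outlier:", max(with_outlier) - min(with_outlier))
print("median before and after:", median(scores), median(with_outlier))
```

The last two lines show the robustness point on this particular list: the extreme value changes the range but leaves the median where it was.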

C. Recap of "CENTER" and "SPREAD"

1. Mean or Average (denoted "x-bar")
2. Median (denoted "M")
3. Standard Deviation (denoted "s" or "SD")
4. Range: the largest value less the smallest value
5. Inter-Quartile Range (IQR): IQR = Q3 - Q1, where Q3 is the third quartile (the 75th percentile) and Q1 is the first quartile (the 25th percentile). One way of finding Q1 and Q3 is to split the sorted data at the median and take the median of the lower half (Q1) and the median of the upper half (Q3).
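The "median of the halves" method for Q1 and Q3 can be sketched as follows. The function name quartiles is hypothetical, and the handling of an odd number of values (the middle value is left out of both halves) is an assumption; textbooks and software packages differ on that convention.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-the-halves method.

    Assumption: when n is odd, the middle value belongs to neither half.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]       # the smallest n // 2 values
    upper = values[n - half:]   # the largest n // 2 values
    return median(lower), median(values), median(upper)

q1, q2, q3 = quartiles(range(1, 13))   # the list 1, 2, ..., 12
print("Q1 =", q1, "median =", q2, "Q3 =", q3, "IQR =", q3 - q1)
```

For the list 1 through 12 this gives Q1 = 3.5, median = 6.5, Q3 = 9.5, and IQR = 6.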

D. Percentiles and the Normal Curve

Standard scores vs. percentiles: If all one does with standard scores is convert them to percentiles, then why have both?
Percentiles and standard scores carry slightly different information. In other words, the move from standard scores to their normal curve percentile equivalents is not a linear transformation: equal steps in standard scores do not correspond to equal steps in percentiles. Very large differences between standard scores far away from the mean (in either direction) correspond to small differences in percentiles; likewise, very small differences in standard scores near the mean correspond to large differences in percentiles.
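To see how unevenly standard scores map to percentiles, the sketch below converts z-scores to normal curve percentiles using the error function from Python's math module. The function name normal_percentile and the particular z-values compared are chosen for illustration only.

```python
from math import erf, sqrt

def normal_percentile(z):
    """Percent of the area under the standard normal curve at or below z."""
    return 100.0 * 0.5 * (1.0 + erf(z / sqrt(2.0)))

# The same half-unit step in z, taken near the mean and far out in the tail.
near_mean = normal_percentile(0.5) - normal_percentile(0.0)
far_tail = normal_percentile(3.5) - normal_percentile(3.0)

print(f"z from 0.0 to 0.5: {near_mean:.2f} percentile points")
print(f"z from 3.0 to 3.5: {far_tail:.2f} percentile points")
```

The step near the mean covers roughly 19 percentile points, while the same step in the tail covers only about a tenth of one point.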
Consider two groups of three persons each whose heights are measured both in inches and in percentiles among adult males:

                       Heights-inches      Heights-percentiles
                       ______________      ___________________
               Group A: 70", 72", 84"       80, 92, 99.999
               Group B: 70", 74", 76"       80, 95, 99.9
                                        

                                Group A     Group B
                                _______     _______
            Mean in inches      75.333       73.333
            Mean Percentiles     90.67       91.63

Notice that Group A is taller than Group B when heights are expressed in inches, but Group B is "taller" when heights are expressed in percentiles.
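The group means in the table can be reproduced directly from the six heights and their percentiles. This is only the arithmetic of the example above; the dictionary layout is an arbitrary choice.

```python
from statistics import mean

# Values copied from the table above.
inches = {"A": [70, 72, 84], "B": [70, 74, 76]}
percentiles = {"A": [80, 92, 99.999], "B": [80, 95, 99.9]}

for group in ("A", "B"):
    print("Group", group,
          "mean inches:", round(mean(inches[group]), 3),
          "mean percentile:", round(mean(percentiles[group]), 2))
```

The 84-inch person pulls up Group A's mean in inches, but in percentiles that height is barely above the 76-inch person in Group B, so the ordering of the two groups reverses.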

A quick example: What is the percentile of an SAT score of 720? Is it "better" (i.e., relatively higher) than an ACT score of 28? (The mean SAT score is 500 with an SD of 100; the mean ACT score is 20 with an SD of 5.)
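One way to work the quick example is to convert each score to standard units and then to a normal curve percentile. The sketch assumes that both SAT and ACT scores follow the normal curve, which the example implies but does not state; the function name is hypothetical.

```python
from math import erf, sqrt

def standard_score_and_percentile(score, mean, sd):
    """Return (z, percentile), assuming scores follow the normal curve."""
    z = (score - mean) / sd
    percentile = 100.0 * 0.5 * (1.0 + erf(z / sqrt(2.0)))
    return z, percentile

sat_z, sat_pct = standard_score_and_percentile(720, 500, 100)
act_z, act_pct = standard_score_and_percentile(28, 20, 5)

print(f"SAT 720: z = {sat_z:.1f}, percentile = {sat_pct:.1f}")
print(f"ACT 28:  z = {act_z:.1f}, percentile = {act_pct:.1f}")
```

The SAT score is 2.2 SDs above its mean (about the 98.6th percentile) and the ACT score is 1.6 SDs above its mean (about the 94.5th percentile), so under the normal assumption the SAT score of 720 is relatively higher.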

E. Changing Scales

Z-scores are useful, but their properties (a mean of 0, a standard deviation of 1, and therefore negative and fractional values) are sometimes viewed as a disadvantage for particular applications. In these cases, one transforms them to scales that have more convenient means and standard deviations. For example, if one multiplies each z-score by 200 and then adds 1000 to the product, the resulting new standard scores have a mean of 1000 and a standard deviation of 200. There are several particular standard score scales in such common use that it is useful to look more closely at them. In general, if a z-score is transformed via the following formula:
Z = Bz + A,
then the transformed score Z has a mean of A and a standard deviation of B (for B > 0).
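The rescaling formula can be checked numerically. The sketch below standardizes a small list and then applies Z = Bz + A with A = 1000 and B = 200, as in the example above. The raw data and the function name rescale are illustrative assumptions, and the SD uses the divisor n (Python's pstdev).

```python
from statistics import mean, pstdev

def rescale(z_scores, new_mean, new_sd):
    """Apply Z = B*z + A with A = new_mean and B = new_sd."""
    return [new_sd * z + new_mean for z in z_scores]

# Illustrative raw data; any list with some spread would do.
raw = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
m, s = mean(raw), pstdev(raw)

z_scores = [(x - m) / s for x in raw]   # mean 0, SD 1
scaled = rescale(z_scores, 1000, 200)   # mean 1000, SD 200

print("mean:", round(mean(scaled), 6), "SD:", round(pstdev(scaled), 6))
```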

Some well known transformed scores:

Mean	SD	Scale Name
500	100	SAT; GRE; GMAT
20	5	ACT
100	15	Wechsler IQ Test
100	16	Stanford Binet IQ Test

1. Given the list 1,2,3,4,5,6,7,8,9,10,11,12:

a. n = 12
b. median = 6.5; Q1 = 3.5, Q3 = 9.5
c. mean = 78/12 = 6.5
d. range = 12 - 1 = 11
e. IQR = Q3 - Q1 = 9.5 - 3.5 = 6
f. sum (x-xbar)² = 143; sum x = 78, sum x² = 650; SD = sqrt(143/12) = sqrt((650-(78²/12))/12) = 3.452

2. Given the list 1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12:

a. n = 24
b. median = 6.5; Q1 = 3.5, Q3 = 9.5
c. mean = 156/24 = 6.5
d. range = 12 - 1 = 11
e. IQR = Q3 - Q1 = 9.5 - 3.5 = 6
f. sum (x-xbar)² = 286; sum x = 156, sum x² = 1300; SD = sqrt(286/24) = sqrt((1300-(156²/24))/24) = 3.452

NOTE: Every value in list 1 appears twice here. Every summary statistic stayed the same; only n and the sums changed. What matters here is the relative frequency of a value.
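The comparison between lists 1 and 2 can be verified with a few lines. The SD here uses the divisor n (Python's pstdev), matching the hand computations above; with the divisor n - 1 the two SDs would not agree exactly.

```python
from statistics import mean, median, pstdev

once = list(range(1, 13))   # list 1: each value appears once
twice = sorted(once * 2)    # list 2: each value appears twice

for name, data in (("list 1", once), ("list 2", twice)):
    print(name, "n =", len(data),
          "mean =", mean(data),
          "median =", median(data),
          "range =", max(data) - min(data),
          "SD =", round(pstdev(data), 3))
```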

3. Given the list 3,4,5,6,7,8,9,10,11,12,13,14:

a. n = 12
b. median = 8.5; Q1 = 5.5, Q3 = 11.5
c. mean = 102/12 = 8.5
d. range = 14 - 3 = 11
e. IQR = Q3 - Q1 = 11.5 - 5.5 = 6
f. sum (x-xbar)² = 143; sum x = 102, sum x² = 1010; SD = sqrt(143/12) = sqrt((1010-(102²/12))/12) = 3.452

NOTE: This is list 1 with the constant 2 added to every value. The histogram shifts to the right, and the mean and median both increase by the size of the constant. The range, IQR, and SD are unchanged.

4. Given the list 2,4,6,8,10,12,14,16,18,20,22,24 (each value in list 1 multiplied by a constant b = 2):

a. Picture: stretched histogram
b. The median and the mean are both multiplied by b (in this case b is 2).
c. The range, IQR, and SD are also multiplied by b (in this case b is 2).
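The effect of adding a constant (list 3) and of multiplying by a constant (list 4) can be checked side by side. In this sketch the summary function is a hypothetical helper; it uses the median-of-the-halves quartiles and the SD with divisor n, which are assumptions consistent with the hand computations above.

```python
from statistics import mean, median, pstdev

def summary(data):
    """Center and spread statistics for a list of numbers."""
    values = sorted(data)
    n = len(values)
    q1 = median(values[:n // 2])          # median of the lower half
    q3 = median(values[(n + 1) // 2:])    # median of the upper half
    return {
        "mean": mean(values),
        "median": median(values),
        "range": values[-1] - values[0],
        "IQR": q3 - q1,
        "SD": round(pstdev(values), 3),
    }

base = list(range(1, 13))            # list 1
shifted = [x + 2 for x in base]      # list 3: add the constant 2
stretched = [2 * x for x in base]    # list 4: multiply by b = 2

for name, data in (("list 1", base), ("list 3", shifted), ("list 4", stretched)):
    print(name, summary(data))
```

Adding 2 moves the mean and median up by 2 and leaves the range, IQR, and SD alone; multiplying by 2 doubles all five statistics.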

F. Vocabulary

Normal Curve
Standard Units
Range
IQR

Last Update: 15 October 1998 by VXL
