Statistics 10 Lecture 7 Percentiles, Transformations

1. Introduction

While many variables have distributions that are approximately normal, there are also many situations in which the distribution is not well represented by the normal curve. Just as the median is an alternative to the mean for describing CENTER, there are alternatives to the SD for describing SPREAD.

2. Minimum, Maximum, Range, Percentiles and the IQR

A. The lowest and highest values of a variable are the minimum and maximum.

B. The RANGE is the maximum minus the minimum.

C. Definition: a number y is the nth PERCENTILE for the data if n% of the data are less than or equal to y.

D. The QUARTILES are the 25th, 50th, and 75th percentiles and are denoted Q1, Q2, and Q3. (Another name for Q2 is the median.)

E. The INTER-QUARTILE RANGE, or IQR, is defined as IQR = Q3 - Q1; the IQR measures the spread of the middle 50% of the data. It is used in some cases as a replacement for the SD.
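The definitions above can be sketched in Python. This is a minimal sketch that assumes the nearest-rank reading of the percentile definition: take the smallest data value with at least n% of the data at or below it. Textbooks and software use several slightly different conventions, and under this one the median of an even number of values is the lower of the two middle values rather than their average. The helper names `percentile` and `summary` and the data are hypothetical.

```python
import math


def percentile(data, n):
    """Return the smallest data value y with at least n% of the data <= y.

    Nearest-rank convention (an assumption); other conventions interpolate.
    """
    if not data:
        raise ValueError("data must not be empty")
    if not 0 <= n <= 100:
        raise ValueError("n must be between 0 and 100")
    values = sorted(data)
    # Rank k is the smallest count with k / N >= n / 100, i.e. k = ceil(n * N / 100);
    # k is at least 1 so that the 0th percentile is the minimum.
    k = max(1, math.ceil(n * len(values) / 100))
    return values[k - 1]


def summary(data):
    """Minimum, maximum, range, quartiles Q1, Q2 (median), Q3, and IQR = Q3 - Q1."""
    q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))
    lo, hi = min(data), max(data)
    return {"min": lo, "max": hi, "range": hi - lo,
            "Q1": q1, "median": q2, "Q3": q3, "IQR": q3 - q1}


# Hypothetical data with one unusually large value.
print(summary([12, 15, 11, 20, 14, 18, 13, 30]))
```

For these eight values the quartiles are 12, 14, and 18, so the middle 50% of the data spans an IQR of 6, while the range is 19.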

PERCENTILES, THE MEDIAN, AND THE IQR ARE NICE ROBUST MEASURES (i.e., relatively resistant to extreme observations), and they are good for getting an idea of what the distribution looks like (e.g., center, spread), especially when the normal approximation does not accurately represent a variable (e.g., income, education, housing prices). The minimum, maximum, and range are NOT robust: a single extreme observation changes them directly.
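A quick way to see the difference in robustness is to change one value into an extreme one and recompute each measure of spread. This minimal sketch uses Python's standard `statistics.quantiles` with `method="inclusive"` (Python 3.8 or later), which interpolates between data values; that is a different quartile convention from the nearest-rank reading of the percentile definition, but the comparison comes out the same way. The helper name `spread_measures` and both data sets are hypothetical, and the SD here is the population SD (dividing by the number of values).

```python
from statistics import pstdev, quantiles


def spread_measures(data):
    """Return (range, IQR, SD) for a list of numbers."""
    # Interpolating quartile convention; Q2 (the median) is not needed here.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return max(data) - min(data), q3 - q1, pstdev(data)


typical = [1, 2, 3, 4, 5, 6, 7, 8]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 80]  # hypothetical: largest value made extreme

for label, data in [("typical", typical), ("with outlier", with_outlier)]:
    rng, iqr, sd = spread_measures(data)
    print(f"{label}: range={rng}, IQR={iqr}, SD={sd:.2f}")
```

Replacing 8 by 80 moves the range from 7 to 79 and the SD from about 2.3 to about 25.2, while the IQR stays at 3.5.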

3. Percentiles and the Normal Curve

Standard (Z) scores vs. percentiles: If all one does with standard scores is convert them to percentiles, then why have both?

Percentiles and standard scores carry different information. The conversion from standard scores to their normal-curve percentile equivalents is one-to-one (a higher standard score always means a higher percentile), but it is not linear: equal steps in standard units do not produce equal steps in percentiles. Large differences between standard scores far from the mean (in either direction) correspond to small differences in percentiles; conversely, small differences in standard scores near the mean correspond to large differences in percentiles.
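To make the non-linearity concrete, here is a minimal Python sketch that uses the standard library's `statistics.NormalDist` (Python 3.8 or later) as the normal curve; `z_to_percentile` is a hypothetical helper name. It compares the same half-SD step taken near the mean and two SDs out.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_to_percentile(z):
    """Percent of the area under the normal curve at or below standard score z."""
    return 100 * STANDARD_NORMAL.cdf(z)


# The same half-SD step in standard units, taken near the mean and far from it.
for low, high in [(0.0, 0.5), (2.0, 2.5)]:
    gap = z_to_percentile(high) - z_to_percentile(low)
    print(f"z from {low} to {high}: percentile rises by {gap:.1f} points")
```

The step from z = 0 to z = 0.5 moves about 19.1 percentile points, while the step from z = 2 to z = 2.5 moves only about 1.7.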

4. Changing Scales

Z-scores are useful, but for particular applications their properties (a mean of 0 and an SD of 1, so that many scores are negative and most are not whole numbers) are sometimes viewed as a disadvantage. In these cases, one transforms them to a scale with a more convenient mean and standard deviation. For example, if one multiplies each z-score by 200 and then adds 1000 to the product, the resulting new standard scores have a mean of 1000 and a standard deviation of 200. Several particular standard score scales are in such common use that it is useful to look more closely at them. In general, if a z-score z is transformed via the following formula:

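Z = Bz + A,

then the new score Z has a mean of A and a standard deviation of B (for B > 0). This follows because z has a mean of 0 and an SD of 1: multiplying by B multiplies the SD by B and leaves the mean at 0, and then adding A shifts the mean from 0 to A without changing the SD.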
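The rescaling Z = Bz + A can be checked numerically. This is a minimal sketch with hypothetical data and hypothetical helper names (`to_z_scores`, `rescale`); it assumes the SD is the population SD (dividing by the number of values), computed with Python's `statistics.pstdev`.

```python
from statistics import mean, pstdev


def to_z_scores(data):
    """Convert raw scores to standard units: (x - mean) / SD."""
    m, sd = mean(data), pstdev(data)
    return [(x - m) / sd for x in data]


def rescale(z_scores, new_mean, new_sd):
    """Apply Z = B*z + A with B = new_sd and A = new_mean."""
    return [new_sd * z + new_mean for z in z_scores]


raw = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, SD 2
z = to_z_scores(raw)
scaled = rescale(z, new_mean=1000, new_sd=200)
print(scaled)
print(mean(scaled), pstdev(scaled))
```

The z-scores run from -1.5 to 2, and after multiplying by 200 and adding 1000 the new scores run from 700 to 1400, with a mean of 1000 and an SD of 200.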

5. Assessing Normality

A. Common sense: if the normal curve implies nonsense results (for example, that people have negative incomes, or that some women have a negative number of children), the normal curve doesn't apply.

B. Make a histogram and check the distribution: if the data are distributed like a normal curve (about 68% within 1 SD of the mean, about 95% within 2 SDs, and so on), the normal curve is a reasonable approximation; otherwise, it is not.
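Both checks can be approximated in a few lines. This minimal sketch does not draw a histogram; instead it compares the actual percent of data within 1, 2, and 3 SDs of the mean against the normal-curve figures, and it asks how much of a normal curve with the same mean and SD would fall below zero. The helper names and the income figures are hypothetical, and passing these checks does not by itself prove that data are normal.

```python
from statistics import NormalDist, mean, pstdev


def normal_percent_below_zero(data):
    """Check A: percent of the data a normal curve with the same mean and SD puts below 0."""
    return 100 * NormalDist(mean(data), pstdev(data)).cdf(0)


def percent_within_k_sd(data, k):
    """Check B: actual percent of the data within k SDs of the mean."""
    m, sd = mean(data), pstdev(data)
    return 100 * sum(abs(x - m) <= k * sd for x in data) / len(data)


# Hypothetical incomes in thousands of dollars; none can be negative.
incomes = [10, 12, 15, 18, 20, 22, 25, 30, 40, 150]

print(f"normal curve puts {normal_percent_below_zero(incomes):.0f}% below zero")
for k, normal_pct in [(1, 68), (2, 95), (3, 99.7)]:
    actual = percent_within_k_sd(incomes, k)
    print(f"within {k} SD: {actual:.0f}% of the data (normal curve: about {normal_pct}%)")
```

For these numbers the normal curve would place roughly 19% of incomes below zero, which is nonsense, and 90% of the values lie within 1 SD of the mean rather than about 68%, so the normal curve should not be used for them.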

6. More on Using the Normal Curve

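A. IF the data are approximately normally distributed, then raw scores can be converted into standard units, z = (raw score - mean) / SD, and the normal curve used to find percentages; conversely, percentages can be converted into standard units and then converted into raw scores, raw score = mean + z × SD.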

B. If the data are NOT approximately normal, then using the normal curve can give badly wrong answers! DO NOT ASSUME THAT DATA ARE NORMAL.
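The two conversions in A can be sketched with Python's `statistics.NormalDist`. This is a minimal sketch that assumes hypothetical exam scores which are approximately normal with mean 70 and SD 10; the helper names are hypothetical. It goes through standard units explicitly to mirror the steps above, although `NormalDist(70, 10).cdf(x)` would give the same percentage directly. As B warns, the results mean nothing if the scores are not approximately normal.

```python
from statistics import NormalDist

# Hypothetical exam scores, assumed approximately normal: mean 70, SD 10.
scores = NormalDist(mu=70, sigma=10)
STANDARD_NORMAL = NormalDist()  # mean 0, SD 1


def raw_to_percent_below(x, dist=scores):
    """Raw score -> standard units -> percent of scores at or below x."""
    z = (x - dist.mean) / dist.stdev
    return 100 * STANDARD_NORMAL.cdf(z)


def percentile_to_raw(p, dist=scores):
    """Percent (strictly between 0 and 100) -> standard units -> raw score."""
    z = STANDARD_NORMAL.inv_cdf(p / 100)
    return dist.mean + z * dist.stdev


print(round(raw_to_percent_below(85), 1))  # 85 is 1.5 SDs above the mean
print(round(percentile_to_raw(90), 1))     # 90th percentile, about 1.28 SDs above the mean
```

Under these assumptions a score of 85 sits at about the 93rd percentile, and the 90th percentile corresponds to a score of about 82.8.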