1. Introduction
While many variables have distributions that are approximately normal, there
are just as many situations in which the distribution is not well represented
by the normal curve. Recall that the median is an alternative way to describe
CENTER; likewise, there are alternative ways to describe SPREAD.
A. The lowest and highest values of
a variable are the minimum and maximum.
B. The range is the maximum minus the minimum.
C. Definition: a number y is the nth
PERCENTILE for the data if n% of the data are less than or equal to y.
D. The QUARTILES are the 25th, 50th,
and 75th percentiles and are denoted Q1, Q2, and Q3. (Another name for Q2 is
the median.)
E. The INTER-QUARTILE RANGE, or IQR,
is defined as IQR = Q3 - Q1; the IQR measures the spread of the middle 50% of
the data. It is used in some cases as a replacement for the SD.
The quartiles and the IQR are ROBUST measures (i.e., relatively resistant to extreme observations), and they give a good idea of what the distribution looks like (its center and spread), especially when the normal approximation does not accurately represent a variable (e.g., income, education, housing prices).
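The definitions in A through E can be sketched in a few lines of Python. This is a minimal illustration, not a standard routine: the `percentile` helper is a hypothetical nearest-rank implementation of definition C, and the `incomes` list is made-up data with one extreme value. Other textbooks and software interpolate between data points, so their quartiles may differ slightly.

```python
import math

def percentile(data, n):
    """Nearest-rank percentile (an assumed convention): the smallest data
    value y such that at least n% of the data are less than or equal to y."""
    ordered = sorted(data)
    rank = max(1, math.ceil(n / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical incomes in thousands of dollars; 250 is an extreme observation.
incomes = [12, 15, 18, 21, 25, 30, 34, 40, 55, 250]

minimum, maximum = min(incomes), max(incomes)
data_range = maximum - minimum            # range = maximum - minimum

q1 = percentile(incomes, 25)              # first quartile
q2 = percentile(incomes, 50)              # median under this convention
q3 = percentile(incomes, 75)              # third quartile
iqr = q3 - q1                             # spread of the middle 50%

print(minimum, maximum, data_range)       # 12 250 238
print(q1, q2, q3, iqr)                    # 18 25 40 22
```

The single extreme income stretches the range to 238, while the IQR stays at 22. That resistance to extreme observations is what "robust" refers to.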
Standard (Z) scores vs. percentiles: If all one does with standard
scores is convert them to percentiles, then why have both?
Percentiles and standard scores have slightly different information in them. The move from standard scores to their normal curve percentile equivalents is one-to-one, but it is not a linear transformation: equal steps in standard scores do not correspond to equal steps in percentiles. Very large differences between standard scores far away from the mean (in either direction) correspond to small differences in percentiles; likewise, very small differences in standard scores near the mean correspond to large differences in percentiles.
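A short numerical check may make this concrete. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_to_percentile` and the particular z-values compared are chosen only for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal curve: mean 0, SD 1

def z_to_percentile(z):
    """Percent of the area under the normal curve at or below z."""
    return 100 * std_normal.cdf(z)

# The same half-unit step in z, taken near the mean and far out in the tail.
near_mean = z_to_percentile(0.5) - z_to_percentile(0.0)
far_tail = z_to_percentile(3.0) - z_to_percentile(2.5)

print(f"z 0.0 -> 0.5 moves the percentile by {near_mean:.1f} points")
print(f"z 2.5 -> 3.0 moves the percentile by {far_tail:.1f} points")
```

Near the mean the half-unit step is worth about 19 percentile points. In the tail the same step is worth about half a point.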
Z-scores are useful, but some of their properties (such as negative and
fractional values) are sometimes viewed as a disadvantage for particular
applications. In these cases, one transforms them to scales that have more
convenient means and standard deviations. For example, if one multiplies each
z-score by 200 and then adds 1000 to the product, the resulting new standard
scores have a mean of 1000 and a standard deviation of 200.
There are several particular standard score scales in such common use that it
is useful to look more closely at them. In general, if a z-score is transformed
via the following formula:
Z = Bz + A,
then the new score Z has a mean of A and a
standard deviation of B.
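The rescaling can be verified directly. The following is a minimal sketch, assuming a small made-up list of raw scores and the population SD (`statistics.pstdev`); the names `raw_scores`, `z_scores`, and `new_scores` are illustrative only.

```python
from statistics import mean, pstdev

# Hypothetical raw test scores.
raw_scores = [52, 60, 61, 70, 77]

m = mean(raw_scores)
s = pstdev(raw_scores)  # population SD (an assumption about which SD is meant)

# Convert to standard units: z = (score - mean) / SD.
z_scores = [(x - m) / s for x in raw_scores]

# Rescale with Z = B*z + A, using the example values from the text.
A, B = 1000, 200
new_scores = [B * z + A for z in z_scores]

print(round(mean(z_scores), 6), round(pstdev(z_scores), 6))      # mean 0, SD 1 (up to rounding)
print(round(mean(new_scores), 6), round(pstdev(new_scores), 6))  # mean 1000, SD 200 (up to rounding)
```

The transformation changes only the units of the scale. Each score keeps the same position relative to the others.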
A. Common sense: if the normal curve implies
nonsense results (for example, that people have negative incomes, or that some
women have a negative number of children), the normal curve doesn't apply.
B. Do a histogram and check the distribution: if the
data are distributed like a normal curve (e.g., about 68% of the data within
1 SD of the mean and about 95% within 2 SDs), the normal curve applies;
otherwise, it does not.
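Both checks can be roughed out numerically before drawing the histogram. The sketch below is only an illustration: `share_within` is a hypothetical helper, `incomes` is made-up skewed data, and `bell_shaped` is simulated from a normal curve with a fixed seed for comparison.

```python
import random
from statistics import mean, pstdev

def share_within(data, k):
    """Fraction of the data lying within k SDs of the mean."""
    m, s = mean(data), pstdev(data)
    return sum(1 for x in data if abs(x - m) <= k * s) / len(data)

# Hypothetical skewed data: incomes in thousands of dollars.
incomes = [12, 15, 18, 21, 25, 30, 34, 40, 55, 250]

# Simulated bell-shaped data for comparison (seed fixed for repeatability).
rng = random.Random(0)
bell_shaped = [rng.gauss(0, 1) for _ in range(10_000)]

skewed_share = share_within(incomes, 1)
bell_share = share_within(bell_shaped, 1)

# Common-sense check: one SD below the mean would be a negative income.
lower_bound = mean(incomes) - pstdev(incomes)

print(f"skewed data within 1 SD: {skewed_share:.0%}")
print(f"bell-shaped data within 1 SD: {bell_share:.0%}")
print(f"mean - 1 SD for incomes: {lower_bound:.1f}")
```

For the skewed incomes, 90% of the values fall within 1 SD of the mean rather than about 68%, and the lower bound is negative. Both are signs that the normal curve does not describe these data.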
A. IF the data are normally distributed, then raw scores can be
converted into standard units to find percentages; also, percentages can be
converted into standard units and then converted into raw scores.
B. If the data are NOT normally distributed, then using the normal curve will give the wrong answer! DO NOT ASSUME THAT DATA ARE NORMAL.
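For data that really are approximately normal, the two conversions in A can be sketched as follows. The mean and SD are hypothetical values chosen for illustration, and the function names are not from any course material; `NormalDist.cdf` and `NormalDist.inv_cdf` are standard-library calls.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal curve: mean 0, SD 1

def raw_to_percent_below(x, mean, sd):
    """Raw score -> standard units -> percent of the data at or below x."""
    z = (x - mean) / sd
    return 100 * std_normal.cdf(z)

def percent_below_to_raw(pct, mean, sd):
    """Percent at or below -> standard units -> raw score."""
    z = std_normal.inv_cdf(pct / 100)
    return mean + z * sd

# Hypothetical example: heights with mean 63.5 inches and SD 2.5 inches.
mean_height, sd_height = 63.5, 2.5

pct = raw_to_percent_below(66.0, mean_height, sd_height)    # z = 1
height = percent_below_to_raw(pct, mean_height, sd_height)  # back to 66 inches

print(f"{pct:.1f}% of heights are at or below 66 inches")
print(f"that percentage corresponds to a height of {height:.1f} inches")
```

These answers are only as good as the normal approximation. For skewed data such as incomes, the same functions would still return numbers, but the numbers would be wrong.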