A. Background
While many variables have distributions that are approximately normal, there are just as many situations in which the distribution is not well represented by the normal curve. What follows is a discussion of additional statistics that are used to describe CENTER and SPREAD.
B. Minimum, Maximum, Range, Percentiles and the IQR
These are robust measures (i.e., relatively resistant to extreme observations), and they are good for getting an idea of what the distribution looks like (e.g., its center and spread), especially when the normal approximation does not accurately represent a variable (e.g., income, education, housing prices).
Standard scores vs. percentiles: If all one does with standard scores is convert them to percentiles, then why have both? Percentiles and standard scores carry slightly different information. In other words, the move from standard scores to their normal curve percentile equivalents is not a linear transformation. Very large differences between standard scores far away from the mean (in either direction) correspond to small differences in percentiles; likewise, very small differences in standard scores near the mean correspond to large differences in percentiles.
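A minimal sketch of this point, assuming the scores follow a normal curve exactly. The helper name `percentile` is made up for illustration; `NormalDist` is from Python's standard library. It compares the same gap of 0.5 standard units near the mean and far out in the tail.

```python
from statistics import NormalDist

def percentile(z):
    """Percent of a standard normal distribution lying below z."""
    return 100 * NormalDist().cdf(z)

# Same gap in standard units (0.5), very different gaps in percentiles.
near_mean = percentile(0.5) - percentile(0.0)
far_tail = percentile(3.5) - percentile(3.0)

print(f"z from 0.0 to 0.5: {near_mean:.2f} percentile points")
print(f"z from 3.0 to 3.5: {far_tail:.2f} percentile points")
```

The gap near the mean covers many percentile points, while the same gap in the tail covers only a small fraction of one point.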
Consider two groups of three persons each whose heights are measured both in inches and in percentiles among adult males:
Group | Heights (inches) | Heights (percentiles) |
---|---|---|
A | 70", 72", 84" | 80, 92, 99.999 |
B | 70", 74", 76" | 80, 95, 99.9 |

Statistic | Group A | Group B |
---|---|---|
Mean in inches | 75.333 | 73.333 |
Mean percentile | 90.67 | 91.63 |
Notice that Group A is taller than Group B when heights are expressed in inches, but Group B is "taller" when heights are expressed in percentiles.
A quick example: What is the percentile of an SAT score of 720? Is it "better" (i.e., relatively higher) than an ACT score of 28? (The mean SAT score is 500 with an SD of 100; the mean ACT score is 20 with an SD of 5.)
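One way to work this example, sketched in Python under the assumption that both SAT and ACT scores are approximately normally distributed. The function name `z_score` is illustrative only.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Convert a raw score to standard units."""
    return (x - mean) / sd

sat_z = z_score(720, mean=500, sd=100)  # (720 - 500) / 100 = 2.2
act_z = z_score(28, mean=20, sd=5)      # (28 - 20) / 5 = 1.6

# Percentile = percent of the normal curve below the standard score.
sat_pct = 100 * NormalDist().cdf(sat_z)
act_pct = 100 * NormalDist().cdf(act_z)

print(f"SAT 720: z = {sat_z:.1f}, percentile = {sat_pct:.1f}")
print(f"ACT 28:  z = {act_z:.1f}, percentile = {act_pct:.1f}")
```

Under the normal assumption, the SAT score of 720 is relatively higher: it sits 2.2 SDs above its mean, while the ACT score of 28 sits 1.6 SDs above its mean.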
A. Given the list 1,2,3,4,5,6,7,8,9,10,11,12:
a. n = 12
b. median = 6.5; Q1 = 3.5, Q3 = 9.5
c. mean = 78/12 = 6.5
d. range = 12 - 1 = 11
e. sum (x - xbar)^2 = 143; sum x = 78, sum x^2 = 650; SD = sqrt(143/12) = sqrt((650 - 78^2/12)/12) = 3.452
B. Suppose we construct a new list by adding a constant "a" to each number in the old list.
a. Picture: shifted histogram.
b. The median and the mean both go up by a.
c. The range and SD are unchanged!
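The effect of adding a constant can be checked numerically. This is a small sketch, not part of the original notes: the helper `summarize` and the choice a = 10 are arbitrary, and the SD is the population SD (dividing by n), matching the calculation for the list above.

```python
from statistics import mean, median, pstdev

def summarize(xs):
    """Center and spread of a list; pstdev divides by n."""
    return {
        "mean": mean(xs),
        "median": median(xs),
        "range": max(xs) - min(xs),
        "sd": pstdev(xs),
    }

old = list(range(1, 13))      # 1, 2, ..., 12
a = 10                        # arbitrary constant for illustration
new = [x + a for x in old]    # shifted list

before, after = summarize(old), summarize(new)
for name in before:
    print(f"{name:>6}: {before[name]:.3f} -> {after[name]:.3f}")
```

The mean and median each rise by a, while the range and SD print the same value before and after.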
C. Suppose we construct a new list by multiplying each number x_i by some constant "b".
a. Picture: stretched histogram.
b. The median and the mean are both multiplied by b.
c. The range and SD are also multiplied by b (by |b| if b is negative).
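A similar numerical check for multiplication, again as an illustrative sketch with an arbitrary constant (b = 3) and the population SD. It also tries -b to show that a negative multiplier flips the sign of the mean and median, while the range and SD, which cannot be negative, scale by the absolute value.

```python
from statistics import mean, median, pstdev

def summarize(xs):
    """Center and spread of a list; pstdev divides by n."""
    return {
        "mean": mean(xs),
        "median": median(xs),
        "range": max(xs) - min(xs),
        "sd": pstdev(xs),
    }

old = list(range(1, 13))            # 1, 2, ..., 12
b = 3                               # arbitrary constant for illustration
stretched = [b * x for x in old]    # multiply by b
flipped = [-b * x for x in old]     # multiply by -b

base = summarize(old)
pos = summarize(stretched)
neg = summarize(flipped)
for name in base:
    print(f"{name:>6}: {base[name]:8.3f} {pos[name]:8.3f} {neg[name]:8.3f}")
```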
D. Given the list 1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12:
a. n = 24
b. median = 6.5; Q1 = 3.5, Q3 = 9.5
c. mean = 156/24 = 6.5
d. range = 12 - 1 = 11
e. sum (x - xbar)^2 = 286; sum x = 156, sum x^2 = 1300; SD = sqrt(286/24) = sqrt((1300 - 156^2/24)/24) = 3.452
NOTE: the median, quartiles, mean, range and SD all stayed the same, even though n and the sums doubled. What matters here is the relative frequency of each value.
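The comparison between the original list and the doubled list can be sketched as follows. The quartile rule used here (median of the lower half and median of the upper half) is an assumption chosen because it reproduces Q1 = 3.5 and Q3 = 9.5; other quartile conventions give slightly different values. The function names are illustrative.

```python
from statistics import mean, median, pstdev

def quartiles(xs):
    """Q1 and Q3 as medians of the lower and upper halves (one convention)."""
    s = sorted(xs)
    half = len(s) // 2
    return median(s[:half]), median(s[-half:])

def summarize(xs):
    q1, q3 = quartiles(xs)
    return {
        "median": median(xs),
        "q1": q1,
        "q3": q3,
        "mean": mean(xs),
        "range": max(xs) - min(xs),
        "sd": round(pstdev(xs), 3),   # population SD, divides by n
    }

single = list(range(1, 13))        # each value once
doubled = sorted(single + single)  # each value twice

print(summarize(single))
print(summarize(doubled))
```

Both calls print the same summary, because every value has the same relative frequency in the two lists.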
Z-scores are useful, but their properties are sometimes viewed as a disadvantage for particular applications. In these cases, one transforms them to scales that have more convenient means and standard deviations. For example, if one multiplies each z-score by 200 and then adds 1000 to the product, the resulting new standard scores have a mean of 1000 and a standard deviation of 200. There are several standard score scales in such common use that it is useful to look more closely at them. In general, if a z-score is transformed via the formula

Z = Bz + A,

then the transformed score Z has a mean of A and a standard deviation of B.

Some well-known transformed scores:

Mean | SD | Scale Name |
---|---|---|
500 | 100 | SAT; GRE; GMAT |
20 | 5 | ACT |
100 | 15 | Wechsler IQ Test |
100 | 16 | Stanford-Binet IQ Test |
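
The transformation Z = Bz + A can be tried on any list of numbers. The sketch below is illustrative only: the raw list and the function name `rescale` are assumptions, and the SD is the population SD. It converts raw values to z-scores and then to a scale with mean 500 and SD 100, the SAT-style row of the table.

```python
from statistics import mean, pstdev

def rescale(z_scores, A, B):
    """Apply Z = B*z + A to each z-score."""
    return [B * z + A for z in z_scores]

raw = list(range(1, 13))              # any list of raw scores would do
m, s = mean(raw), pstdev(raw)
z = [(x - m) / s for x in raw]        # z-scores: mean 0, SD 1

sat_like = rescale(z, A=500, B=100)   # target mean 500, SD 100

print(f"z-scores: mean = {mean(z):.3f}, SD = {pstdev(z):.3f}")
print(f"rescaled: mean = {mean(sat_like):.1f}, SD = {pstdev(sat_like):.1f}")
```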
1. Common sense: if the normal curve implies nonsense results (for example, that people have negative incomes, or that some women have a negative number of children), the normal curve doesn't apply.
Example: in 1980, the average number of children born per woman was 1.95, with an SD of 1.91. Does the normal curve apply? Try calculating how many children a woman would have if she is 2 standard deviations BELOW the mean.
(No; the data have a long right-hand tail, so the distribution is skewed to the right. A woman who is 2 SD below the mean would have 1.95 - 2(1.91) = -1.87 children.)
2. Do a histogram: if the data look like a normal curve, the normal curve applies; otherwise, it does not.
3. There are more advanced methods, but they are outside of the scope of this course.
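The common-sense check in item 1 can be sketched in a few lines. The variable names are illustrative; the mean and SD are the 1980 figures quoted above. The second calculation asks what share of women a normal curve with that mean and SD would place below zero children.

```python
from statistics import NormalDist

mean_children = 1.95
sd_children = 1.91

# Value 2 SDs below the mean: impossible for a count of children.
two_sd_below = mean_children - 2 * sd_children

# Share of the fitted normal curve that falls below zero.
share_negative = NormalDist(mean_children, sd_children).cdf(0)

print(f"2 SD below the mean: {two_sd_below:.2f} children")
print(f"Normal curve puts {100 * share_negative:.1f}% of women below 0 children")
```

A noticeable share of the fitted curve lies below zero, which is another sign that the normal curve is a poor description of these data.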
1. IF the data are normally distributed, then raw scores can be converted into standard units to find percentages; also, percentages can be converted into standard units and then converted into raw scores.
2. If the data are NOT normally distributed, then using the normal curve will give the wrong answer!
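Both directions of the conversion in item 1 can be sketched as below, assuming the data really are normal. The scale (mean 500, SD 100) and the example values (a raw score of 650, the 90th percentile) are chosen only for illustration.

```python
from statistics import NormalDist

MEAN, SD = 500, 100          # illustrative scale
std_normal = NormalDist()    # standard normal: mean 0, SD 1

# Raw score -> standard units -> percentage below.
raw = 650
z = (raw - MEAN) / SD                      # 1.5 standard units
pct_below = 100 * std_normal.cdf(z)

# Percentage -> standard units -> raw score.
target_pct = 90
z_target = std_normal.inv_cdf(target_pct / 100)
raw_at_target = MEAN + SD * z_target

print(f"Raw {raw} -> z = {z:.2f} -> {pct_below:.1f}% of scores are lower")
print(f"{target_pct}th percentile -> z = {z_target:.2f} -> raw score {raw_at_target:.0f}")
```

If the data are not close to normal, as item 2 warns, neither conversion can be trusted.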