In previous chapters, we examined the variability associated with the sum of draws from a box (Chapter 17) and with percentages (Chapters 19, 20 and 21). In Chapter 23, we turn to the variability of sample averages.
Suppose we draw a simple random sample of size n from a large population.
For example: draw a simple random sample (SRS) of 100 American women from the population of American women and measure their heights.
Suppose the population has a mean of 5'3" and a standard deviation of 2.5". (For now, assume you know what the box looks like.)
Then each woman drawn from the "box" has an expected height of 5'3" with an SE of 2.5"; in other words, a single draw is expected to behave like the original distribution.
For the sample of 100, the expected value for the average of the 100 draws is simply equal to the average of the box. You could also think of it as the expected sum of the draws divided by the number of draws, but again, that is just the average of the box.
The standard error for the average of the sample of 100 is
(square root(number of draws) x SD of the box) / number of draws
For a sample of 100 from this particular box:
(10 x 2.5) / 100 = 0.25 inches
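The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch; the function name se_of_average is made up for illustration and is not part of the chapter.

```python
import math

def se_of_average(sd_of_box, number_of_draws):
    """SE for the average = (sqrt(number of draws) x SD of the box) / number of draws."""
    # SE for the sum of the draws (square root law)
    se_of_sum = math.sqrt(number_of_draws) * sd_of_box
    # Dividing by the number of draws turns the sum into an average
    return se_of_sum / number_of_draws

# Box of heights with SD 2.5 inches, 100 draws
print(se_of_average(2.5, 100))  # 0.25

# A single draw: the SE is just the SD of the box
print(se_of_average(2.5, 1))    # 2.5
```

Note that the formula simplifies to SD of the box divided by the square root of the number of draws, which is why the SE shrinks as the sample grows.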
When you draw just one woman at random from the "box", your best guess about her height is 5'3". If heights follow the normal curve, there is a 68% chance that she will be within 2.5" of that value and a 95% chance that she will be within 5" of that value.
When you draw 100 women from the box at random, your best guess about their average height is still 5'3", and there is a 68% chance that the sample average will be within 0.25 inches of the population average. There is a 95% chance that it will be within 0.5 inches of the population average.
We can make these probability statements for the average of draws from a box even when the underlying population is not normally distributed. It is the sample averages, taken over all possible samples (in theory), that approximately follow the normal curve. This works when your samples are reasonably large (30 or more is a common rule of thumb).
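One way to see this is by simulation. The sketch below is an illustration under stated assumptions: the skewed box of ten tickets is invented for the example, draws are made with replacement (a stand-in for sampling from a large population), and the function name is hypothetical. It counts how often the sample average lands within 1 SE and within 2 SEs of the box average.

```python
import random
import statistics

def coverage_of_sample_averages(box, sample_size, repetitions, seed=0):
    """Fraction of sample averages within 1 SE and within 2 SEs of the box average."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    box_average = statistics.fmean(box)
    # SE for the average: SD of the box / sqrt(number of draws)
    se_average = statistics.pstdev(box) / sample_size ** 0.5
    within_1, within_2 = 0, 0
    for _ in range(repetitions):
        draws = rng.choices(box, k=sample_size)  # draws with replacement
        distance = abs(statistics.fmean(draws) - box_average)
        if distance <= se_average:
            within_1 += 1
        if distance <= 2 * se_average:
            within_2 += 1
    return within_1 / repetitions, within_2 / repetitions

# A made-up box that is clearly not normal (long right tail)
skewed_box = [1, 1, 1, 2, 2, 3, 4, 6, 9, 15]
frac_1se, frac_2se = coverage_of_sample_averages(skewed_box, 100, 2000)
print(frac_1se, frac_2se)
```

With samples of 100, the two fractions should come out roughly near 68% and 95%, even though the box itself looks nothing like the normal curve.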
Thus, the standard error for the average of a sample, say of twenty people, will be smaller than the standard deviation for individual measurements. It's easier to predict the average for a group than it is to predict a single measurement.
This is like the material presented in Chapter 21 and reflects real life. Usually, you don't know "the truth" about the population and can't measure it directly. But you may have a good sample.
Just as in Chapter 21, you use sample information to make statements about the population, again in the form of confidence intervals.
An example.
Suppose a psychologist wants to know the average IQ of the 28,000 students at USC. Suppose he takes a simple random sample (this is without replacement) and the sample average turns out to be 95. The standard deviation of the sample is 50.
The average IQ of all USC students is estimated as 95, but of course there is always chance error when you are dealing with samples. He will want to put a +/- estimate around the 95.
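To do that, he will need an SE. Things to do:
SE for the sum = square root(number of draws) x SD of the sample
SE for the average = SE for the sum / number of draws
With an SD of 50, an SE of 5 corresponds to a sample of 100 draws: (10 x 50) / 100 = 5.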
95 +/- 5 (1 SE)
95 +/- 10 (2 SE)
In about 68% of all samples, if you go +/- 1 SE (here, 5 IQ points) from the sample average, you will cover the USC population average. In about 95% of all samples, if you go +/- 2 SEs (here, 10 IQ points) from the sample average, you will cover the USC population average. Or, you might make statements of confidence: "I am 68% confident that the range 90 to 100 covers the true USC IQ average" or "I am 95% confident that the range 85 to 105 covers the true USC IQ average."
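The interval arithmetic can be sketched in Python as follows. The sample size of 100 used here is an assumption: it is the size for which an SD of 50 gives the SE of 5 used above. The function name confidence_interval is also made up for illustration.

```python
import math

def confidence_interval(sample_average, sample_sd, number_of_draws, number_of_ses):
    """Sample average +/- (number of SEs) x (SE for the average)."""
    # SE for the sum, using the sample SD as a stand-in for the SD of the box
    se_sum = math.sqrt(number_of_draws) * sample_sd
    se_average = se_sum / number_of_draws
    margin = number_of_ses * se_average
    return sample_average - margin, sample_average + margin

# Assumed sample size: 100 (not a given; chosen to match the SE of 5)
print(confidence_interval(95, 50, 100, 1))  # (90.0, 100.0), about 68% confidence
print(confidence_interval(95, 50, 100, 2))  # (85.0, 105.0), about 95% confidence
```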
Remember that the normal curve is a good approximation of the distribution of sample averages if you could sample again and again. It allows you to make probability statements.
IQ scores are normally distributed with a mean of 100 and a standard deviation of 16. A sample of 25 persons is drawn. How likely is it to get a sample average of 108 or more? How likely is it to select one person with an IQ of 108 or more? (Answers: about 0.6 of 1% for the sample average; about 31% for one person.)
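These two answers can be checked with the normal curve. The sketch below uses only Python's standard library; the helper name chance_at_or_above is hypothetical, and the normal approximation is taken as given by the problem.

```python
import math

def chance_at_or_above(value, mean, spread):
    """Area under the normal curve to the right of value (spread is an SD or an SE)."""
    z = (value - mean) / spread
    # Upper-tail area of the standard normal curve, via the complementary error function
    return 0.5 * math.erfc(z / math.sqrt(2))

# Sample average of 25 persons: SE for the average = 16 / sqrt(25) = 3.2, so z = 2.5
se_average = 16 / math.sqrt(25)
p_average = chance_at_or_above(108, 100, se_average)

# One person: the spread is the SD itself, so z = 0.5
p_one_person = chance_at_or_above(108, 100, 16)

print(round(100 * p_average, 1), "percent")     # about 0.6
print(round(100 * p_one_person, 1), "percent")  # about 31
```

The same cutoff of 108 is 2.5 SEs above the mean for the average of 25, but only half an SD above the mean for one person, which is why the two chances differ so much.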
A utility company serves 50,000 households. As part of a survey of consumer attitudes, they took a simple random sample of 750 households. The average number of TV sets in the sample is 1.86 and the SD is 0.80. Find a 95% confidence interval for the average number of TV sets per household among all 50,000 households.