1. Overview
In previous chapters, we examined the variability associated with the sum of draws from a box (Chapter 17) and with percentages (Chapters 19, 20 and 21). In Chapter 23, we turn to the variability of sample averages.
2. Example
Suppose we draw a simple random sample of size n from a large population. For example, draw a simple random sample (SRS) of 100 American women from the population of all American women and measure their heights. Suppose we know that the population has a mean of 5'3" and a standard deviation of 2.5". (So for now, you know what the box looks like.)
Then each woman drawn from the "box" has an expected height of 5'3" with an SE of 2.5"; in other words, a single draw behaves like the original distribution. For the sample of 100, the expected value for the average of the 100 draws is simply equal to the average of the box. You could also think of the sample average as the sum of the draws divided by the number of draws, but again, that's just the average. (see page 410)
The standard error for the average of the sample of 100 is
    square root(number of draws) x SD of the box
    --------------------------------------------
                  number of draws
For a sample of 100 from this particular box, the standard error would be:
    10 x 2.5
    -------- = 0.25 inches
      100
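The arithmetic above can be checked with a short Python sketch. The function name `se_of_average` is a hypothetical helper introduced here for illustration; it is not from the text.

```python
import math

def se_of_average(sd_box: float, n_draws: int) -> float:
    """SE for the average of n_draws draws from a box (hypothetical helper)."""
    se_sum = math.sqrt(n_draws) * sd_box  # SE for the sum of the draws
    return se_sum / n_draws               # divide by the number of draws

# Heights example from the notes: SD of the box = 2.5 inches, 100 draws
se_heights = se_of_average(2.5, 100)
print(se_heights)  # 0.25
```

Algebraically, this is the same as the SD of the box divided by the square root of the number of draws.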
3. Interpretations
When you draw just one woman at random from the "box", your best guess about her height is 5'3", and (assuming heights roughly follow the normal curve) there is about a 68% chance that she will be within 2.5" of that value and about a 95% chance that she will be within 5" of that value. When you draw 100 women from the box at random, your best guess about their average height is still 5'3", but now there is about a 68% chance that the sample average will be within 0.25 inches of the population average, and about a 95% chance that it will be within 0.5 inches of the population average.
We can make these probability statements for the average of draws from a box even when the underlying population is not normally distributed. It is the sample averages, taken over all possible samples (in theory), that are approximately normally distributed. This works when the samples are reasonably large (30 or more is a common rule of thumb).
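A small simulation can illustrate this point. The sketch below is only an illustration under stated assumptions: the "box" is a hypothetical skewed population (exponential, with mean 1 and SD 1) that does not come from the notes, and the draws are independent, which approximates a simple random sample from a very large population.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical skewed "box": exponential values with mean 1 and SD 1
box_mean, box_sd = 1.0, 1.0
n_draws = 100     # size of each sample
n_samples = 2000  # number of repeated samples

se_avg = box_sd / n_draws ** 0.5  # SE for the average = SD of box / sqrt(n)

# Average of each of the repeated samples
averages = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n_draws))
    for _ in range(n_samples)
]

# Fraction of sample averages within 1 SE and within 2 SE of the box average
within_1_se = sum(abs(a - box_mean) <= se_avg for a in averages) / n_samples
within_2_se = sum(abs(a - box_mean) <= 2 * se_avg for a in averages) / n_samples
print(f"within 1 SE: {within_1_se:.2f}, within 2 SE: {within_2_se:.2f}")
```

If the normal approximation holds, the two fractions should come out near 68% and 95%, even though the individual values in this box are far from normal.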
4. Properties
The expected value of the sample average is the population average. (see page 410)
The standard error of the sample average is the standard error for the sum of the draws divided by the number of draws, where the standard error for the sum is the square root of the number of draws multiplied by the standard deviation of the population. (also page 410)
Thus, the standard error of the average of a sample, say of twenty people, will be smaller than the standard deviation for individual measurements. It's easier to predict the average for a group than it is to predict a single measurement.
The probability histogram for the sample average, that is, the distribution of sample means, will approximately follow a normal curve even if the underlying population does not, provided the samples drawn are of a reasonable size (30 or more).
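To see how the standard error of the average shrinks as the sample grows, here is a minimal sketch that reuses the SD of 2.5 inches from the heights example. The sample sizes other than 100 are chosen purely for illustration.

```python
import math

sd_population = 2.5  # inches, from the heights example

# SE of the sample average for several sample sizes
se_by_n = {
    n: math.sqrt(n) * sd_population / n
    for n in (1, 20, 100, 400)
}

for n, se in se_by_n.items():
    print(f"n = {n:3d}   SE of average = {se:.3f} inches")
```

With one draw the SE equals the SD of the population. With twenty draws it is already much smaller, and going from 100 draws to 400 cuts it in half, since the SE falls with the square root of the number of draws.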
5. When the population mean and standard deviation are unknown (23.2)
This is like the material presented in Chapter 21 and reflects real life. Usually, you don't know "the truth" and can't really measure it. But you may have a good sample. Just as in Chapter 21, you use sample information to make statements about the population, again in the form of confidence intervals.
Suppose a psychologist wants to know the average IQ of the 28,000 students at USC. Suppose he takes a simple random sample of 100 and the sample average turns out to be 95. The standard deviation of the sample is 50.
The average IQ of all USC students is estimated as 95, but of course there is always chance error when you are dealing with samples. He will want to put a +/- estimate around the 95.
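To do that, he will need an SE. Since the SD of the box is unknown, the SD of the sample is used in its place. Things to do:
    SE for the sum     = square root(number of draws) x SD of the sample
                       = 10 x 50 = 500
                         SE for the sum     500
    SE for the average = --------------- = ----- = 5
                         number of draws    100
    95 +/- 5  (1 SE) (for 68% confidence)
    95 +/- 10 (2 SE) (for 95% confidence -- what would 99% confidence look like?)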
For about 68% of all samples of size 100, the range "sample average +/- 1 SE" will cover the USC population average; for this sample, that range is 95 +/- 5 IQ points. For about 95% of all samples of size 100, the range "sample average +/- 2 SE" will cover the USC population average; for this sample, that range is 95 +/- 10 IQ points. Or, you might make statements of confidence: "I am 68% confident that the range 90 to 100 covers the true USC IQ average" or "I am 95% confident that the range 85 to 105 covers the true USC IQ average". Human IQs average 100, and 100 lies inside the 95% interval, so it is entirely possible that the true mean IQ of USC students is 100.
Remember that the normal curve is a good approximation to the distribution the sample averages would have if you could sample again and again. That is what allows you to make probability statements.
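The USC calculation can be reproduced with a few lines of Python. This is a sketch of the steps in this section, using the sample SD as a stand-in for the unknown SD of the box; the variable names are illustrative only.

```python
import math

n = 100            # number of draws (students sampled)
sample_avg = 95.0  # sample average IQ
sample_sd = 50.0   # sample SD, used to estimate the unknown SD of the box

se_sum = math.sqrt(n) * sample_sd  # SE for the sum
se_avg = se_sum / n                # SE for the average

# Intervals of 1 SE and 2 SE around the sample average
ci_68 = (sample_avg - se_avg, sample_avg + se_avg)
ci_95 = (sample_avg - 2 * se_avg, sample_avg + 2 * se_avg)

print(se_avg)  # 5.0
print(ci_68)   # (90.0, 100.0)
print(ci_95)   # (85.0, 105.0)
```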
6. Things to keep in mind
A. Am I moving forward from a known box? If yes, I can probably make a strong statement about a sample. (statement of chance or probability)
B. If the box is unknown, I'm moving backward from a sample and cannot make as strong a statement about the parameter. (confidence interval)
C. Even when you don't know much about the original population, the distribution of sample averages will be approximately normal for reasonably large samples; the underlying population itself need not be normal.
7. Some Examples to Test You
IQ scores are normally distributed with a mean of 100 and a standard deviation of 16. A sample of 25 persons is drawn. How likely is it to get a sample average of 108 or more? How likely is it to select one person with an IQ of 108 or more? (Answers: about 0.6 of 1%, and about 31%)
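The two answers to the IQ question can be checked with the normal curve. The sketch below uses `NormalDist` from Python's standard `statistics` module; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mean_iq, sd_iq, n = 100, 16, 25

# SE for the average of 25 draws: sqrt(25) x 16 / 25 = 3.2
se_avg = math.sqrt(n) * sd_iq / n

# Chance that the average of 25 persons is 108 or more
p_avg = 1 - NormalDist(mean_iq, se_avg).cdf(108)

# Chance that a single person has an IQ of 108 or more
p_one = 1 - NormalDist(mean_iq, sd_iq).cdf(108)

print(f"sample average >= 108: {p_avg:.4f}")
print(f"single person  >= 108: {p_one:.4f}")
```

The sample average of 108 is 2.5 SEs above the mean, while a single IQ of 108 is only half an SD above it, which is why the first chance is so much smaller.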
A utility company serves 50,000 households. As part of a survey of consumer attitudes, they took a simple random sample of 750 households. The average number of TV sets in the sample is 1.86 and the SD is 0.80. Find a 95% confidence interval for the average number of TV sets per household in all 50,000 households.
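One way to work the utility-company exercise is sketched below, following the same steps as the USC example. It assumes the sample SD can stand in for the SD of the box and that 750 is a small enough share of the 50,000 households to ignore any correction for sampling without replacement.

```python
import math

n = 750            # households sampled
sample_avg = 1.86  # average number of TV sets in the sample
sample_sd = 0.80   # sample SD, used to estimate the SD of the box

se_avg = math.sqrt(n) * sample_sd / n  # SE for the sample average

# 95% confidence interval: sample average +/- 2 SE
ci_95 = (sample_avg - 2 * se_avg, sample_avg + 2 * se_avg)

print(f"SE = {se_avg:.3f}")
print(f"95% CI: {ci_95[0]:.2f} to {ci_95[1]:.2f}")
```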