Statistics 10 Lecture 15

The Accuracy of Averages

A. Overview

In previous chapters, we examined the variability associated with the sum of draws from a box (Chapter 17) and with percentages (Chapters 19, 20 and 21). In Chapter 23, we turn to the variability of sample averages.

B. The Sample Mean has Randomness Too

Setup

Suppose we draw a simple random sample of size n from a large population.

For example, draw a simple random sample (SRS) of 100 American women from the population of American women and measure their heights.

Results

Suppose the population has a mean of 5'3" and a standard deviation of 2.5". (For now, you know what the box looks like.)

Then each woman drawn from the "box" has an expected height of 5'3" with an SE of 2.5"; in other words, a single draw has the same expected value and spread as the original distribution.

For the sample of 100, the expected value for the average of 100 draws is simply equal to the average of the box. You could also think of it as the expected value of the sum of the draws divided by the number of draws, but that again works out to the average of the box.

The standard error for the average of the sample of 100 is

square root (number of draws) x SD of the box
----------------------------------------------
number of draws

For a sample of 100 from this particular box:

10 x 2.5
---------  = .25 inches
100
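
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the formula in these notes; the function name se_of_average is made up for the example.

```python
import math

def se_of_average(sd_box, n_draws):
    """SE for the average = (sqrt(number of draws) x SD of the box) / number of draws."""
    se_sum = math.sqrt(n_draws) * sd_box   # SE for the sum of the draws
    return se_sum / n_draws                # divide by the number of draws

# Heights example: SD of the box is 2.5 inches, sample of 100 women.
se_100 = se_of_average(2.5, 100)
print(se_100)  # 0.25
```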

What is happening here?

When you draw just one woman at random from the "box", your best guess about her height is 5'3". If heights follow the normal curve, there is about a 68% chance that she will be within 2.5" of that value and about a 95% chance that she will be within 5" of it.

When you draw 100 women from the box at random, your best guess about their average height is still 5'3", and there is about a 68% chance that the sample average will be within 0.25 inches of the population average. There is about a 95% chance that it will be within 0.5 inches of the population average.

We can make these probability statements for the average of draws from a box even when the underlying population is not normally distributed. It is the sample averages, taken over all the samples you could (in theory) draw, that are approximately normally distributed. This works when your samples are reasonably large (30 or more is a common rule of thumb).
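
One way to see this for yourself is a small simulation. The sketch below is not part of the course materials: it assumes a made-up, strongly skewed box (an exponential population with average 1 and SD 1), draws many samples of 100, and counts how often the sample average lands within 1 SE and 2 SEs of the box average. The sample size, number of samples, and seed are arbitrary choices.

```python
import math
import random

random.seed(10)  # fixed seed so the run is repeatable

N_DRAWS = 100      # size of each sample
N_SAMPLES = 2000   # how many samples to draw

# Assumed box: exponential population with average 1 and SD 1.
# It is strongly skewed, so it is clearly not normal.
BOX_AVG = 1.0
BOX_SD = 1.0

# SE for the average = sqrt(number of draws) x SD of box / number of draws
se_avg = math.sqrt(N_DRAWS) * BOX_SD / N_DRAWS

# Average of each of the N_SAMPLES samples
averages = [
    sum(random.expovariate(1.0) for _ in range(N_DRAWS)) / N_DRAWS
    for _ in range(N_SAMPLES)
]

# Fraction of sample averages within 1 SE and 2 SEs of the box average
within_1_se = sum(abs(a - BOX_AVG) <= se_avg for a in averages) / N_SAMPLES
within_2_se = sum(abs(a - BOX_AVG) <= 2 * se_avg for a in averages) / N_SAMPLES

print(f"within 1 SE: {within_1_se:.1%}")
print(f"within 2 SE: {within_2_se:.1%}")
```

If the normal approximation is doing its job, the two percentages should come out close to 68% and 95%, even though no individual draw from this box is normally distributed.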

C. Properties

The expected value of the sample average is the population average.
The standard error of the sample average is the standard error for the sum of the draws divided by the number of draws, where the standard error for the sum is the square root of the number of draws multiplied by the standard deviation of the population.

Thus, the standard error of the average of a sample, say of twenty people, will be smaller than the standard deviation of individual measurements. It's easier to predict the average for a group than it is to predict a single measurement.
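
To put numbers on this, the sketch below reuses the heights box (SD of 2.5 inches) and compares the SE of the average for a single draw, a sample of twenty, and a sample of 100. The helper name and the choice of sample sizes are only for illustration.

```python
import math

BOX_SD = 2.5  # SD of the box of heights, in inches

def se_of_average(sd_box, n_draws):
    # SE for the sum divided by the number of draws
    return math.sqrt(n_draws) * sd_box / n_draws

# SE of the average for a few sample sizes
se_by_n = {n: se_of_average(BOX_SD, n) for n in (1, 20, 100)}

for n, se in se_by_n.items():
    print(f"n = {n:3d}: SE of the average = {se:.3f} inches")
```

With one draw the SE is just the SD of the box; with twenty draws it is already well under a quarter of that.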

The probability histogram for the sample average, that is, the distribution of sample means, will follow the normal curve even if the underlying population does not, provided the samples drawn are of a reasonable size. To use the normal table, the histogram must be converted to Z scores (standard units).

D. When the population mean and standard deviation are unknown (23.2)

This is like the material presented in Chapter 21 and reflects real life. Usually, you don't know "the truth" and can't really measure it. But you may have a good sample.

Just as in Chapter 21, you use sample information to make statements about the population, again in the form of confidence intervals.

An example.

Suppose a psychologist wants to know the average IQ of the 28,000 students at USC. Suppose he takes a simple random sample of 100 students (this is without replacement) and the sample average turns out to be 95. The standard deviation of the sample is 50.

The average IQ of all USC students is estimated as 95, but of course there is always chance error when you are dealing with samples. He will want to put a +/- estimate around the 95.

To do that, he will need an SE. The steps are:

Find the SE for the sum of draws, using the SD of the sample as an estimate of the SD of the box:

square root(number of draws) x SD of the sample

Find the SE for the average:

   SE for the sum
   --------------
   number of draws

Construct the confidence interval:

95 +/- 5 (1 SE)

95 +/- 10 (2 SE)

In about 68% of all samples, the interval from 1 SE below to 1 SE above the sample average will cover the USC population average; here that interval is 95 +/- 5 IQ points. In about 95% of all samples, the interval from 2 SEs below to 2 SEs above the sample average will cover the USC population average; here that interval is 95 +/- 10 IQ points. Or, you might make statements of confidence: "I am 68% confident that the range 90 to 100 covers the true USC IQ average" or "I am 95% confident that the range 85 to 105 covers the true USC IQ average."
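
The three steps can be sketched in Python as follows. Here n = 100 is taken as the sample size, since that is the value consistent with the SE of 5 in the intervals above; the variable names are only for illustration.

```python
import math

sample_avg = 95.0   # sample average IQ
sample_sd = 50.0    # SD of the sample, used to estimate the SD of the box
n = 100             # sample size consistent with an SE of 5 (see note above)

# Step 1: SE for the sum of draws
se_sum = math.sqrt(n) * sample_sd

# Step 2: SE for the average
se_avg = se_sum / n

# Step 3: confidence intervals at 1 SE (about 68%) and 2 SEs (about 95%)
ci_68 = (sample_avg - se_avg, sample_avg + se_avg)
ci_95 = (sample_avg - 2 * se_avg, sample_avg + 2 * se_avg)

print(se_sum, se_avg)  # 500.0 5.0
print(ci_68)           # (90.0, 100.0)
print(ci_95)           # (85.0, 105.0)
```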

Remember that the normal curve is a good approximation of the distribution of sample averages if you could sample again and again. It allows you to make probability statements.

E. Things to keep in mind

Am I moving forward from a known box? If yes, I can make a fairly strong statement about a sample (a statement of chance or probability).
If the box is unknown, I'm moving backward from a sample and cannot make as strong a statement about the parameter (a confidence interval).
Even when you don't know much about the original population, the distribution of sample averages will be approximately normal for reasonably large samples; the underlying population itself is not necessarily normal.

F. Some Examples to Test You

IQ scores are normally distributed with a mean of 100 and a standard deviation of 16. A sample of 25 persons is drawn. How likely is it to get a sample average of 108 or more? How likely is it to select one person with an IQ of 108 or more? (0.6 of 1%, 31%)
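
The answers quoted above can be checked with Python's standard library. This is a sketch of one way to do the calculation, using statistics.NormalDist in place of the normal table; the variable names are only for illustration.

```python
import math
from statistics import NormalDist

box_avg = 100   # population mean IQ
box_sd = 16     # population SD
n = 25          # sample size

# SE for the average of 25 draws
se_avg = math.sqrt(n) * box_sd / n

# Chance that the sample average is 108 or more (spread = SE for the average)
p_avg = 1 - NormalDist(box_avg, se_avg).cdf(108)

# Chance that one person has an IQ of 108 or more (spread = SD of the box)
p_one = 1 - NormalDist(box_avg, box_sd).cdf(108)

print(f"sample average >= 108: {p_avg:.2%}")
print(f"one person >= 108:     {p_one:.0%}")
```

The same cutoff of 108 is 2.5 SEs above the mean for the average of 25 people, but only 0.5 SDs above the mean for a single person, which is why the two chances differ so much.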

A utility company serves 50,000 households. As part of a survey of consumer attitudes, they took a simple random sample of 750 households. The average number of TV sets in the sample is 1.86 and the SD is 0.80. Find a 95% confidence interval for the average number of TV sets in all 50,000 households.
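
One way to work this problem, following the same steps as the USC example and using 2 SEs for the 95% interval, is sketched below. The variable names are only for illustration.

```python
import math

sample_avg = 1.86   # average number of TV sets in the sample
sample_sd = 0.80    # SD of the sample, used to estimate the SD of the box
n = 750             # number of households sampled

# SE for the sum, then SE for the average
se_sum = math.sqrt(n) * sample_sd
se_avg = se_sum / n

# 95% confidence interval: sample average +/- 2 SEs
low = sample_avg - 2 * se_avg
high = sample_avg + 2 * se_avg

print(f"SE for the average: {se_avg:.3f}")
print(f"95% interval: {low:.2f} to {high:.2f}")
```

The interval comes out to roughly 1.80 to 1.92 TV sets per household.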