Statistics 10 Lecture 12

CHANCE ERRORS IN SAMPLING (Chapter 20)

A. Overview

Sample Surveys always involve chance error.

Think back to the concepts of POPULATION and of SAMPLES. We're interested in the PARAMETER, but given resource constraints, we must settle for a STATISTIC.

The difference between the PARAMETER and the STATISTIC is chance error.

Some references are made to Chapter 16.1 here. You can read that section if you wish, but you can manage this lecture without it.

B. Sampling Again (Chapter 20.1)

Basic Definitions
a. The POPULATION in this example is the 6,672 Americans age 18-79 who took part in a health study.
A sociologist wishes to interview 100 of these people; that's a SAMPLE (a part of the population).
Notice that, to avoid bias, she picks them at random.
b. In the first sample of 100 people, she got 51 men and 49 women. The sociologist was EXPECTING 46 men and 54 women, because the population is 46% men. Remember "expected values" (Chapter 17)?
c. She got 51 by the luck of the draw. Figure 1 on page 357 shows what would happen if, instead of drawing one single sample, she drew 250 samples. The number of men in each sample ranged from a low of 34 to a high of 58 (that is, 34% to 58% of the 100 people).
d. What is happening here is like a coin toss. The "box" (6,672 people: 46% men, 54% women) stays the same from sample to sample. Each sample will come close to 46% men, but not necessarily right on it.
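The repeated sampling described in items c and d can be tried out in a short simulation. This is only a sketch under stated assumptions: the box is built with 3,069 men (46% of 6,672, rounded), the seed is arbitrary, and the names are made up for illustration. The exact lows and highs will not match Figure 1, since every run of 250 samples is itself subject to chance.

```python
import random

POPULATION_SIZE = 6672
# Assumption: 46% of 6,672 rounded to a whole number of men.
NUM_MEN = round(0.46 * POPULATION_SIZE)


def count_men_in_samples(sample_size, num_samples, seed=0):
    """Draw repeated simple random samples and count the men in each one."""
    rng = random.Random(seed)  # arbitrary seed so the run is repeatable
    # The box: a ticket marked 1 for each man, 0 for each woman.
    box = [1] * NUM_MEN + [0] * (POPULATION_SIZE - NUM_MEN)
    # rng.sample draws without replacement, like a simple random sample.
    return [sum(rng.sample(box, sample_size)) for _ in range(num_samples)]


counts = count_men_in_samples(sample_size=100, num_samples=250)
print("lowest number of men: ", min(counts))
print("highest number of men:", max(counts))
print("average number of men:", sum(counts) / len(counts))
```

The box never changes between samples; only the luck of the draw does, which is why the counts scatter around 46.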

C. Sample Size and Standard Error

a. Idea
If we increase the size of the sample (assuming it is representative), we get a better "fix" on the PARAMETER. In Figure 2 (p. 358) we draw 250 samples of size 400, and now the percentage of men ranges from a low of 39% to a high of 54%.
b. Equation
percentage in a sample = percentage in the population + chance error
The expected value of the sample percentage is the percentage in the population; the sample percentage will be off from it by the chance error.
As long as you have a sample and not the whole population, you are almost certain to run into chance error.
c. Chance Error and the Standard Error
How big is the chance error likely to be? The STANDARD ERROR tells you this.
Standard Error for a number = Square Root of Sample Size x Standard Deviation of "the box".
Example 1: for a sample of 100, the standard error for the number of men is
Square Root 100 x Square Root (.46 x .54) =
10 x .5 (approximately) = 5
SE for a percentage = (SE for a number / size of sample) x 100 = (5 / 100) x 100 = 5%
Example 2: for a sample of 400, the standard error for the number of men is
Square Root 400 x Square Root (.46 x .54) =
20 x .5 (approximately) = 10
SE for a percentage = (SE for a number / size of sample) x 100 = (10 / 400) x 100 = 2.5%
Note the relationship between the SE for a number and the SE for a percentage. As the sample size increases, the SE for a number increases (look at the formula) but the SE for a percentage decreases.
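The opposite movement of the two standard errors is easy to see by running the formulas for several sample sizes. The sketch below uses the 46% box from the examples; the function names and the extra sample size of 1,600 are illustrative choices, not part of the text. It keeps the unrounded SD of the box (about .498), so the results come out slightly below the rounded 5 and 10.

```python
import math


def sd_of_zero_one_box(fraction_of_ones):
    """SD of a box holding only 0's and 1's."""
    return math.sqrt(fraction_of_ones * (1 - fraction_of_ones))


def se_for_number(sample_size, sd_box):
    """SE for a number = square root of sample size x SD of the box."""
    return math.sqrt(sample_size) * sd_box


def se_for_percentage(sample_size, sd_box):
    """SE for a percentage = (SE for a number / sample size) x 100."""
    return se_for_number(sample_size, sd_box) / sample_size * 100


sd_box = sd_of_zero_one_box(0.46)
for n in (100, 400, 1600):  # 1600 is an extra size added for illustration
    print(n, round(se_for_number(n, sd_box), 2), round(se_for_percentage(n, sd_box), 2))
```

Each time the sample size is multiplied by 4, the SE for the number doubles while the SE for the percentage is cut in half.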
Example 3: Problem 2, Exercise Set A
There are 25,000 students, of whom 10,000 are older than 25. A sample of 400 students is drawn.
It's like a box with 10,000 1's and 15,000 0's, so the average of the box is 10,000 / 25,000 = .4
Find the expected value = number of "draws" x average of the box
400 x .4 = 160
Find the standard error of the number of students in the sample who are older than 25
standard error of the number = square root of sample size x SD of box = 20 x Square Root (.4 x .6) = 20 x .5 (approximately) = 10
Find the standard error of the percentage of students
standard error of the percentage = (10 / 400) x 100 = 2.5%
The percentage of students in the sample who are older than 25 will be around 40%, give or take 2.5% or so.
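As a check on Example 3, the box can be written out ticket by ticket and its average and SD computed directly rather than by formula. This is a sketch with illustrative variable names; it uses the unrounded SD of the box (about .49), so the standard errors come out a little under the rounded 10 and 2.5%.

```python
import math
import statistics

# The box: one ticket per student, 1 = older than 25, 0 = not.
box = [1] * 10_000 + [0] * 15_000
draws = 400  # sample size in the example

box_average = statistics.fmean(box)  # fraction of 1's in the box
box_sd = statistics.pstdev(box)      # SD of the whole box (population SD)

expected_number = draws * box_average
se_number = math.sqrt(draws) * box_sd
se_percent = se_number / draws * 100

print("expected number older than 25:", expected_number)
print("SE for the number:", round(se_number, 2))
print("SE for the percentage:", round(se_percent, 2))
```

The SD computed from the tickets agrees with the shortcut Square Root (.4 x .6), which is why the shortcut is safe for any 0-1 box.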

D. Interpretation and the Normal Curve Again (20.3)

How do we work with:

"The percentage of students in the sample who are older than 25 will be around 40% give or take 2.5%"

We can convert these to standard units (Z scores) as in Chapter 5.

One standard error in this example is 2.5%, so +1 standard error is 40 + 2.5 or 42.5%, and -1 standard error is 40 - 2.5 or 37.5%.

The chance that between 37.5% and 42.5% of any given sample of 400 students will be older than 25 is about 68%.

We can move from using the normal curve to figure percentages to using the normal curve to make statements about chances.

The chance that between 35% and 45% (plus or minus 2 standard errors) of any given sample of 400 students will be older than 25 is about 95%.

And 99%?
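The chances quoted in this section can be read off the normal curve numerically. The sketch below assumes the normal approximation is adequate for samples of 400 and uses Python's statistics.NormalDist with the expected value (40%) and standard error (2.5%) from the example; the helper name chance_within is made up for illustration.

```python
from statistics import NormalDist

expected_percent = 40.0  # expected sample percentage
se_percent = 2.5         # SE for the percentage

# Normal curve centered at the expected value with spread equal to the SE.
curve = NormalDist(mu=expected_percent, sigma=se_percent)


def chance_within(num_ses):
    """Chance that the sample percentage lands within num_ses SEs of 40%."""
    low = expected_percent - num_ses * se_percent
    high = expected_percent + num_ses * se_percent
    return curve.cdf(high) - curve.cdf(low)


for k in (1, 2, 3):
    low = expected_percent - k * se_percent
    high = expected_percent + k * se_percent
    print(f"{low}% to {high}%: about {100 * chance_within(k):.1f}%")
```

One and two standard errors give roughly the 68% and 95% figures above; three standard errors (32.5% to 47.5%) covers all but a fraction of a percent of the curve.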

E. Correcting for Sampling without Replacement

Recall that a Simple Random Sample (SRS) is sampling without replacement.

To make a long story short, when populations are large, it doesn't really matter that one is sampling without replacement. But sample size does matter and it does affect the accuracy of the estimate.

Still, there is a correction factor and it is

Square root ( (population size - sample size )/ (population size - one))

It only really matters when the sample is a substantial fraction of the population. The standard error is multiplied by this factor. Once a sample is only 1% of the population, the correction is negligible.
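To see how little the correction matters for the examples in this lecture, the factor can be computed directly. This is a sketch; the function name is illustrative, and the third case (half the population sampled) is a made-up contrast, not an example from the text.

```python
import math


def correction_factor(population_size, sample_size):
    """Square root of (population size - sample size) / (population size - 1)."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))


cases = [
    (6672, 100),       # health study, sample of 100
    (25_000, 400),     # students, sample of 400
    (25_000, 12_500),  # hypothetical: sampling half the population
]
for population_size, sample_size in cases:
    factor = correction_factor(population_size, sample_size)
    print(population_size, sample_size, round(factor, 4))
```

For the two lecture examples the factor is within 1% of 1, so the standard error barely changes; only in the made-up half-the-population case does it shrink the standard error noticeably.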

F. Homework

Exercise Set B: 2, 3
Exercise Set C: 1, 2
Review Exercises: 3, 11

Last Update: 1 November 1998 by VXL