Statistics 10
Lecture 13


MORE ABOUT CHANCE ERRORS IN SAMPLING

A. Sample Size and Standard Error

a. Idea
If we increase the size of the sample (assuming it is representative), we get a better "fix" on the PARAMETER. In Figure 2 (p. 358), we draw 250 samples of size 400, and now the sample percentages of men range from a low of 39% to a high of 54%.
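Here is a minimal simulation sketch of this idea in Python. It assumes a hypothetical population of 10,000 people, 46% of them men. This is not the textbook's actual data. The sketch draws 250 simple random samples of size 100 and 250 of size 400, then reports the lowest and highest sample percentage of men for each size. The exact figures depend on the arbitrary random seed. What should hold up is that the range for size 400 comes out noticeably narrower.

```python
import random

def sample_percentages(box, sample_size, repetitions, rng):
    """Draw `repetitions` simple random samples (without replacement) and
    return the percentage of 1s in each one."""
    return [100 * sum(rng.sample(box, sample_size)) / sample_size
            for _ in range(repetitions)]

# Hypothetical population: 10,000 tickets, 4,600 marked 1 (men), 5,400 marked 0 (women).
box = [1] * 4_600 + [0] * 5_400
rng = random.Random(2024)  # fixed seed so the run is reproducible

ranges = {}
for n in (100, 400):
    pcts = sample_percentages(box, n, 250, rng)
    ranges[n] = (min(pcts), max(pcts))
    print(f"samples of {n}: low {ranges[n][0]:.1f}%, high {ranges[n][1]:.1f}%")
```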
b. Equation
percentage in the sample = percentage in the population + chance error
The expected value of the sample percentage is the percentage in the population; any particular sample percentage will be off by a chance error.
As long as you have a sample and not the population, you are almost certain to run into chance error.
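The equation can be checked on a single simulated sample. This sketch uses the same kind of hypothetical box as before: 10,000 tickets, 46% marked 1, an assumption made only for illustration. It follows the textbook convention that the sample percentage equals the population percentage plus a chance error. The size and sign of the chance error change with the seed.

```python
import random

# Hypothetical box: 10,000 tickets, 46% marked 1.
box = [1] * 4_600 + [0] * 5_400
population_pct = 100 * sum(box) / len(box)    # the parameter: 46.0

rng = random.Random(7)
sample = rng.sample(box, 400)                 # one simple random sample of 400
sample_pct = 100 * sum(sample) / len(sample)  # the statistic

# sample percentage = population percentage + chance error
chance_error = sample_pct - population_pct
print(f"{sample_pct:.2f}% = {population_pct:.2f}% + ({chance_error:+.2f}%)")
```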
c. Chance Error and the Standard Error
How big is the chance error? The STANDARD ERROR tells you this.
Standard Error for a number = Square Root of Sample Size x Standard Deviation of "the box".
Example 1: for a sample of 100, the standard error is
Square Root 100 x Square Root ( .46 x .54) =
10 x .5 = 5
SE for a percentage = (SE for a number / size of sample) x 100 = 5%
Example 2: for a sample of 400, the standard error is
Square Root 400 x Square Root ( .46 x .54) =
20 x .5 = 10
SE for a percentage = (SE for a number / size of sample) x 100 = 2.5%
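Note the relationship between the SE for a number and the SE for a percentage. As the sample size increases, the SE for a number increases, because it grows with the square root of the sample size. The SE for a percentage decreases, because it is the SE for the number divided by the sample size, which grows faster.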
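The calculations in Examples 1 and 2 can be wrapped in a pair of small Python helpers. The function names are made up for illustration. With the unrounded box SD, Square Root (.46 x .54), about .498, the SEs come out to about 4.98 and 9.97 for the number. For the percentage they come out to about 4.98% and 2.49%. The notes round these to 5, 10, 5%, and 2.5%.

```python
import math

def se_number(n, box_sd):
    """SE for a number (count) in n draws: Square Root of n x SD of the box."""
    return math.sqrt(n) * box_sd

def se_percentage(n, box_sd):
    """SE for a percentage: (SE for the number / n) x 100."""
    return se_number(n, box_sd) / n * 100

box_sd = math.sqrt(0.46 * 0.54)  # SD of a 0-1 box with 46% 1s, about 0.498
for n in (100, 400):
    print(n, round(se_number(n, box_sd), 2), round(se_percentage(n, box_sd), 2))
```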
Example 3: Problem 2, Exercise Set A
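A college has 25,000 students, 10,000 of whom are older than 25. A sample of 400 students is drawn.
Find the expected number of students in the sample who are older than 25: expected value = number of "draws" x average of the box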
160 = 400 x .4
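It's like a box with 10,000 tickets marked 1 and 15,000 marked 0. The average of the box is .4, and the SD of the box is Square Root (.4 x .6), about .5.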
Find the standard error of the number of students in the sample
standard error of the number = square root of sample size x SD of box = 20 x .5 = 10.
Find the standard error of the percentage of students
standard error of the percentage = (10 / 400) x 100 = 2.5%
The percentage of students in the sample who are older than 25 will be around 40% give or take 2.5%
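Example 3 can also be worked in Python, building the box average and SD from the ticket counts. The SD uses the shortcut for a box of 0s and 1s, Square Root (fraction of 1s x fraction of 0s). This is the same shortcut used in Examples 1 and 2. Unrounded, the SE for the number is about 9.8, and the SE for the percentage is about 2.45%.

```python
import math

# Box for Example 3: one ticket per student, 1 = older than 25, 0 = not.
ones, zeros = 10_000, 15_000
size = ones + zeros
box_avg = ones / size                        # fraction of 1s: 0.4
box_sd = math.sqrt(box_avg * (1 - box_avg))  # SD of a 0-1 box: about 0.49

n = 400                                      # number of draws (sample size)
ev_number = n * box_avg                      # expected number: 160
se_number = math.sqrt(n) * box_sd            # about 9.8, rounded to 10 in the notes
se_pct = se_number / n * 100                 # about 2.45%, rounded to 2.5%

print(f"expected {ev_number:.0f} give or take {se_number:.1f}; "
      f"{100 * box_avg:.0f}% give or take {se_pct:.2f}%")
```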

B. Interpretation and the Normal Curve Again (20.3)

How do we work with:

"The percentage of students in the sample who are older than 25 will be around 40% give or take 2.5%"

We can convert these to standard units (Z scores) as in Chapter 5.

One standard error in this example is 2.5%. So +1 standard error would be 40% + 2.5%, or 42.5%, and -1 standard error would be 37.5%.

The chance that between 37.5% and 42.5% of any given sample of 400 students will be older than 25 is about 68%.

We are moving from using the normal curve to estimate percentages of a data set (as in Chapter 5) to using the normal curve to make statements about chances.

The chance that between 35% and 45% (40% give or take 2 standard errors) of any given sample of 400 students will be older than 25 is about 95%.

And 99%?
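One way to answer these questions numerically uses Python's standard-library `statistics.NormalDist`, available in Python 3.8 or later. The sketch prints the area within 1, 2, and 3 SEs of 40%, then finds the multiplier for exactly 99%. The usual rough answers are 68% for 1 SE and 95% for 2 SEs. Three SEs (32.5% to 47.5%) actually cover about 99.7%. Exactly 99% corresponds to about 2.58 SEs, or roughly 33.6% to 46.4%.

```python
from statistics import NormalDist

ev, se = 40.0, 2.5     # expected percentage and SE from the example above
z = NormalDist()       # standard normal curve

# Chance of landing within k SEs of the expected value, for k = 1, 2, 3.
for k in (1, 2, 3):
    chance = z.cdf(k) - z.cdf(-k)
    print(f"{ev - k*se:.1f}% to {ev + k*se:.1f}%: about {100*chance:.1f}%")

# For exactly 99%, find k with 99% of the area between -k and +k.
k99 = z.inv_cdf(0.995)  # about 2.58
print(f"99%: {ev - k99*se:.1f}% to {ev + k99*se:.1f}%")
```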

C. Sample Size and Standard Error (summarized)

  a. Idea
    If we increase the size of the sample (assuming it is representative), we get a better "fix" on the PARAMETER.

  b. Equation
    percentage in the sample = percentage in the population + chance error

    The expected value of the sample percentage = the percentage in the population

    The sample percentage will be off by chance error.

    As long as you have a sample and not the population, you are almost certain to run into chance error.

  c. Chance Error and the Standard Error
    How big is the chance error? The STANDARD ERROR gives you an idea of the size of the chance error.

    Standard Error for a number = Square Root of the number of draws (i.e., the sample size) x Standard Deviation of "the box" (i.e., of the population).

  d. Examples
    A college has 25,000 students: 10,000 are over age 25 and 15,000 are under age 25.

    A sample of 100 students is drawn. What are the expected value and SE for the number of students in the sample who are over 25?

    Expected value for a number = (# of draws from the box) x (average of the box)

    40 = 100 x ((10,000 x 1 + 15,000 x 0)/25000)

    SE for a number = Square Root of # draws x SD of the box

    5 = 10 x .5

    A sample of 900 students is drawn. What are the expected value and SE for the number of students in the sample who are over 25?

    360 = 900 x ((10,000 x 1 + 15,000 x 0)/25000)

    15 = 30 x .5

    A sample of 3600 students is drawn. What are the expected value and SE for the number of students in the sample who are over 25?

    1440 = 3600 x ((10,000 x 1 + 15,000 x 0)/25000)

    30 = 60 x .5

  e. Things to note

(1) The parameters stay fixed. The box always has an average of .40, and the SD of the box is always about .50.

(2) The standard error for the number (or count) of students gets larger as the sample gets larger.

(3) As a percentage of the sample, the standard error gets smaller: 5/100 = 5%, 15/900 = 1.7%, 30/3600 = 0.8%.

Sample size   Expected number   SE of the number   Expected percentage   SE of the percentage
100           40                5                  40%                   5%
900           360               15                 40%                   1.7%
3600          1440              30                 40%                   0.8%
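The table can be regenerated in a few lines of Python, using the notes' rounded box SD of .50. Multiplying the sample size by 9 (100 to 900) triples the SE of the number and divides the SE of the percentage by 3. Multiplying by 4 (900 to 3600) doubles the first and halves the second.

```python
import math

box_avg, box_sd = 0.40, 0.50   # fixed parameters of the box (SD rounded from about .49)

rows = []
for n in (100, 900, 3600):
    ev_num = n * box_avg                 # expected number of 1s
    se_num = math.sqrt(n) * box_sd       # SE for the number
    rows.append((n, ev_num, se_num, 100 * box_avg, 100 * se_num / n))

print(f"{'n':>6} {'EV num':>8} {'SE num':>8} {'EV %':>6} {'SE %':>6}")
for n, ev_num, se_num, ev_pct, se_pct in rows:
    print(f"{n:>6} {ev_num:>8.0f} {se_num:>8.0f} {ev_pct:>5.0f}% {se_pct:>5.1f}%")
```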

D. Interpretation and the Normal Curve Again (20.3)

How do we work with:

"The percentage of students in a sample of 100 who are older than 25 will be around 40% give or take 5%"

One standard error in this example is 5%. So +1 standard error would be 40% + 5%, or 45%, and -1 standard error would be 35%.

The chance that between 35% and 45% of any given sample of 100 students will be older than 25 is about 68%.

We are moving from using the normal curve to estimate percentages of a data set to using it to make statements about chances.

The chance that between 30% and 50% (40% give or take 2 standard errors) of any given sample of 100 students will be older than 25 is about 95%.

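And 99%? Roughly 25% to 55%, i.e., 40% give or take 3 standard errors (strictly speaking, 3 standard errors cover about 99.7%).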

What's going on here? As was suggested in Chapter 20.1, suppose we could draw sample after sample indefinitely. The sample percentages would bunch around the true value, and their histogram would look like a normal curve.

We can borrow from that property and use the normal curve to make statements about the chances of getting samples with particular characteristics. We can convert percentages (and other statistics, such as counts) to Z scores.
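This property can be checked roughly with a simulation in Python, using the college box from the examples above: 10,000 tickets marked 1 and 15,000 marked 0. The sample size of 900, the 1,000 repetitions, and the seed are arbitrary choices for illustration. Each sample count is converted to a Z score using the SE for the number. About 68% should land within 1 SE, and about 95% within 2 SEs. The samples are drawn without replacement, so the actual spread is very slightly smaller than the formula's SE (see section E).

```python
import math
import random

# College box: 10,000 students older than 25 (1) and 15,000 not (0).
box = [1] * 10_000 + [0] * 15_000
n = 900                                    # sample size (arbitrary choice)
ev = n * 0.4                               # expected number: 360
se = math.sqrt(n) * math.sqrt(0.4 * 0.6)   # SE for the number: about 14.7

rng = random.Random(1)
reps = 1000
counts = [sum(rng.sample(box, n)) for _ in range(reps)]

# Convert each count to standard units and see how often it lands within 1 and 2 SEs.
z_scores = [(c - ev) / se for c in counts]
within_1 = sum(abs(z) <= 1 for z in z_scores) / reps
within_2 = sum(abs(z) <= 2 for z in z_scores) / reps
print(f"within 1 SE: {within_1:.1%}, within 2 SE: {within_2:.1%}")
```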

Some differences: you are now working with the SE instead of the SD; with a parameter (which can be a percentage, an average, or a sum) that is fixed; and with a sample statistic that varies from sample to sample.

Example: for the sample of 100 students, we can say that the chance of getting between 35 and 45 students over age 25 is about 68%, the chance of getting between 30 and 50 is about 95%, and so forth.

Example: for the sample of 100 students, what is the chance of getting between 37 and 43 students over age 25?

Z = (43 - 40) / 5 = 3/5 = .60, and at the other end, Z = (37 - 40) / 5 = -.60

The area under the normal curve between Z = -.60 and Z = +.60 is about 45.15%. This can be interpreted as a 45.15% chance of getting between 37 and 43 students over age 25.
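The same calculation can be written as a small Python helper. The name `normal_chance` is hypothetical; the helper uses `statistics.NormalDist` from the standard library. Like the notes, it applies the normal curve directly to the counts, with no continuity correction.

```python
from statistics import NormalDist

def normal_chance(low, high, expected, se):
    """Normal-curve approximation to the chance that a statistic lands between low and high."""
    curve = NormalDist(mu=expected, sigma=se)
    return curve.cdf(high) - curve.cdf(low)

# Sample of 100 students: expected count 40, SE 5.
chance = normal_chance(37, 43, expected=40, se=5)
print(f"{chance:.2%}")
```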

E. Correcting for Sampling without Replacement (OPTIONAL)

Recall that a Simple Random Sample (SRS) is sampling without replacement.

To make a long story short, when the population is large compared to the sample, it doesn't really matter that one is sampling without replacement. But the size of the sample does matter, and it does affect the accuracy of the estimate.

Still, there is a correction factor (OPTIONAL). Multiply the SE computed as if drawing with replacement by

Square Root ((population size - sample size) / (population size - 1))

It really matters only when the sample is a substantial fraction of the population. Once the sample is only 1% of the population (or less), the correction is negligible.
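Here is a short Python sketch of the correction factor; the function name is hypothetical. The factor multiplies the SE computed as if drawing with replacement. For the college example with a sample of 400, the SE for the number drops only from 10 to about 9.92. The factor matters far more when the sample is half the population.

```python
import math

def correction_factor(population_size, sample_size):
    """Multiplier for the with-replacement SE when drawing without replacement."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))

N = 25_000                     # college example
for n in (400, 250, 12_500):   # 1.6%, 1%, and 50% of the population
    print(f"sample of {n:,}: factor {correction_factor(N, n):.4f}")

se_with = math.sqrt(400) * 0.5                     # 10, as in Example 3
se_without = se_with * correction_factor(N, 400)   # corrected SE for the number
```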

Bottom line: when the population is large compared to the sample, sampling without replacement is almost like sampling with replacement.