a. Idea
If we increase the size of the sample (assuming it is representative), we get a better "fix" on the PARAMETER. In Figure 2 (p. 358), 250 samples of size 400 are drawn, and the sample percentages now range from a low of 39% men to a high of 54%.
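The idea can be tried out with a small simulation. This is only a sketch: it assumes a population that is 46% men (the figure used in the worked examples in these notes), draws with replacement for simplicity, and uses an arbitrary seed, so the exact ranges it prints will not match the figure in the book.

```python
import random

def sample_percentages(p_men=0.46, sample_size=400, n_samples=250, seed=0):
    """Draw n_samples samples and return the percentage of men in each one.

    p_men and seed are assumptions made for this illustration.
    """
    rng = random.Random(seed)
    percentages = []
    for _ in range(n_samples):
        # Each draw is a 1 (man) with chance p_men, otherwise a 0.
        men = sum(1 for _ in range(sample_size) if rng.random() < p_men)
        percentages.append(100 * men / sample_size)
    return percentages

pcts_100 = sample_percentages(sample_size=100)
pcts_400 = sample_percentages(sample_size=400)
print("size 100: low", min(pcts_100), "high", max(pcts_100))
print("size 400: low", min(pcts_400), "high", max(pcts_400))
```

With the larger sample size, the sample percentages should usually stay in a narrower band around 46%.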
b. Equation
percentage in a sample = percentage in the population + chance error
The expected value of the sample percentage = percentage in the population. The sample percentage will be off by chance error.
As long as you have a sample and not the population, you are almost certain to run into chance error.
c. Chance Error and the Standard Error
How big is the chance error? The STANDARD ERROR tells you this.
Standard Error for a number = Square Root of Sample Size x Standard Deviation of "the box".
Example 1: for a sample of 100 drawn from a box with 46% 1's and 54% 0's, the standard error for the number is
Square Root 100 x Square Root (.46 x .54) =
10 x .5 = 5 (the SD of the box is about .5)
SE for a percentage = (SE for a number / size of sample) x 100 = (5 / 100) x 100 = 5%
Example 2: for a sample of 400 from the same box, the standard error for the number is
Square Root 400 x Square Root (.46 x .54) =
20 x .5 = 10
SE for a percentage = (SE for a number / size of sample) x 100 = (10 / 400) x 100 = 2.5%
Note the relationship between the SE for a number and the SE for a percentage. As the sample size increases, the SE for a number increases (look at the formula), but the SE for a percentage decreases.
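The two formulas can be written as a pair of small functions. This is a minimal sketch; the function names are made up for illustration, and the box is assumed to hold only 1's and 0's, so its SD is the square root of (fraction of 1's x fraction of 0's).

```python
import math

def se_number(sample_size, p):
    """SE for a number: square root of sample size x SD of a 0-1 box."""
    sd_box = math.sqrt(p * (1 - p))  # SD of a box with fraction p of 1's
    return math.sqrt(sample_size) * sd_box

def se_percentage(sample_size, p):
    """SE for a percentage: (SE for a number / sample size) x 100."""
    return se_number(sample_size, p) / sample_size * 100

for n in (100, 400):
    print(n, round(se_number(n, 0.46), 2), round(se_percentage(n, 0.46), 2))
```

The results come out slightly under 5 and 10 for the number, and slightly under 5% and 2.5% for the percentage, because the notes round the SD of the box up to .5.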
Example 3: Problem 2, Exercise Set A
25,000 students, 10,000 are older than 25. A sample of 400 students is drawn.
It's like a box with 10,000 1's and 15,000 0's. The average of the box is 10,000 / 25,000 = .4 and the SD of the box is Square Root (.4 x .6), or about .5.
Find the expected value = number of "draws" x average of the box
160 = 400 x .4
Find the standard error of the number of students in the sample
standard error of the number = square root of sample size x SD of box = 20 x .5 = 10.
Find the standard error of the percentage of students
standard error of the percentage = (10 / 400) x 100 = 2.5%
The percentage of students in the sample who are older than 25 will be around 40%, give or take 2.5%.
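The steps in Example 3 can be checked with a few lines of arithmetic. This is a sketch with made-up variable names; it uses the unrounded SD of the box (about .49), so the SEs come out a little smaller than the hand calculation with .5.

```python
import math

population = 25_000
older_than_25 = 10_000
sample_size = 400

# The box has 10,000 1's and 15,000 0's.
box_average = older_than_25 / population
box_sd = math.sqrt(box_average * (1 - box_average))

expected_number = sample_size * box_average
se_num = math.sqrt(sample_size) * box_sd

expected_percent = expected_number / sample_size * 100
se_pct = se_num / sample_size * 100

print("expected number:", round(expected_number, 1))
print("SE of the number:", round(se_num, 2))
print("expected percentage:", round(expected_percent, 1))
print("SE of the percentage:", round(se_pct, 2))
```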
How do we work with:
"The percentage of students in the sample who are older than 25 will be around 40% give or take 2.5%"
We can convert these to standard units (Z scores) as in Chapter 5.
One standard error in this example is 2.5%. +1 standard error would be 40% + 2.5%, or 42.5%; -1 standard error would be 37.5%.
The chance that between 37.5% and 42.5% of any given sample of 400 students will be older than 25 is about 68%.
We can move from using the normal curve to figure percentages to using the normal curve to make statements about chances.
The chance that between 35% and 45% of any given sample of 400 students will be older than 25 is about 95% (2 standard errors either side).
And 99.7%? Between 32.5% and 47.5% (3 standard errors either side).
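The 68% and 95% figures come from the area under the normal curve within 1 and 2 standard units. A short sketch, using Python's standard `math.erf` (the function name `chance_within` is made up for this illustration), gives the areas and the matching intervals for this example:

```python
import math

def chance_within(k):
    """Area under the standard normal curve between -k and +k."""
    return math.erf(k / math.sqrt(2))

expected_pct = 40.0  # expected percentage in the sample of 400
se_pct = 2.5         # SE for the percentage, from the example

for k in (1, 2, 3):
    low = expected_pct - k * se_pct
    high = expected_pct + k * se_pct
    print(f"{low}% to {high}%: about {100 * chance_within(k):.1f}%")
```

The areas are approximate as chances here, since the normal curve is only an approximation to the behavior of the sample percentage.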
If we increase the size of the sample (assuming it is representative) we get a better "fix" on the PARAMETER.
percentage in a sample = percentage in the population + chance error
The expected value = percentage in the population
The sample percentage will be off by chance error.
As long as you have a sample and not the population, you are almost certain to run into chance error.
How big is the chance error? The STANDARD ERROR gives you an idea of the size of the chance error.
Standard Error = Square Root of the number of draws (i.e. sample size) x Standard Deviation of "the box" (i.e. the population).
A college has 25,000 students: 10,000 are over age 25 and 15,000 are age 25 or under.
A sample of 100 students is drawn. What are the expected value and SE for the number of students in the sample who are over 25?
Expected value for a number = (# of draws from the box) x (average of the box)
40 = 100 x ((10,000 x 1 + 15,000 x 0) / 25,000)
SE for a number = Square Root of # draws x SD of the box
5 = 10 x .5
A sample of 900 students is drawn. What are the expected value and SE for the number of students in the sample who are over 25?
Expected value: 360 = 900 x ((10,000 x 1 + 15,000 x 0) / 25,000)
SE: 15 = 30 x .5
A sample of 3600 students is drawn. What are the expected value and SE for the number of students in the sample who are over 25?
Expected value: 1440 = 3600 x ((10,000 x 1 + 15,000 x 0) / 25,000)
SE: 30 = 60 x .5
Three things to notice:
(1) the parameter(s) stay fixed. The box always has an average of .40, and the Standard Deviation of the box is always about .50.
(2) the standard error for the number (or count) of students gets larger as the sample gets larger.
(3) as a percentage of the sample size, the standard error gets smaller, i.e. 5 / 100 = 5%, 15 / 900 = 1.7%, 30 / 3600 = 0.8%.
Sample Size | Expected number | SE of the number | Expected percentage | SE of the percentage
100 | 40 | 5 | 40% | 5%
900 | 360 | 15 | 40% | 1.7%
3600 | 1440 | 30 | 40% | 0.8%
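The rows of the table can be regenerated for any sample size. This is a sketch under the same rounding as the notes: the SD of the box is taken as .50 rather than the exact value of about .49, and the function and key names are made up for illustration.

```python
import math

def sampling_row(sample_size, box_average=0.40, box_sd=0.50):
    """Expected value and SE, as a number and as a percentage of the sample."""
    expected_number = sample_size * box_average
    se_number = math.sqrt(sample_size) * box_sd
    return {
        "sample_size": sample_size,
        "expected_number": expected_number,
        "se_number": se_number,
        "expected_percent": expected_number / sample_size * 100,
        "se_percent": se_number / sample_size * 100,
    }

for n in (100, 900, 3600):
    row = sampling_row(n)
    print(
        n,
        round(row["expected_number"]),
        round(row["se_number"], 1),
        f"{row['expected_percent']:.0f}%",
        f"{row['se_percent']:.1f}%",
    )
```

Because the SE for the number grows like the square root of the sample size, multiplying the sample size by 9 (100 to 900) only triples the SE for the number and cuts the SE for the percentage to a third.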
How do we work with:
"The percentage of students in a sample of 100 who are older than 25 will be around 40% give or take 5%"
One standard error in this example is 5%. +1 standard error would be 40% + 5%, or 45%; -1 standard error would be 35%.
The chance that between 35% and 45% of any given sample of 100 students will be older than 25 is about 68%.
We can move from using the normal curve to figure percentages to using the normal curve to make statements about chances.
The chance that between 30% and 50% of any given sample of 100 students will be older than 25 is about 95% (2 standard errors either side).
And 99.7%? 25% to 55% (3 standard errors either side).
What's going on here? As was suggested in Chapter 20.1, if we could keep drawing samples indefinitely, the sample percentages would bunch around the true value, and their histogram would have the appearance of a normal distribution.
We can borrow from that property and use the normal curve to make statements about chances of getting samples with particular characteristics. We can convert percentages (or any other kind of statistic) to Z scores.
Some differences: you are now working with the SE instead of the SD. There is a parameter (which can be a percentage, an average, or a sum), which is fixed, and there is a sample statistic, which varies from sample to sample.
Example: for the sample of 100 students, we can say that the chance of getting between 35 and 45 students over age 25 is about 68%, between 30 and 50 students is about 95%, and so forth.
Example: for the sample of 100 students, what is the chance of getting between 37 and 43 students over age 25?
Z for 43 = (43 - 40) / 5 = 3/5 = .60
Z for 37 = (37 - 40) / 5 = -3/5 = -.60
The area under the normal curve between Z = -.60 and Z = +.60 is 45.15%, so you have about a 45% chance of getting between 37 and 43 students.
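The same calculation can be sketched in code: convert both ends of the range to Z scores using the expected value and the SE, then take the area under the normal curve between them. The function names are made up for this illustration, and the normal curve is computed from Python's standard `math.erf` rather than read from a table.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def chance_between(low, high, expected, se):
    """Approximate chance that the sample count falls between low and high."""
    z_low = (low - expected) / se
    z_high = (high - expected) / se
    return normal_cdf(z_high) - normal_cdf(z_low)

chance = chance_between(37, 43, expected=40, se=5)
print(f"about {100 * chance:.2f}%")
```

This follows the notes in using the range 37 to 43 as given, with no adjustment for the fact that counts are whole numbers.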
Recall that a Simple Random Sample (SRS) is sampling without replacement.
To make a long story short, when populations are large, it doesn't really matter that one is sampling without replacement. But sample size does matter and it does affect the accuracy of the estimate.
Still, there is a correction factor (OPTIONAL), which multiplies the SE:
Square root ( (population size - sample size) / (population size - one) )
It really matters only when the sample is a substantial fraction of the population. Once a sample is only 1% of the population or less, the correction is negligible.
Bottom line, sometimes sampling without replacement is almost like sampling with replacement.
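To see how little the correction factor matters for small samples, it can be computed for the college of 25,000 students. This is a sketch; the function name is made up, and the sample sizes other than 100 are chosen only for illustration.

```python
import math

def correction_factor(population_size, sample_size):
    """Square root of (population size - sample size) / (population size - 1)."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))

population_size = 25_000
for sample_size in (100, 250, 2_500, 12_500):
    factor = correction_factor(population_size, sample_size)
    share = 100 * sample_size / population_size
    print(f"sample of {sample_size} ({share:.1f}% of population): factor {factor:.4f}")
```

A factor close to 1 leaves the SE essentially unchanged; only when the sample is a large share of the population does the factor shrink the SE noticeably.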