
Statistics 10
Lecture 13

MORE ABOUT CHANCE ERRORS IN SAMPLING

A. Sample Size and Standard Error

a. Idea

If we increase the size of the sample (assuming it is representative), we get a better "fix" on the PARAMETER. In Figure 2 (p. 358), 250 samples of size 400 are drawn, and the sample percentages now range from a low of 39% men to a high of 54%.
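A rough simulation can make the idea concrete. The sketch below is not the textbook's figure: it assumes a box with 46% 1's (the percentage of men used in the examples later in these notes), draws with replacement, and uses a hypothetical helper named sample_percentages. It draws 250 samples of size 100 and 250 samples of size 400 and reports the lowest and highest sample percentage for each size.

```python
import random

def sample_percentages(n_samples, sample_size, fraction_ones, rng):
    """Percentage of 1's in each of n_samples simulated samples."""
    percentages = []
    for _ in range(n_samples):
        # Each draw is a 1 with probability fraction_ones (draws with replacement).
        ones = sum(1 for _ in range(sample_size) if rng.random() < fraction_ones)
        percentages.append(100 * ones / sample_size)
    return percentages

rng = random.Random(0)  # fixed seed so the run is repeatable
for size in (100, 400):
    pcts = sample_percentages(250, size, 0.46, rng)
    print(f"size {size}: lowest {min(pcts):.1f}%, highest {max(pcts):.1f}%")
```

The exact lows and highs depend on the seed, but the spread for samples of size 400 should come out noticeably narrower than the spread for samples of size 100.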

b. Equation

percentage in a sample = percentage in the population + chance error

The expected value of the sample percentage = percentage in the population. Any particular sample percentage will be off by chance error.

As long as you have a sample and not the population, you are almost certain to run into chance error.

c. Chance Error and the Standard Error

How big is the chance error? The STANDARD ERROR tells you this.

Standard Error for a number = Square Root of Sample Size x Standard Deviation of "the box".

Example 1: for a sample of 100, the standard error for the number is
Square Root 100 x Square Root ( .46 x .54) =
10 x .5 = 5 (the SD of the box, Square Root ( .46 x .54), is about .5)
SE for a percentage = (SE for a number / size of sample) x 100 = (5 / 100) x 100 = 5%

Example 2: for a sample of 400, the standard error for the number is
Square Root 400 x Square Root ( .46 x .54) =
20 x .5 = 10
SE for a percentage = (SE for a number / size of sample) x 100 = (10 / 400) x 100 = 2.5%

Note the relationship between the SE for a number and the SE for a percentage. As the sample size increases, the SE for a number increases (look at the formula) but the SE for a percentage decreases.
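The two formulas can be checked with a few lines of Python. This is only a sketch: the function names se_for_number and se_for_percentage are made up for illustration, and the SD of the box is computed exactly as Square Root ( .46 x .54) rather than rounded to .5, so the results come out slightly below 5 and 10.

```python
import math

def se_for_number(sample_size, fraction_ones):
    """SE for the number of 1's: sqrt(sample size) x SD of the box."""
    sd_of_box = math.sqrt(fraction_ones * (1 - fraction_ones))
    return math.sqrt(sample_size) * sd_of_box

def se_for_percentage(sample_size, fraction_ones):
    """SE for the percentage: (SE for the number / sample size) x 100."""
    return se_for_number(sample_size, fraction_ones) / sample_size * 100

for n in (100, 400):
    print(n, round(se_for_number(n, 0.46), 2), round(se_for_percentage(n, 0.46), 2))
```

Going from 100 to 400 draws doubles the SE for the number but halves the SE for the percentage, because the number of draws enters the first formula as a square root and the second as one over a square root.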

Example 3: Problem 2, Exercise Set A

A college has 25,000 students; 10,000 are older than 25. A sample of 400 students is drawn.
It's like a box with 10,000 1's and 15,000 0's, so the average of the box is .4.
Find the expected value = number of "draws" x average of the box
400 x .4 = 160
Find the standard error of the number of students in the sample who are older than 25
standard error of the number = square root of sample size x SD of box = 20 x .5 = 10 (the SD of the box is Square Root ( .4 x .6), about .5).
Find the standard error of the percentage of students in the sample who are older than 25
standard error of the percentage = (10 / 400) x 100 = 2.5%

The percentage of students in the sample who are older than 25 will be around 40% give or take 2.5%
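One way to check this example is to build the box literally and let Python compute its average and SD. This is a sketch under the same setup as the example (400 draws, treated as draws with replacement); the variable names are illustrative only. Because the SD of the box is not rounded to .5 here, the SEs come out a little under 10 and 2.5%.

```python
import math
import statistics

# The box: one ticket per student, 1 = older than 25, 0 = not.
box = [1] * 10_000 + [0] * 15_000
draws = 400

average_of_box = statistics.fmean(box)   # fraction of 1's in the box
sd_of_box = statistics.pstdev(box)       # population SD of the box, about 0.49

expected_number = draws * average_of_box
se_number = math.sqrt(draws) * sd_of_box
se_percentage = se_number / draws * 100

print(expected_number, round(se_number, 2), round(se_percentage, 2))
```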

B. Interpretation and the Normal Curve Again (20.3)

How do we work with:

"The percentage of students in the sample who are older than 25 will be around 40% give or take 2.5%"

We can convert these to standard units (Z scores) as in Chapter 5.

One standard error in this example is 2.5%, so +1 standard error is 40% + 2.5% = 42.5% and -1 standard error is 40% - 2.5% = 37.5%.

The chance that between 37.5% and 42.5% of any given sample of 400 students will be older than 25 is about 68%.

We can move from using the normal curve to figure percentages to using the normal curve to make statements about chances.

The chance that between 35% and 45% of any given sample of 400 students will be older than 25 is about 95% (plus or minus 2 standard errors).

And 99.7%?
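The chances quoted above are areas under the normal curve within 1, 2, or 3 standard errors of the expected value. A minimal sketch using only the standard library follows; the name chance_within is made up for illustration, and the area is computed with the error function instead of being read from the table.

```python
import math

def chance_within(k):
    """Area under the normal curve between -k and +k standard units."""
    return math.erf(k / math.sqrt(2))

expected_pct, se_pct = 40.0, 2.5
for k in (1, 2, 3):
    low, high = expected_pct - k * se_pct, expected_pct + k * se_pct
    print(f"{low}% to {high}%: about {100 * chance_within(k):.1f}%")
```

The three areas come out at roughly 68%, 95%, and 99.7%, which is where the usual rule of thumb comes from.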

C. Sample Size and Standard Error (summarized)

Idea

If we increase the size of the sample (assuming it is representative) we get a better "fix" on the PARAMETER.

Equation

percentage in a sample = percentage in the population + chance error

The expected value = percentage in the population

The sample percentage will be off by chance error.

As long as you have a sample and not the population, you are almost certain to run into chance error.

Chance Error and the Standard Error

How big is the chance error? The STANDARD ERROR gives you an idea of the size of the chance error.

Standard Error for a number = Square Root of the number of draws (i.e. sample size) x Standard Deviation of "the box" (i.e. the population).

Examples

A college has 25,000 students: 10,000 are over age 25 and 15,000 are age 25 or under.

A sample of 100 students is drawn. What are the expected value and SE for the number of students in the sample who are over 25?

Expected value for a number = (# of draws from the box) x (average of the box)

40 = 100 x ((10,000 x 1 + 15,000 x 0)/25000)

SE for a number = Square Root of # draws x SD of the box

5 = 10 x .5

A sample of 900 students is drawn. What are the expected value and SE for the number of students in the sample who are over 25?

360 = 900 x ((10,000 x 1 + 15,000 x 0)/25000)

15 = 30 x .5

A sample of 3600 students is drawn. What are the expected value and SE for the number of students in the sample who are over 25?

1440 = 3600 x ((10,000 x 1 + 15,000 x 0)/25000)

30 = 60 x .5

Things to note

(1) the parameter(s) stay fixed. The box always has an average of .40 and the Standard Deviation of the box is always about .50 (Square Root ( .4 x .6) = .49).

(2) the standard error for the number (or count) of students gets larger as the sample gets larger

(3) as a percentage of the sample size, the standard error gets smaller, i.e. 5/100 = 5%, 15/900 = 1.7%, 30/3600 = 0.8%

Sample Size	Expected number	SE of the number	Expected percentage	SE of the percentage
100	40	5	40%	5%
900	360	15	40%	1.7%
3600	1440	30	40%	0.8%
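The rows of the table can be regenerated for any sample size. In the sketch below the constants and the function table_row are illustrative names, and the SD of the box is rounded to .5 as in the notes so that the output matches the table.

```python
import math

AVERAGE_OF_BOX = 0.4
SD_OF_BOX = 0.5  # rounded as in the notes; the exact value is sqrt(0.4 * 0.6), about 0.49

def table_row(sample_size):
    """Return (size, expected number, SE number, expected %, SE %)."""
    expected_number = sample_size * AVERAGE_OF_BOX
    se_number = math.sqrt(sample_size) * SD_OF_BOX
    return (sample_size, expected_number, se_number,
            100 * AVERAGE_OF_BOX, 100 * se_number / sample_size)

for size in (100, 900, 3600):
    n, ev, se, ev_pct, se_pct = table_row(size)
    print(f"{n}\t{ev:.0f}\t{se:.0f}\t{ev_pct:.0f}%\t{se_pct:.1f}%")
```

Multiplying the sample size by 9 (100 to 900) cuts the SE of the percentage by a factor of 3, and multiplying by 4 (900 to 3600) cuts it by a factor of 2.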

D. Interpretation and the Normal Curve Again (20.3)

How do we work with:

"The percentage of students in a sample of 100 who are older than 25 will be around 40% give or take 5%"

One standard error in this example is 5%, so +1 standard error is 40% + 5% = 45% and -1 standard error is 40% - 5% = 35%.

The chance that between 35% and 45% of any given sample of 100 students will be older than 25 is about 68%.

We can move from using the normal curve to figure percentages to using the normal curve to make statements about chances.

The chance that between 30% and 50% of any given sample of 100 students will be older than 25 is about 95% (plus or minus 2 standard errors).

And 99.7%? 25% to 55% (plus or minus 3 standard errors).

What's going on here? As was suggested in Chapter 20.1, if we could keep drawing samples indefinitely, the sample percentages would bunch around the true value and their histogram would look like a normal curve.

We can borrow from that property and use the normal curve to make statements about chances of getting samples with particular characteristics. We can convert percentages (or any other kind of statistic) to Z scores.

Some differences: you are now working with the SE instead of the SD, with a parameter (which can be a percentage, an average, or a sum) that is fixed, and with a sample statistic that varies from sample to sample.

Example: for the sample of 100 students, we can say that the chance of getting between 35 and 45 students over age 25 is about 68%, between 30 and 50 students is about 95%, and so forth.

Example: for the sample of 100 students, what is the chance of getting between 37 and 43 students over age 25?

Z = (43 - 40) / 5 = 3/5 = .60, and by symmetry Z for 37 is (37 - 40) / 5 = -.60

The area under the normal curve between Z = -.60 and Z = +.60 is 45.15%, so you have about a 45% chance of getting between 37 and 43 students.
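The same calculation works for any interval: convert each endpoint to standard units and take the area between them. The sketch below uses hypothetical helper names (normal_cdf, chance_between) and follows the notes in ignoring any continuity correction.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def chance_between(low, high, expected_value, standard_error):
    """Normal-curve chance that the statistic lands between low and high."""
    z_low = (low - expected_value) / standard_error
    z_high = (high - expected_value) / standard_error
    return normal_cdf(z_high) - normal_cdf(z_low)

print(round(100 * chance_between(37, 43, 40, 5), 2))  # about 45.15
```

Calling chance_between(35, 45, 40, 5) gives the familiar 68% for plus or minus one standard error.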

E. Correcting for Sampling without Replacement (OPTIONAL)

Recall that a Simple Random Sample (SRS) is sampling without replacement.

To make a long story short, when populations are large, it doesn't really matter that one is sampling without replacement. But sample size does matter and it does affect the accuracy of the estimate.

Still, there is a correction factor (OPTIONAL). The SE for sampling without replacement is the SE for sampling with replacement multiplied by

Square root ( (population size - sample size) / (population size - one) )

It really matters only when the sample is a substantial fraction of the population. Once a sample is only 1% of the population or less, the correction is negligible.
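To see how little the correction matters for small samples, here is a short sketch that evaluates the factor for the college of 25,000 students. The function name correction_factor and the particular sample sizes are chosen only for illustration.

```python
import math

def correction_factor(population_size, sample_size):
    """Square root of (population size - sample size) / (population size - 1)."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))

# Samples that are 0.4%, 1%, 10%, and 50% of a population of 25,000.
for n in (100, 250, 2500, 12500):
    print(n, round(correction_factor(25_000, n), 3))
```

For the 1% sample the factor is within half a percent of 1, so the SE barely changes. For the 50% sample the factor is close to 0.7, so the SE shrinks by almost a third.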

Bottom line, sometimes sampling without replacement is almost like sampling with replacement.