Stat 10: Making the leap from Chapters 16.4, 17.1-17.3, 20.1 - 20.4 to Chapter 21.
Suppose we're supreme beings and we know exactly what percentage of Californians over age 18 will buy their next car via the Internet. Let's just say that percentage is 10%.
This could be represented by a "box" of tickets standing for the whole population of Californians 18 and over. Let's say there are 20 million of them: 10% (or 2 million) have a "1" representing "plans to buy via the internet", and the remaining 90% (or 18 million), who "do not plan to buy via the internet", have a "0".
First, some quick review about chances again (see Chapter 13's summary):
This is one of the main ideas in Chapters 16.4, 17.1 - 17.3, and 20.1 - 20.4. Questions of interest -- in this case, what % of people plan to buy a car via the internet (instead of more traditional ways), or what % of people plan to vote for Gore, or what % of people think the world is going to end, etc. -- can be simplified into a "box model". The type of box model illustrated here has two outcomes (binary, "one-zero").
The type of questions (that you would see on a midterm) that grow out of this kind of situation run along the lines of:
"What is the CHANCE (or what percentage of time) of drawing or selecting or surveying 100 people and discovering that 20% (or 20 people) said they were planning to buy a car via the internet?"
The ideas floating around are:
The ability to do this comes from Chapter 17.1 - 17.3 and the discussion in 20.1. The most difficult concept to understand is this: sample outcomes (statistics, in this case a percentage) have a distribution too. This distribution is approximately normal (remember Chapter 5), and its center (or mean) is the parameter p, which is the population percentage (or proportion). This distribution also has a spread (called the Standard Error), and for a percentage it is given by the formula on page 360:
SE for percentage = (SE for number / size of sample ) * 100
Remember the formula for a SE for a number (p. 291):
SE for a number = (square root of the number of draws) x (standard deviation of the box)
What is the standard deviation for this "box"? (1 - 0) * √(.1 * .9) = .30
So the SE for the number of "1"s drawn would be 10 * .30 = 3 (the 10 is the square root of the 100 draws).
The SE for the percentage of "1"s drawn would be (3 / 100) * 100 = 3%
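The three steps above (SD of the box, SE for the number, SE for the percentage) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of the textbook; the function names are made up for this example.

```python
import math

def box_sd(p):
    """SD of a 0-1 box in which a fraction p of the tickets are 1s."""
    return (1 - 0) * math.sqrt(p * (1 - p))

def se_for_number(n_draws, p):
    """SE for the number of 1s: sqrt(number of draws) x SD of the box."""
    return math.sqrt(n_draws) * box_sd(p)

def se_for_percentage(n_draws, p):
    """SE for the percentage of 1s: (SE for number / sample size) * 100."""
    return se_for_number(n_draws, p) / n_draws * 100

# The handout's example: 100 draws from a box with 10% ones.
print("SD of box:        ", round(box_sd(0.10), 4))
print("SE for number:    ", round(se_for_number(100, 0.10), 4))
print("SE for percentage:", round(se_for_percentage(100, 0.10), 4))
```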
Z = (20% - 10%) / 3% = 3.33 (round down to a Z of 3.30). From the normal table, the area between -3.30 and +3.30 is 99.903%.
Interpretation -- 99.903% of the sample outcomes (sample statistics) should fall between -3.30 and +3.30 standard errors of the expected 10%. The rest, 100% - 99.903% = .097%, is split evenly between the two tails, so only .0485% of all sample outcomes are expected to be as large as 20% or larger. Or you could say THERE WAS ONLY A .0485% CHANCE OF GETTING AN OUTCOME AS LARGE AS 20% IF WE WERE EXPECTING 10%.
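If you would rather check the normal-table lookup than take it on faith, here is a rough Python sketch of the same Z calculation. It computes the normal curve area with the standard library's `math.erf` instead of the printed table. Rounding Z down to the nearest 0.05 is an assumption about how the table is laid out, and the variable names are invented for this example. Expect tiny differences from the table values, since the table is rounded.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

observed_pct = 20.0   # sample percentage we are asking about
expected_pct = 10.0   # population percentage (the parameter p)
se_pct = 3.0          # SE for the percentage with 100 draws

z = (observed_pct - expected_pct) / se_pct

# Assumption: the printed table goes in steps of 0.05, so round z down.
z_table = math.floor(z * 20) / 20

# Area between -z and +z, and the area in the upper tail alone.
middle_pct = (normal_cdf(z_table) - normal_cdf(-z_table)) * 100
upper_tail_pct = (100 - middle_pct) / 2

print(f"Z = {z:.2f}, rounded down to {z_table:.2f}")
print(f"Area between -Z and +Z: {middle_pct:.3f}%")
print(f"Chance of {observed_pct:.0f}% or more: {upper_tail_pct:.4f}%")
```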
This was an example using percentages. Sometimes we are interested in SUMS (see 16.4, 17.1 - 17.3)
Chapter 21 is a departure from Chapters 17.1 - 17.3 and 20.1 - 20.4, but it is more like "real life" in terms of statistics.
Let's continue the car purchase idea…
An enterprising auto dealer (who is not a supreme being) wants to know what percentage of Californians over the age of 18 plan to buy their next car via the Internet. If it turns out there are at least 10% willing to buy, he'll set up a web site. But if it's less than 10%, he doesn't think it's worth his trouble. Suppose the auto dealer selected (at random) and surveyed 100 Californians age 18 and over.
Suppose the auto dealer's sample of 100 yielded the following results:
8% said they would buy their next car via the Internet
92% said they would not buy their next car via the Internet
Never having taken statistics and not being a supreme being, the auto dealer might interpret these results in the following manner: "It looks like 8% of Californians plan to buy their next car via the Internet. That's too low for me, I won't have a web site created."
So…since the true box is unknown, we use the sample to estimate it: the "new" population percentage is 8%, and the new standard deviation for this "box" is
(1 - 0) * √(.08 * .92) = .2713
So the SE for the number of "1"s drawn would be 10 * .2713 = 2.713.
The SE for the percentage of "1"s drawn would be (2.713 / 100) * 100 = 2.713%
We might calculate Z scores at this point, but what is done in Chapter 21 (specifically 21.2) is an introduction to the CONFIDENCE INTERVAL. You can think of it as a "margin of error". Here, a range of values derived from sample information is given and within this range, we think the true parameter is "covered". We use a combination of the sample estimate of 8% and the Standard error for that percentage (2.713%) to make statements of confidence about where we think the true parameter is.
We know from Chapters 17 and 20 that in about 68% of all samples, the sample percentage should be within one standard error of the population percentage. So in this Chapter 21 example:
8% plus or minus 2.713% is approximately a 68% confidence interval. We would say that we are 68% confident that the interval 5.287% to 10.713% covers the "truth", that is, the unknown population parameter p.
8% plus or minus (2 * 2.713%) is approximately a 95% confidence interval. We would say that we are 95% confident that the interval 2.574% to 13.426% covers the "truth", that is, the unknown population parameter p.
8% plus or minus (3 * 2.713%) is approximately a 99.7% confidence interval. We would say that we are 99.7% confident that the interval -0.139% to 16.139% covers the "truth", that is, the unknown population parameter p. (A percentage cannot be negative, so in practice the lower end is 0%.)
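The three intervals above all follow the same recipe: sample percentage plus or minus some multiple of the SE. A small Python sketch of that recipe, using the dealer's sample of 100 with 8% saying yes, might look like this. The function name and layout are just for illustration.

```python
import math

def confidence_interval(sample_pct, n, multiplier):
    """Sample percentage plus or minus (multiplier x SE for percentage).

    The SD of the box is estimated from the sample itself, as in the
    handout: sqrt(fraction of 1s x fraction of 0s).
    """
    p_hat = sample_pct / 100
    sd_box = (1 - 0) * math.sqrt(p_hat * (1 - p_hat))
    se_number = math.sqrt(n) * sd_box
    se_pct = se_number / n * 100
    margin = multiplier * se_pct
    return sample_pct - margin, sample_pct + margin

# 1, 2, and 3 SEs correspond to roughly 68%, 95%, and 99.7% confidence.
for multiplier, level in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    low, high = confidence_interval(8.0, 100, multiplier)
    print(f"about {level} confidence: {low:.3f}% to {high:.3f}%")
```

Note that the code does not clip the lower end at 0%, so the 3-SE interval comes out slightly negative, just as in the hand calculation.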
We can never be 100% confident. There is always the chance, remote as it may be, that you did everything correctly when drawing a sample and still got a bad sample.
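One way to see what "95% confident" means is to play supreme being again: fix the truth at 10%, draw many samples of 100, build a 2-SE interval from each, and count how many intervals cover 10%. The sketch below does that with Python's `random` module. It assumes draws with replacement (a negligible difference for a box of 20 million tickets), and the seed, the number of repetitions, and all names are arbitrary choices for this example. The share of covering intervals should land near, though not exactly at, 95%.

```python
import math
import random

def draw_sample_pct(p, n, rng):
    """Draw n tickets from a 0-1 box with fraction p of 1s; return % of 1s."""
    ones = sum(1 for _ in range(n) if rng.random() < p)
    return ones / n * 100

def two_se_interval(sample_pct, n):
    """Sample percentage plus or minus 2 SEs, with the SD estimated from the sample."""
    p_hat = sample_pct / 100
    se_pct = math.sqrt(n) * math.sqrt(p_hat * (1 - p_hat)) / n * 100
    return sample_pct - 2 * se_pct, sample_pct + 2 * se_pct

def coverage(p, n, repetitions, seed=0):
    """Fraction of repeated samples whose 2-SE interval covers the true percentage."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    true_pct = p * 100
    covered = 0
    for _ in range(repetitions):
        low, high = two_se_interval(draw_sample_pct(p, n, rng), n)
        if low <= true_pct <= high:
            covered += 1
    return covered / repetitions

share = coverage(p=0.10, n=100, repetitions=2000)
print(f"Intervals covering the true 10%: {share * 100:.1f}%")
```

The intervals that miss are the "bad samples": nothing went wrong in the sampling, the draw was simply unlucky.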
Notes
If the original population is normally distributed with a known standard deviation, or if the sample size is "large", then the distribution of the sample statistic (in this example a percentage -- but it could be a sum or an average) is normal, and the appropriate test statistic is thus z from the normal table. (If the original distribution is normal with an unknown standard deviation, the test statistic is different.)
Your margin of error will depend on the choice of a confidence level. A lower confidence will give you a smaller margin of error. A higher confidence will give you a larger margin of error.
If your standard deviation is small, it is easier to get a more precise fix on the parameter. Your margin of error is smaller for populations with smaller standard deviations.
If your n (sample size) increases, your margin of error shrinks. If your n (sample size) gets smaller, your margin of error grows.
The parameter is FIXED, UNCHANGING but your confidence intervals will vary from sample to sample because your statistic varies.
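The notes about what drives the margin of error can be checked numerically. The sketch below is only an illustration with made-up names: it computes the margin of error for a percentage as (multiplier x SE for percentage) and varies one thing at a time, first the confidence level, then the sample size, then the SD of the box.

```python
import math

def margin_of_error_pct(sample_pct, n, multiplier):
    """Margin of error = multiplier x SE for the percentage."""
    p_hat = sample_pct / 100
    sd_box = math.sqrt(p_hat * (1 - p_hat))
    se_pct = math.sqrt(n) * sd_box / n * 100
    return multiplier * se_pct

# Higher confidence (more SEs) gives a larger margin of error.
for multiplier in (1, 2, 3):
    print(f"{multiplier} SE, n=100, 8%: +/- {margin_of_error_pct(8, 100, multiplier):.3f}%")

# A larger sample gives a smaller margin of error
# (four times the sample size cuts the margin in half).
for n in (100, 400, 1600):
    print(f"2 SE, n={n}, 8%: +/- {margin_of_error_pct(8, n, 2):.3f}%")

# A box with a smaller SD gives a smaller margin of error
# (an 8%-92% box has a smaller SD than a 50%-50% box).
for sample_pct in (8, 50):
    print(f"2 SE, n=100, {sample_pct}%: +/- {margin_of_error_pct(sample_pct, 100, 2):.3f}%")
```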