MORE ON SAMPLING DISTRIBUTIONS AND ESTIMATION: CENTRAL LIMIT THEOREM
Conceptually, Chapter 11.4 attempts to tie together Chapters 11.1-11.3 with a theory.
Suppose we draw a simple random sample of size n from a large population. Call the observed values X1, X2, ..., Xn.
An example might be: draw a simple random sample (SRS) of 10 stocks from some population of stocks (this will be illustrated in your lab #3, to be handed out later this week). Measure the average percentage change of the 10 and compare it to the true average (that is, the population parameter).
You could repeat this "draw of 10 stocks" from the population with another sample of 10, and another and another. If you draw enough samples (in the handout, I drew 1000 samples) you will start to see a pattern form -- a bell-shaped distribution.
You could define X-bar = (X1 + X2 + ... + Xn)/n.
X-bar can be thought of as the mean of one sample selected at random from all possible samples of size n from the population.
The expected value (remember this?) of x-bar is μ, the mean of the population. In other words, all of the sample means (in my example, the 1000) should, on average, be equal to the population mean. That is, x-bar is an unbiased estimator of μ.
The standard deviation of the distribution of x-bar is σ/sqrt(n) (also known as the standard error), where σ is the standard deviation of the population and n is the sample size.
The standard deviation of the distribution of SAMPLE MEANS (that is, the standard error) will be smaller than the standard deviation for an individual measurement. In other words, it is easier to predict the average of many measurements than it is to predict a single measurement (or to predict the average of a small sample). What is causing this? Examine the formula for the standard deviation of the sampling distribution and note the effect of sample size: n sits in the denominator, under a square root. This was the point of the handout on Friday. We see that as the individual sample size increases (to 25, to 100, to 250), the spread of all the sample means decreases and they fall closer to the population mean (in the handout, the population was size 9000, its mean was 10 and its standard deviation was 40).
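The shrinking spread can be checked with a short simulation. The sketch below is not the handout's data: it assumes a made-up population of 9000 values generated from a normal curve with mean 10 and SD 40, then draws 1000 simple random samples at each sample size and compares the SD of the resulting sample means with sigma/sqrt(n). The names `population` and `sample_means` are illustrative only.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical stand-in for the handout's population: 9000 values
# generated from a normal curve with mean 10 and SD 40 (an assumption;
# the handout's actual data are not reproduced here).
population = [random.gauss(10, 40) for _ in range(9000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)


def sample_means(n, draws=1000):
    """Return the means of `draws` simple random samples of size n."""
    return [statistics.fmean(random.sample(population, n)) for _ in range(draws)]


results = {}
for n in (25, 100, 250):
    means = sample_means(n)
    # Store (average of the x-bars, SD of the x-bars) for this sample size.
    results[n] = (statistics.fmean(means), statistics.pstdev(means))
    print(f"n={n:3d}  mean of x-bars={results[n][0]:6.2f}  "
          f"SD of x-bars={results[n][1]:5.2f}  "
          f"sigma/sqrt(n)={pop_sd / n ** 0.5:5.2f}")
```

In each row the average of the x-bars should sit near the population mean, while the SD of the x-bars should track sigma/sqrt(n) and shrink as n grows.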
How close is any single X-bar to μ, the population mean? In other words, how accurate will our individual guesses/samples be? To answer this, you will need to know the standard deviation of the sampling distribution (the standard error).
Note how the standard deviation of the sampling distribution changes with sample size. For big samples, the standard deviation for the sample mean will be small and for small samples, the standard deviation is large.
Given a simple random sample of size n from a population having mean μ and standard deviation σ, a given sample mean X-bar will come from a sampling distribution of all possible X-bars with mean μ and standard deviation σ/sqrt(n).
If the original population has a normal distribution, then the sample mean will also be normally distributed, whatever the sample size.
Example. IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. A sample of 25 persons is drawn. How likely is it to get a sample average of 108 or more? How likely is it for the first score to be 108 or more? (0.38%, 29.8%)
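As a check on the two answers, here is a minimal sketch using Python's standard-library `statistics.NormalDist`. The variable names are illustrative. Note that the software works with the unrounded z for the single score (8/15, about 0.533), so it gives roughly 29.7%; the 29.8% above comes from looking up z rounded to 0.53 in a table.

```python
from statistics import NormalDist

mu, sigma, n = 100, 15, 25

# The sample mean has the same center as the population,
# but its SD is shrunk by a factor of sqrt(n): 15 / 5 = 3.
standard_error = sigma / n ** 0.5
xbar_dist = NormalDist(mu, standard_error)
single_dist = NormalDist(mu, sigma)

p_mean = 1 - xbar_dist.cdf(108)      # P(sample average >= 108)
p_single = 1 - single_dist.cdf(108)  # P(one individual score >= 108)

print(f"sample average of 25: {p_mean:.2%}")
print(f"single score:         {p_single:.2%}")
```

The only difference between the two calculations is the standard deviation used: 3 for the average of 25 scores, 15 for one score.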
No matter what the distribution of the original population, if the sample size is "large", the distribution of the possible sample means (not an individual sample mean and not the population, but the distribution of all sample means) will be close to the normal distribution. This is the Central Limit Theorem. It is a very powerful theorem and it is the reason why the normal distribution is so well studied.
Take a simple random sample from a population with mean μ and standard deviation σ. Let x-bar be the average of the sample values. If either
the original population is normally distributed or the sample size n is sufficiently large,
then x-bar will be (at least approximately) normally distributed with expected value μ and standard deviation σ/sqrt(n).
If the histogram for the population follows a normal curve, or if the sample size is large enough each time, then the histogram for the possible values for x-bar will follow a normal curve that has a mean of μ and a standard deviation of σ/sqrt(n). Thus, about 68% of the x-bars will be within one standard deviation of μ, about 95% of the x-bars will be within two standard deviations, and about 99.7% of the x-bars will be within three standard deviations. Here "standard deviation" means the standard deviation of x-bar, σ/sqrt(n). All of the normal calculations you learned in Chapter 8.6-8.8 apply here.
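To see the 68/95/99.7 pattern appear even when the population itself is far from bell-shaped, the sketch below draws sample means from a strongly skewed population. The choices are assumptions made for illustration, not part of the course materials: an exponential population with mean 1 and SD 1, samples of size 50, and 10,000 repeated samples.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical skewed population: exponential with mean 1 and SD 1.
mu, sigma = 1.0, 1.0
n, draws = 50, 10_000
se = sigma / n ** 0.5  # standard deviation of x-bar

# Each entry is the mean of one fresh sample of n exponential values.
xbars = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(draws)
]


def share_within(k):
    """Fraction of sample means within k standard errors of mu."""
    return sum(abs(x - mu) <= k * se for x in xbars) / draws


for k in (1, 2, 3):
    print(f"within {k} SE of mu: {share_within(k):.1%}")
```

The shares should land close to 68%, 95% and 99.7%, although the individual exponential values look nothing like a normal curve.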
NOTE AGAIN: The Central Limit Theorem only applies to the distribution of all possible sample averages (i.e. the sampling distribution); it says nothing about the distribution of individual scores in either the sample or the population.
Example. You are interested in buying a business. The owner claims his customers spend an average of $1,200 with a standard deviation of $1,000. He allows you to draw a random sample of 150 customers from his database. If the owner is truthful, how likely is it to get a sample average of $950 or less? The standard error is 1000/sqrt(150) = 81.6497 dollars, so an average of $950 has z = (950-1200)/81.6497 = -3.06, and the chance of getting an average of $950 or less is about 0.11%, or something like 1 in 1,000 samples. Is this frequent or rare? Ask yourself: if I'm expecting around $1,200 and I only get $950, can I be this unlucky, or is there another explanation (perhaps the owner's claim is just wrong)?
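The same arithmetic can be reproduced in a few lines. This is a minimal sketch using Python's standard-library `statistics.NormalDist`. The variable names are illustrative, and the calculation assumes the owner's claimed mean and standard deviation are correct.

```python
from statistics import NormalDist

claimed_mean, claimed_sd, n = 1200, 1000, 150
observed = 950

# Standard deviation of the sample mean if the owner's claim is true.
standard_error = claimed_sd / n ** 0.5

# How many standard errors below the claimed mean is the observed average?
z = (observed - claimed_mean) / standard_error

# Chance of a sample average this low or lower, from the standard normal curve.
p_low = NormalDist().cdf(z)

print(f"SE = {standard_error:.4f}, z = {z:.2f}, P = {p_low:.2%}")
```

With 150 customers the sample is large enough for the normal approximation to x-bar to be reasonable, even though individual spending (mean $1,200, SD $1,000) is unlikely to be normally distributed.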
Chapter 11.4 #1, #2, #3, #4