Statistics 50 Lecture 10

DISTRIBUTION OF THE SAMPLE MEAN

A. The Sample Mean

Setup
Suppose we draw a simple random sample of size n from a large population. Call the observed values X1, X2, ..., Xn.
For example, draw a simple random sample (SRS) of 100 American women from the population of American women and measure their heights.
Results for Individual Measurements
Suppose the population has a mean of _mu_ and a standard deviation of _sigma_. Then each Xi has expected value _mu_ and standard deviation _sigma_; in other words, each Xi has the same distribution as the population.
In the example, a single Xi (height) is a measurement on one unit (woman) selected at random from the population; therefore Xi has the probability distribution of the population.
Results for Samples
a. Define X-bar = (X1 + X2 + ... + Xn)/n. X-bar can be thought of as the mean of one sample selected at random from all possible samples of size n from the population.
b. It is easily shown that the expected value of x-bar is mu, the average of the population. In other words, the sample means will, on average, be equal to the population mean. That is, x-bar is an unbiased estimator of mu.
c. It is also easily shown that the standard deviation of x-bar is sigma/sqrt(n), where sigma is the standard deviation of the population. Thus, the standard deviation of the SAMPLE MEAN will be smaller than the standard deviation for individual measurements: it's easier to predict the average than it is to predict a single measurement.
Example
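Consider a population consisting of the elements 1, 2, 3, ..., 997, 998, 999, 1000. Then _mu_ = 500.5 and _sigma_ = 288.67. (Dividing by N - 1 instead of N gives 288.82, but that is the sample formula; _sigma_ here is the standard deviation of the whole population.)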
Draw a simple random sample of size 5 from the population. One such sample is 164, 582, 850, 892, 433. Then X-bar = 584.2. Note that X-bar is not exactly equal to mu, but it's close.
Draw a new sample: 286, 224, 344, 995, 491. Now X-bar = 468.0. Note that this X-bar is different from the previous X-bar; this is because X-bar is random.
A natural question to ask is how close X-bar will be to mu ... how accurate will our guesses be?
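The numbers in this example can be checked directly. The following is a minimal sketch in Python using only the standard library; the variable names are chosen here for illustration and are not part of the original notes. It computes the population mean and standard deviation, the two sample means quoted above, and the standard deviation sigma/sqrt(n) from result (c). Note that `statistics.pstdev` divides by N; `statistics.stdev`, which divides by N - 1, gives 288.82 for the same data.

```python
import math
import statistics

# Population from the example: the integers 1 through 1000.
population = list(range(1, 1001))

mu = statistics.fmean(population)       # population mean
sigma = statistics.pstdev(population)   # population SD (divides by N)

# The two samples of size 5 quoted in the example.
sample_1 = [164, 582, 850, 892, 433]
sample_2 = [286, 224, 344, 995, 491]
xbar_1 = statistics.fmean(sample_1)
xbar_2 = statistics.fmean(sample_2)

# Result (c): the SD of the sample mean is sigma / sqrt(n).
n = 5
sd_xbar = sigma / math.sqrt(n)

print(f"mu = {mu:.1f}, sigma = {sigma:.2f}")
print(f"X-bar for sample 1 = {xbar_1:.1f}")
print(f"X-bar for sample 2 = {xbar_2:.1f}")
print(f"SD of X-bar for n = 5: {sd_xbar:.1f}")
```

The standard deviation of X-bar comes out near 129, and both sample means quoted above fall within one such standard deviation of mu.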

B. Central Limit Theorem

Basic fact
Given a simple random sample of size n from a population having mean mu and standard deviation sigma, the sample mean X-bar will come from a distribution with mean mu and standard deviation sigma/sqrt(n).
Distributional Result
IF the original population has a normal distribution, then the sample mean will also be normally distributed.
Example. IQ scores are normally distributed with a mean of 100 and a standard deviation of 16. A sample of 25 persons is drawn. How likely is it to get a sample average of 108 or more? How likely is it for the first score to be 108 or more? (Answers: about 0.6 of 1% for the sample average, since its standard deviation is 16/sqrt(25) = 3.2 and z = (108-100)/3.2 = 2.5; about 31% for a single score, since z = (108-100)/16 = 0.5.)
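The two answers in the IQ example can be reproduced with a short calculation. This is a sketch that assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the variable names are illustrative only.

```python
import math
from statistics import NormalDist

mu, sigma, n = 100, 16, 25

# Distribution of a single score: normal with mean mu and SD sigma.
single = NormalDist(mu, sigma)
# Distribution of the mean of n scores: normal with SD sigma / sqrt(n).
sample_mean = NormalDist(mu, sigma / math.sqrt(n))

p_mean = 1 - sample_mean.cdf(108)   # P(X-bar >= 108)
p_single = 1 - single.cdf(108)      # P(X1 >= 108)

print(f"P(sample average >= 108) = {p_mean:.4f}")
print(f"P(first score >= 108)    = {p_single:.4f}")
```

The only difference between the two calculations is the standard deviation: 16 for one score versus 16/sqrt(25) = 3.2 for the average of 25 scores.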
Distributional Result
No matter what the distribution of the original population, if the sample size is "large", the distribution of the possible sample means will be close to the normal distribution.
Summary
Take a simple random sample from a population with mean _mu_ and standard deviation _sigma_. Let x-bar be the average of the sample values. If either
(a) the original population is normally distributed, or
(b) the sample size _n_ is sufficiently large,
then x-bar will be normally distributed (exactly in case (a), approximately in case (b)) with expected value _mu_ and standard deviation _sigma_/sqrt(n).
Intuitively, if the histogram for the population follows a normal curve, or if the sample size is large enough each time, then the histogram for the possible values for x-bar will follow a normal curve that has a mean of _mu_ and a standard deviation of _sigma_/sqrt(n). Thus, about 68% of the x-bar's will be within one standard deviation (_sigma_/sqrt(n)) of _mu_, about 95% of the x-bar's will be within two standard deviations of _mu_, etc.
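The summary above can be illustrated by simulation. The sketch below is not from the original notes: it assumes a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen here only for illustration), draws many samples of size 50, and checks how often the sample mean lands within one and two standard deviations of mu. The sample size, the number of repetitions, and the seed are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(0)          # fixed seed so the run is repeatable

# Assumed population (not from the notes): exponential with rate 1,
# which is strongly skewed and has mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0
n = 50                          # size of each sample
reps = 10_000                   # number of samples drawn

# Draw many samples and record the mean of each one.
xbars = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

sd_xbar = sigma / math.sqrt(n)  # predicted SD of x-bar

# Fraction of sample means within 1 and 2 predicted SDs of mu.
within_1 = sum(abs(x - mu) <= sd_xbar for x in xbars) / reps
within_2 = sum(abs(x - mu) <= 2 * sd_xbar for x in xbars) / reps

print(f"mean of the x-bars: {statistics.fmean(xbars):.3f} (predicted {mu})")
print(f"SD of the x-bars:   {statistics.pstdev(xbars):.3f} (predicted {sd_xbar:.3f})")
print(f"within 1 SD of mu:  {within_1:.1%}")
print(f"within 2 SD of mu:  {within_2:.1%}")
```

Even though individual draws from this population are far from normal, the two fractions should come out near 68% and 95%. Rerunning with a small n (say 2 or 3) is a quick way to see what "sufficiently large" is guarding against.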
Warning
The Central Limit Theorem only applies to the distribution of possible sample averages; it says nothing about the distribution of individual scores in either the sample or the population.
Example
A manufacturer claims his light bulbs last an average of 1200 hours with a standard deviation of 1000 hours. A random sample of 200 light bulbs is drawn and tested. If the manufacturer is correct, how likely is it to get a sample average of 800 hours or less? Because n = 200 is large, the sample average is approximately normal even though individual lifetimes need not be. Its standard deviation is 1000/sqrt(200) = 70.7 hours, so an average of 800 corresponds to z = (800-1200)/70.7 = -5.66, and the chance of getting an average of 800 or less is about 0%.
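For readers who want to see how small "about 0%" is, the calculation can be sketched as follows. It assumes Python 3.8 or later for `statistics.NormalDist` and relies on the normal approximation, which is only approximate this far out in the tail, so the result is best read as an order of magnitude.

```python
import math
from statistics import NormalDist

mu, sigma, n = 1200, 1000, 200

sd_xbar = sigma / math.sqrt(n)   # SD of the sample average
z = (800 - mu) / sd_xbar         # standardized value of 800 hours
p = NormalDist().cdf(z)          # P(X-bar <= 800) under the normal approximation

print(f"SD of sample average = {sd_xbar:.1f} hours")
print(f"z = {z:.2f}")
print(f"P(sample average <= 800) = {p:.1e}")
```

A sample average this low would therefore be strong evidence against the manufacturer's claim.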


Last Update: 4 November 1996 by VXL