Statistics 50 Lecture 8

SAMPLING DISTRIBUTIONS

Basic Definitions
a. The POPULATION is the entire set of people (or animals, things) we wish to study. Examples: All Americans. All parking meters in New York. All blue whales in the sea.
b. A SAMPLE is a part of the population. See the Gallup/CNN tracking poll handout for Oct 1996. Why sample? It is usually not feasible to query the entire population (e.g. too costly, would take too long), so researchers select (sample) a subgroup for questioning or analysis.
c. A numerical fact about a sample is a STATISTIC: a number used to describe the sample. Example: in the current CNN tracking poll for the upcoming presidential election, as of October 27th, about 49% of the voters polled would vote for Clinton. This 49% is a statistic.
d. A numerical fact about a population is a PARAMETER. A STATISTIC is to a sample what a PARAMETER is to the population. The idea is this: if the survey reveals 49%, those who conducted the survey hope that it is a close approximation of the true population PARAMETER (that is, the true percentage of voters who will vote for Clinton on November 5th).
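The distinction between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration: the population below (100,000 voters, 52% in favor) and the sample size of 500 are made-up values, not figures from the poll handout.

```python
import random

def proportion(votes):
    """Fraction of 1s (votes for the candidate) in a list of 0/1 values."""
    return sum(votes) / len(votes)

# Hypothetical population (an assumption for illustration only):
# 100,000 voters, exactly 52% of whom favor the candidate.
population = [1] * 52_000 + [0] * 48_000

# PARAMETER: a numerical fact about the whole population.
parameter = proportion(population)

# STATISTIC: a numerical fact about one sample drawn from it.
rng = random.Random(1996)              # seeded so the run is repeatable
sample = rng.sample(population, 500)   # simple random sample, no replacement
statistic = proportion(sample)

print(f"parameter (population proportion): {parameter:.3f}")
print(f"statistic (sample proportion):     {statistic:.3f}")
```

In a real survey the parameter is unknown; here it is known only because the population was constructed by hand, which is what makes the comparison possible.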
Sampling Variability
Think about the Gallup Poll some more. Issues:
a. Calculating a proportion.
b. Using this sample proportion to estimate the unknown population parameter p. How can the voting decisions of 358 people be an accurate estimate of the many millions who will cast a vote?
c. Using repetitions to one's advantage. Drawing many samples shows how much the statistic varies from sample to sample. Simulation is a cheap way to do this. Others might draw repeated real samples (like this tracking poll).
d. Definition: The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population. It is an ideal pattern.
e. Note: simulations and repeated samples give only an approximation of the sampling distribution. The preferred way to obtain a sampling distribution is to use probability theory.
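The simulation idea in (c) through (e) might look like the following sketch. It assumes, purely for illustration, that the true proportion is p = 0.49 and that each of the 358 respondents answers independently (a reasonable stand-in when the population is far larger than the sample). The function name and the number of repetitions are hypothetical choices.

```python
import random
import statistics

def simulate_sample_proportions(p, n, repetitions, seed=0):
    """Draw `repetitions` samples of size n and return each sample proportion.

    Each respondent is modeled as an independent yes/no draw with
    probability p of "yes" (an assumption, not the poll's actual design).
    """
    rng = random.Random(seed)
    results = []
    for _ in range(repetitions):
        yes_count = sum(1 for _ in range(n) if rng.random() < p)
        results.append(yes_count / n)
    return results

# Assumed true proportion 0.49; sample size 358 as in the notes above.
p_hats = simulate_sample_proportions(p=0.49, n=358, repetitions=2000)

mean_p_hat = statistics.mean(p_hats)
sd_p_hat = statistics.stdev(p_hats)

print(f"mean of simulated sample proportions: {mean_p_hat:.4f}")
print(f"standard deviation (spread):          {sd_p_hat:.4f}")
```

The 2000 simulated proportions only approximate the sampling distribution, which is defined over all possible samples of size 358. A histogram of p_hats would show the pattern that probability theory describes exactly.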
Properties
a. The mean of the distribution will be close to the true value of the population parameter.
b. For a sample proportion from a reasonably large random sample, the distribution is roughly symmetric and approximately normal.
c. The distribution will have a spread (a standard deviation).
d. Note how the size of the sample affects the spread of the statistic: larger samples give less spread, so their statistics are more likely to be close to the true value. Also note that the size of the population does not strongly affect the spread of the sampling distribution, provided the population is much larger than the sample. This allows a statistician to select 730 likely voters and make some statement about millions of voters.
e. Unbiasedness. A statistic which is used to estimate a parameter is unbiased if the mean of the sampling distribution is equal to the true value of the parameter being estimated.
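Property (d) can be checked numerically. The sketch below uses the standard formula for the standard deviation of a sample proportion, sqrt(p(1-p)/n), together with the usual finite population correction sqrt((N-n)/(N-1)) for sampling without replacement. Neither formula appears in these notes; they are brought in here as an illustration, and p = 0.5 and the population sizes are made-up values.

```python
import math

def sd_sample_proportion(p, n, population_size=None):
    """Standard deviation of a sample proportion for sample size n.

    If population_size is given, apply the finite population correction
    for a simple random sample drawn without replacement.
    """
    sd = math.sqrt(p * (1 - p) / n)
    if population_size is not None:
        sd *= math.sqrt((population_size - n) / (population_size - 1))
    return sd

p = 0.5  # assumed true proportion, for illustration only

# Effect of sample size: the spread shrinks as n grows.
for n in (100, 730, 3000):
    print(f"n = {n:5d}: sd = {sd_sample_proportion(p, n):.4f}")

# Effect of population size with n = 730: almost none.
for N in (100_000, 100_000_000):
    sd = sd_sample_proportion(p, 730, population_size=N)
    print(f"N = {N:11,d}: sd = {sd:.4f}")
```

The first loop shows the spread falling as the sample grows. The second shows that a thousandfold change in population size barely moves it, which is why 730 likely voters can say something about millions.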
Very Basic Probability

Sampling distributions show the behavior of a statistic over a large number of samples. In practice, statisticians cannot gather a large number of samples, so they rely on probability theory to tell them what would happen if they actually had many samples to work with and observe.
a. A probability is a number between 0 and 1.
b. All possible outcomes together must have probability 1.
c. The probability that an event does not occur is 1 minus the probability that it does occur.
d. If two events have no outcomes in common, the probability that one or the other occurs is the sum of their individual probabilities.
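The four rules above can be checked on a small example. The sketch below uses one roll of a fair six-sided die, which is an illustrative choice and not an example from the lecture. Exact fractions are used so the equalities hold without rounding error.

```python
from fractions import Fraction

# Assumed example: a fair die, each face has probability 1/6.
outcomes = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event):
    """Probability of an event, given as a set of die faces."""
    return sum((outcomes[face] for face in event), Fraction(0))

all_faces = set(outcomes)
even = {2, 4, 6}
one = {1}

# Rule a: every probability is between 0 and 1.
print(all(0 <= p <= 1 for p in outcomes.values()))

# Rule b: all possible outcomes together have probability 1.
print(prob(all_faces) == 1)

# Rule c: P(not even) = 1 - P(even).
print(prob(all_faces - even) == 1 - prob(even))

# Rule d: "even" and "one" share no outcomes, so P(even or one) is the sum.
print(even.isdisjoint(one))
print(prob(even | one) == prob(even) + prob(one))
```

Rule (d) applies only because the two events have no outcomes in common. For overlapping events, such as "even" and "greater than 3", simply adding the probabilities would count the shared outcomes twice.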

Last Update: 28 October 1996 by VXL