Statistics 10

1. Basic Definitions

The POPULATION is the entire set of people (or animals, things) we wish to study.

A SAMPLE is a part of the population.

A numerical fact about a sample is a STATISTIC.

A numerical fact about a population is a PARAMETER.

Example, from the handout: of the 3,200 adults surveyed as part of a national sample, 4% said they had considered killing themselves. The 4% is a STATISTIC: it describes the sample. A statistic is to a sample what a PARAMETER is to the population. The people who conducted the survey hope that their 4% is a close approximation of the true (unknown) population PARAMETER.
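The distinction can be illustrated with a small simulation. This is only a sketch: the population below is made up, and its 5% "true" rate is an assumption chosen for illustration, not a figure from the handout. The function name sample_proportion is likewise hypothetical.

```python
import random

def sample_proportion(population, n, rng):
    """Draw n people at random without replacement; return the share coded 1."""
    sample = rng.sample(population, n)
    return sum(sample) / n

rng = random.Random(10)  # fixed seed so the run is repeatable

# Hypothetical population of 100,000 adults: 1 = "yes", 0 = "no".
# The 5% rate is invented for illustration only.
N = 100_000
yes_count = 5_000
population = [1] * yes_count + [0] * (N - yes_count)

parameter = sum(population) / N                       # numerical fact about the population
statistic = sample_proportion(population, 3200, rng)  # numerical fact about the sample

print(f"parameter = {parameter:.3f}")
print(f"statistic = {statistic:.3f}")
```

In a real survey only the statistic is observed; the parameter is known here only because the population was constructed by hand.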

2. Problems

A. Bias -- If a sample is "representative", then a statistic can be a good estimate of the parameter; but if the sample includes or excludes certain people systematically, the sample is BIASED. See examples of non-random samples…results from Vote.com

B. Selection bias --- you include or exclude certain people

C. Nonresponse bias --- people don't bother to answer you, and those who don't answer may differ from those who do

D. Response bias --- people answer, but they lie to you or they are manipulated by the way you asked the question

E. Wording of question --- phrasing may not be neutral (e.g. a loaded question).
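A rough simulation can show how nonresponse alone distorts a statistic even when the people contacted are chosen at random. Everything here is assumed for illustration: the 40% true share, the response rates of 60% and 20%, and the helper name respond_prob.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 40% would answer "yes" (coded 1).
population = [1] * 40_000 + [0] * 60_000

def respond_prob(answer):
    """Assumed response rates: 'yes' people reply 60% of the time, 'no' people 20%."""
    return 0.6 if answer == 1 else 0.2

# Contact 5,000 people at random, without replacement.
contacted = rng.sample(population, 5000)

# Only some of those contacted actually reply.
responders = [a for a in contacted if rng.random() < respond_prob(a)]

true_share = sum(population) / len(population)
contacted_share = sum(contacted) / len(contacted)
responder_share = sum(responders) / len(responders)

print(f"true share of 'yes'       = {true_share:.2f}")
print(f"share among those contacted = {contacted_share:.2f}")
print(f"share among responders      = {responder_share:.2f}")
```

Under these assumed rates the responders overstate the "yes" share by a wide margin, even though the contacted group is close to the truth. Taking a larger sample would not fix this, because the distortion is systematic.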

3. Design Issues

Statisticians are well aware of the problem of bias. Only in the last 50 years have survey organizations used probability methods to draw their samples. The following sampling designs can help.

a. Simple random sample (SRS): at each draw, every person still in the population has an equal chance of being chosen. In practice this means drawing at random without replacement (because it would not make sense to select the same person or measure the same animal/thing twice).
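As a minimal sketch of drawing at random without replacement, Python's standard random.sample can be used. The roster of 100 labeled people and the function name simple_random_sample are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members of the population at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical roster of 100 people.
roster = [f"person_{i}" for i in range(1, 101)]

chosen = simple_random_sample(roster, 10, seed=1)
print(chosen)
```

Because the draw is without replacement, nobody can appear in the sample twice.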

b. Not every sampling scheme is simple random sampling; other sampling schemes include MULTISTAGE CLUSTER SAMPLING.

There is a good example of multistage cluster sampling on p. 341, Figure 1.

The idea here is that a large population (e.g. the US) is broken down into increasingly smaller areas, and at each stage units are drawn at random until the unit of interest (e.g. households) is reached.
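The stage-by-stage idea might be sketched as follows. The regions, towns, and household labels are invented, the function name multistage_sample is hypothetical, and for simplicity the sketch draws a single unit at each stage; an actual survey would typically draw several units per stage.

```python
import random

def multistage_sample(regions, rng):
    """Draw one unit at random at each stage: region, then town, then household."""
    region = rng.choice(sorted(regions))            # stage 1: pick a region
    town = rng.choice(sorted(regions[region]))      # stage 2: pick a town in that region
    household = rng.choice(regions[region][town])   # stage 3: pick a household in that town
    return region, town, household

# Hypothetical nested sampling frame: region -> town -> households.
regions = {
    "Northeast": {"Town A": ["hh1", "hh2", "hh3"], "Town B": ["hh4", "hh5"]},
    "West": {"Town C": ["hh6", "hh7"], "Town D": ["hh8", "hh9", "hh10"]},
}

rng = random.Random(3)  # fixed seed so the run is repeatable
region, town, household = multistage_sample(regions, rng)
print(region, town, household)
```

Only the towns inside the chosen region, and the households inside the chosen town, need to be listed, which is one reason this design is practical for a large population.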

Note: these methods can be applied to things other than households. Examples might be estimating the corn harvest, sampling firms on hiring expectations, etc.

PRINCIPLE: Probability methods work well because they are impartial.