Making generalizations from the data and drawing conclusions.
a. The POPULATION is the entire set of people (or animals, or things) we wish to study. Examples: all Americans; all parking meters in New York; all blue whales in the sea; all likely voters on November 7th, 2000.
b. A SAMPLE is a part of the population. See the CNN Polls handout. Why sample? Usually it is not feasible to query the entire population (it would cost too much or take too long), so researchers select (sample) a subgroup for questioning or analysis.
c. A numerical fact about a sample is a STATISTIC: a number used to describe the sample. Example from the handout: of the 432 likely New Hampshire voters surveyed in the telephone poll, 48% said they would vote for Bradley instead of Gore. The 48% is a statistic that describes the sample.
d. A numerical fact about a population is a PARAMETER. A statistic is to a sample what a PARAMETER is to a population. If the telephone survey found 48%, the people who conducted it hope that this figure is a close approximation of the true population PARAMETER: the percentage of all likely New Hampshire voters who would vote for Bradley instead of Gore.
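A small simulation makes the statistic/parameter distinction concrete. This is a minimal sketch with an invented population (100,000 voters, 52% of whom favor a hypothetical "candidate A"); in a real survey the parameter is unknown, which is exactly why we compute a statistic from a sample of, here, 432 people.

```python
import random

# Hypothetical population of 100,000 voters: 1 = would vote for candidate A,
# 0 = would not. The 52% share is invented purely for illustration.
population = [1] * 52_000 + [0] * 48_000

def proportion(responses):
    """Share of 1s in a list of 0/1 responses."""
    return sum(responses) / len(responses)

# PARAMETER: a numerical fact about the whole population (unknown in real life).
parameter = proportion(population)

# STATISTIC: the same fact computed on a random sample of 432 people.
rng = random.Random(0)            # fixed seed so the run is reproducible
sample = rng.sample(population, 432)
statistic = proportion(sample)

print(f"parameter = {parameter:.3f}, statistic = {statistic:.3f}")
```

The statistic will usually be close to, but not exactly equal to, the parameter; rerunning with a different seed gives a different statistic, while the parameter never changes.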
Bias
a. Idea
If a sample is "representative", then a statistic can be a good estimate of the parameter; but if the sample includes or excludes certain people systematically, the sample is BIASED. See the CNN instant polls and Yahoo Polls for examples of non-random samples.
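One way to see why bias is different from ordinary chance error is to simulate it. In this minimal sketch both the population (true share 0.52) and the selection mechanism are hypothetical: on every draw, a person who would answer 1 is assumed to be twice as likely to be picked as a person who would answer 0.

```python
import random

rng = random.Random(1)  # fixed seed for reproducibility

# Hypothetical population: the true share of 1s (the parameter) is 0.52.
population = [1] * 52_000 + [0] * 48_000

# Hypothetical biased selection: on every draw, a person who would answer 1
# is twice as likely to be picked as a person who would answer 0.
weights = [2.0 if person == 1 else 1.0 for person in population]

def srs_estimate(n):
    """Share of 1s in a simple random sample of size n."""
    return sum(rng.sample(population, n)) / n

def biased_estimate(n):
    """Share of 1s in a sample drawn with the unequal weights above
    (random.choices draws with replacement)."""
    return sum(rng.choices(population, weights=weights, k=n)) / n

for n in (100, 1_000, 10_000):
    print(f"n = {n:>6}: SRS {srs_estimate(n):.3f}   biased {biased_estimate(n):.3f}")
```

As n grows, the SRS estimate settles near 0.52, while the biased estimate settles near 0.52 × 2 / (0.52 × 2 + 0.48) ≈ 0.68. A bigger sample shrinks chance error, but it does nothing about a systematic bias in who gets selected.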
b. Types of bias
An example: in the 1980s, a behavioral researcher named Shere Hite sent out 100,000 questionnaires to explore how women viewed their relationships with men. She amassed a huge collection of anonymous letters from thousands of women disillusioned with love and marriage. Recognizing that the response rate was very small (4.5%, whereas a well-done survey typically has a response rate of about 70%), she defended the sample as representative because "those participating according to age, occupation, religion, and other variables known for the U.S. population at large in most cases quite closely mirrors that of the U.S. female population." Nevertheless, the results from this sample differ dramatically from those of more scientific polls on the same topic with better response rates.
Some of her findings: 91% of the divorced women said they had initiated their divorces, and 70% of the women surveyed admitted to committing adultery.
(i) Self-selection bias: certain people decide to talk to you.
Only the women who chose to respond to Shere Hite's survey ended up in her sample.
(ii) Selection bias: you include or exclude certain people.
Shere Hite sent her 100,000 surveys to women who belonged to women's groups. Ask yourself: is her sample of women representative?
(iii) Nonresponse bias: people don't bother to answer you.
Only 4.5% of the surveys were returned. Since the vast majority chose not to give their opinions, it would appear that the respondents were "unusual" in some way.
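Self-selection and nonresponse can be made concrete with a little arithmetic. This is a minimal sketch with hypothetical inputs, NOT Hite's data: suppose 25% of the women would answer "yes" to some question (say, that they are disillusioned), and that these women return the questionnaire at 9% while everyone else returns it at 3%. The rates were chosen only so that the overall response rate comes out at 4.5%.

```python
def overall_response_rate(true_share, rate_if_yes, rate_if_no):
    """Fraction of all questionnaires that come back."""
    return true_share * rate_if_yes + (1 - true_share) * rate_if_no

def respondent_share(true_share, rate_if_yes, rate_if_no):
    """Share of 'yes' answers among the questionnaires that come back, when
    'yes' people respond at rate_if_yes and everyone else at rate_if_no."""
    yes_returned = true_share * rate_if_yes
    return yes_returned / overall_response_rate(true_share, rate_if_yes, rate_if_no)

# Hypothetical inputs (not from Hite's study): 25% of the population would
# answer "yes"; they return the form at 9%, everyone else at 3%.
true_share, rate_yes, rate_no = 0.25, 0.09, 0.03
print(f"response rate: {overall_response_rate(true_share, rate_yes, rate_no):.1%}")
print(f"'yes' among respondents: {respondent_share(true_share, rate_yes, rate_no):.0%}")
```

With these made-up rates the overall response rate is 4.5%, yet "yes" makes up 50% of the returned questionnaires even though the true share is 25%. If both groups responded at the same rate, the respondents would mirror the population no matter how low that rate was.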
(iv) Response bias: people answer, but they lie to you.
In-class example: in some surveys of sexual behavior in the United States, men tend to overstate the number of sexual partners they have had since age 18 (average 12), while women tend to understate theirs (average 3). Every opposite-sex partnership adds one to a man's count and one to a woman's count, and there are roughly equal numbers of men and women, so the two averages should be about the same. Mathematically, then, the totals suggest that members of one group or the other, or both, are lying.
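The arithmetic behind the response-bias argument can be checked directly. This is a minimal sketch under stated assumptions: a closed population with equal numbers of men and women (1,000 each, a made-up figure), and only opposite-sex partners within that population are counted. The averages 12 and 3 come from the example above.

```python
def implied_totals(n_men, avg_men, n_women, avg_women):
    """Total opposite-sex partners reported by each group. If every reported
    partnership is one man plus one woman and everyone reports accurately,
    the two totals should be equal (ignoring partners outside the population)."""
    return n_men * avg_men, n_women * avg_women

# Averages from the in-class example; the group sizes are an assumption.
men_total, women_total = implied_totals(1_000, 12, 1_000, 3)
print(men_total, women_total, men_total / women_total)
```

The men's reports add up to four times as many partnerships as the women's, which is impossible if both groups answered accurately.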
(v) Wording of question: the phrasing may not be neutral (e.g., a loaded question).
In-class example: questions on legal abortion. The level of support depends on how the question is worded. From a 1992 series of Time/CNN polls:
Question | In favor (%) | Opposed (%) | Unsure (%)
"When the mother's life is at stake" | 84 | 11 | 5
"When the mother's health is in danger" | 82 | 12 | 6
"In cases of rape or incest" | 79 | 16 | 5
"If the fetus will be born seriously deformed" | 70 | 22 | 8
"For any reason during the first trimester" | 47 | 44 | 9
"For any reason while the fetus cannot survive outside the womb" | 67 | 23 | 10
"At no time during the pregnancy" | 25 | 65 | 10
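To quantify how much the wording matters, the table can be entered as data. This is a small sketch using only the percentages from the 1992 Time/CNN table; leaving out the "At no time" row, which is phrased differently from the others, is a choice made here for illustration.

```python
# Percentages from the 1992 Time/CNN table: (in favor, opposed, unsure).
poll = {
    "When the mother's life is at stake": (84, 11, 5),
    "When the mother's health is in danger": (82, 12, 6),
    "In cases of rape or incest": (79, 16, 5),
    "If the fetus will be born seriously deformed": (70, 22, 8),
    "For any reason during the first trimester": (47, 44, 9),
    "For any reason while the fetus cannot survive outside the womb": (67, 23, 10),
    "At no time during the pregnancy": (25, 65, 10),
}

# Sanity check: each row of percentages should total 100.
for wording, counts in poll.items():
    assert sum(counts) == 100, wording

# Leave out the last row ("At no time..."), which is phrased differently.
favor = [counts[0] for counts in list(poll.values())[:-1]]
spread = max(favor) - min(favor)
print(f"'In favor' ranges from {min(favor)}% to {max(favor)}% ({spread} points)")
```

Across the six circumstance wordings, support runs from 47% to 84%, a 37-point spread on what is broadly the same topic.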
Statisticians are well aware of the problem of bias. Only in the last 50 years have survey organizations used probability methods to draw their samples.
Sampling designs
a. Simple random sample (SRS): on each draw, every person remaining in the population has an equal chance of getting into the sample. In practice an SRS is drawn at random without replacement, because it would not make sense to select the same person, or measure the same animal or thing, twice.
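Drawing at random without replacement can be written out one draw at a time. This is a minimal sketch; the population of 100 labeled people and the function name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n people at random without replacement, one draw at a time.
    On each draw, every person not yet chosen is equally likely to be picked."""
    if n > len(population):
        raise ValueError("sample size cannot exceed population size")
    remaining = list(population)           # copy so the caller's list is untouched
    chosen = []
    for _ in range(n):
        i = rng.randrange(len(remaining))  # uniform over the people still left
        chosen.append(remaining.pop(i))    # removed, so never drawn twice
    return chosen

rng = random.Random(42)
people = [f"person_{k}" for k in range(1, 101)]
print(simple_random_sample(people, 5, rng))
```

In everyday code, the standard library's `random.sample(people, 5)` also samples without replacement in a single call; the loop above just spells out the draw-by-draw idea.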
b. Not every sampling scheme is simple random sampling; other sampling schemes include MULTISTAGE CLUSTER SAMPLING.
There is a good example of multistage cluster sampling on p. 341, Figure 1.
The idea is that a large population (e.g., the US) is broken down into successively smaller areas, and at each stage one or more units are drawn at random from within the units chosen at the previous stage, until the unit of interest (e.g., households) is reached.
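The stage-by-stage drawing can be sketched as a walk down a nested frame. Everything here is hypothetical: the region/town/block/household frame is invented, `multistage_sample` is an illustrative name, and units are drawn with equal probability (real surveys typically use much larger frames and often weight by size). The `per_stage` list sets how many units are drawn at each stage, so it can draw a single unit per stage or several.

```python
import random

# Hypothetical frame: region -> town -> block -> list of household IDs.
frame = {
    "Northeast": {
        "Town A": {"Block 1": ["h1", "h2", "h3"], "Block 2": ["h4", "h5"]},
        "Town B": {"Block 3": ["h6", "h7", "h8"]},
    },
    "South": {
        "Town C": {"Block 4": ["h9", "h10"], "Block 5": ["h11", "h12", "h13"]},
    },
}

def multistage_sample(node, per_stage, rng):
    """Walk down the hierarchy. At each stage draw per_stage[0] units at random
    (without replacement, equal probability) and recurse into each chosen unit."""
    k = per_stage[0]
    if isinstance(node, dict):
        chosen = rng.sample(sorted(node), min(k, len(node)))
        result = []
        for name in chosen:
            result.extend(multistage_sample(node[name], per_stage[1:], rng))
        return result
    # Final stage: node is the list of households in a chosen block.
    return rng.sample(node, min(k, len(node)))

rng = random.Random(7)
# One region, one town, one block, then two households from that block.
households = multistage_sample(frame, per_stage=[1, 1, 1, 2], rng=rng)
print(households)
```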
Note: these methods can be applied to things other than households, for example estimating the corn harvest or sampling firms on their hiring expectations.
Probability methods work well because they are impartial: chance, not the interviewer or the respondent, decides who gets into the sample.