Statistics is the science of collecting, presenting, and interpreting
data to answer questions. Thus, there are four primary issues:
a. "Data as they are" questions can be answered:
If the U.S. presidential election were held today, what percent of Americans would vote for Bill Clinton as president?
b. "What-if questions under replicable circumstances" can be answered:
Among all American school children age 6 to 12, would giving Vitamin C prevent colds? (You can imagine testing this out on more and more children.)
c. Questions about nonreplicable events, in general, can NOT be answered!
"How many American Indians would be alive today if the American Revolution had failed?"
a. In SAMPLING, the researcher looks at a part of the group, and makes inferences about the whole group.
b. Typically, sampling is part of an OBSERVATIONAL STUDY, in which the researcher collects the data as they currently are.
Ask yourself: is this a reasonable guess? Is it a bad guess? Ask what sources of data are being used. Is the researcher doing his or her job well?
Example: a Los Angeles Times article on family stress, divorce and foreclosures in the Palmdale area of Los Angeles County. The author interviewed 5 families, who shared their commuting stories. If the author is trying to make the case that commuting time causes family stress, these anecdotes, though amusing, may not be representative of the larger population of commuting Los Angelenos, or for that matter of persons who commute from Palmdale.
Suppose you want to know whether drinking wine can lower your chance of Coronary Heart Disease (CHD). You could survey people, ask them if they drink wine and then measure their cholesterol. There are problems with answering questions in this way instead...
a. The POPULATION is the entire set of people (or animals, things) we wish to study. Examples: All Americans. All parking meters in New York. All blue whales in the sea.
b. A SAMPLE is a part of the population. See the Gallup handout on the Dole vs. Clinton voter poll taken over Labor Day. Why sample? Most times it is not feasible to query the entire population (e.g. too costly, would take too long) so researchers select (sample) a subgroup for questioning or analysis.
c. A numerical fact about a sample is a STATISTIC. It is a number used to describe a sample. Example, from the Gallup handout: of the 623 Americans surveyed in the telephone poll, 49% said they would vote for Clinton. The 49% is a statistic which describes the sample.
d. A numerical fact about a population is a PARAMETER. A STATISTIC is to a sample what a PARAMETER is to the population. If 49% is what the telephone survey revealed, the people who conducted the survey hope that it is a close approximation of the true population PARAMETER.
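The distinction between a PARAMETER and a STATISTIC can be sketched with a small simulation. This is only an illustration: the population below is invented, and its size and 50% true share are assumptions, not figures from the Gallup handout. Only the sample size of 623 comes from the handout.

```python
import random

# Hypothetical population: 1 = would vote for the candidate, 0 = would not.
# The population size and the 50% true share are made-up assumptions.
rng = random.Random(1996)  # fixed seed so the run is repeatable
POPULATION_SIZE = 100_000
population = [1] * (POPULATION_SIZE // 2) + [0] * (POPULATION_SIZE // 2)

# PARAMETER: a numerical fact about the whole population.
parameter = sum(population) / len(population)

# STATISTIC: the same numerical fact, computed from a sample only.
SAMPLE_SIZE = 623  # sample size quoted in the notes
sample = rng.sample(population, SAMPLE_SIZE)
statistic = sum(sample) / len(sample)

print(f"parameter (population share): {parameter:.3f}")
print(f"statistic (sample share):     {statistic:.3f}")
```

In a real poll the parameter is unknown; the simulation can print it only because the population was made up. Changing the seed changes the statistic but not the parameter.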
a. Simple random sample (SRS): every person in the population has an equal chance of getting into the sample. More precisely, every possible group of the same size has an equal chance of being the sample.
b. Not every sampling scheme is simple random sampling; other sampling schemes include STRATIFIED RANDOM SAMPLING and MULTISTAGE CLUSTER SAMPLING.
There is a good example of stratified random sampling in your text: p. 187, Example 3.5.
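A minimal sketch of the difference between a simple random sample and a stratified random sample. The population, the strata names (urban, suburban, rural), and the helper functions are all hypothetical, made up for illustration; this is not the textbook's example.

```python
import random
from collections import defaultdict

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population of 1,000 people, each tagged with a stratum.
# The strata names and sizes are invented for illustration.
population = (
    [("urban", i) for i in range(600)]
    + [("suburban", i) for i in range(300)]
    + [("rural", i) for i in range(100)]
)

def simple_random_sample(units, n, rng):
    """Draw n units without replacement from the whole population."""
    return rng.sample(units, n)

def stratified_random_sample(units, fraction, rng):
    """Draw a separate simple random sample of the same fraction from each stratum."""
    by_stratum = defaultdict(list)
    for unit in units:
        by_stratum[unit[0]].append(unit)
    chosen = []
    for members in by_stratum.values():
        n = round(len(members) * fraction)
        chosen.extend(rng.sample(members, n))
    return chosen

def counts(sample):
    """Count how many sampled units fall in each stratum."""
    tally = defaultdict(int)
    for stratum, _ in sample:
        tally[stratum] += 1
    return dict(tally)

srs = simple_random_sample(population, 100, rng)
strat = stratified_random_sample(population, 0.10, rng)

print("SRS counts by stratum:       ", counts(srs))
print("stratified counts by stratum:", counts(strat))
```

The stratified sample always contains exactly 10% of each stratum, while the stratum counts in the SRS vary from draw to draw.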
a. Idea: If a sample is "representative", then a statistic can be a good estimate of the parameter; but if the sample includes or excludes certain people systematically, the sample is BIASED.
b. Types of bias
( i) Self-selection bias --- certain people decide to talk to you
Read question 3.16 in your text (p. 195). Certain people responded to Shere Hite's survey.
( ii) Selection bias --- you include or exclude certain people
Question 3.16 again: Shere Hite sent out 100,000 surveys to women who belonged to women's groups. Ask yourself, is her sample of women representative?
(iii) Nonresponse bias --- people don't bother to answer you
Still 3.16: only 4.5% of the surveys were returned. It would appear that the respondents were "unusual" in some manner, since the vast majority chose not to give their opinions.
( iv) Response bias --- people answer, but they lie to you
In-class example: in some surveys of sexual behavior in the United States, men tend to overstate the number of sexual partners they have had since age 18, while women tend to understate their total number of sexual partners. Mathematically, the totals suggest that members of one or the other group -- or both -- are lying.
( v) Wording of question --- people answer based on phrasing
In-class example: questions on legal abortion... interviewers receive different responses depending on how the question is worded.
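The self-selection and nonresponse items above can be sketched with a short simulation. Only the 100,000 surveys and the 4.5% return rate come from the notes. Everything else (the 30% true share and the two response probabilities) is an invented assumption, chosen so that people holding a particular view are more likely to reply; it is not data from the Hite survey.

```python
import random

rng = random.Random(3)  # fixed seed so the run is repeatable

SURVEYS_MAILED = 100_000  # figure from the notes
RETURN_RATE = 0.045       # figure from the notes (4.5%)
surveys_returned = round(SURVEYS_MAILED * RETURN_RATE)
print("surveys returned:", surveys_returned)

# Everything below is an invented illustration, not data from the survey.
TRUE_SHARE = 0.30           # assumed share of the population holding the view
P_RESPOND_IF_VIEW = 0.10    # assumed: people holding the view reply more often
P_RESPOND_OTHERWISE = 0.02  # assumed: everyone else rarely replies

responses = []
for _ in range(SURVEYS_MAILED):
    holds_view = rng.random() < TRUE_SHARE
    p_respond = P_RESPOND_IF_VIEW if holds_view else P_RESPOND_OTHERWISE
    if rng.random() < p_respond:
        responses.append(holds_view)  # only respondents are ever observed

share_among_respondents = sum(responses) / len(responses)

print(f"true share holding the view:  {TRUE_SHARE:.2f}")
print(f"share among respondents:      {share_among_respondents:.2f}")
print(f"number of respondents:        {len(responses)}")
```

Under these assumed rates the respondents overstate the share holding the view, even though thousands of surveys come back. A large number of responses does not repair a biased sample.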
Last Update: 26 September 1996 by VXL