© 2004, S. D. Cochran. All rights reserved.
CHI-SQUARE CONTINUED
Chi-square tests are commonly used to evaluate the cross-classification of objects.
Example: In public health, when people fall ill at picnics from eating food that has gone bad, especially food made with mayonnaise like potato salad, epidemiologists often try to look for evidence of an infectious agent by examining cross-classification frequencies. You can imagine the following box:
                         | Illness Status
Food eaten               | Got ill | Did not get ill
Ate potato salad         |         |
Did not eat potato salad |         |
Each person from the picnic is put in one, and only one cell, depending upon their classification by the two questions: Did they get ill? Did they eat potato salad?
Chi-square tests can be used to evaluate the independence of one variable from another. Here the two variables are illness status and food eaten. Each variable has two levels or categories. Each person is classified into one of the two levels of each variable, and therefore into exactly one cell of the cross-classification.
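The cross-classification described above can be sketched in a few lines of Python. This is only an illustration: the `people` records are hypothetical placeholders, not data from the picnic, and the representation as `(ate_potato_salad, got_ill)` pairs is an assumption made for the example.

```python
from collections import Counter

# Hypothetical records, one per person: (ate_potato_salad, got_ill).
people = [
    (True, True),
    (True, False),
    (False, False),
    (True, True),
    (False, True),
]

# Counting the pairs places each person in exactly one of the four cells.
cells = Counter(people)

for ate in (True, False):
    label = "Ate potato salad" if ate else "Did not eat potato salad"
    print(f"{label:26s} ill={cells[(ate, True)]} not ill={cells[(ate, False)]}")
```

Because every person contributes exactly one pair, the four cell counts always add up to the number of people.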
In the picnic example, a chi-square test would evaluate the evidence for an association between eating potato salad and getting ill.
Of the two variables above, which is independent? Which is dependent? Which is causal? Which is an outcome? What is the research hypothesis?
If we knew nothing about what went on at the picnic except that 125 of the 400 people got ill and that people who were there think about half of them ate the potato salad, what would we expect the cross-classifications to look like if potato salad is the culprit? If it is not the culprit?
If potato salad caused all cases of illness:
                         | Got ill | Did not get ill | Total
Ate potato salad         | 125     | 75              | 200
Did not eat potato salad | 0       | 200             | 200
Total                    | 125     | 275             | 400
Under chance (where potato salad is unrelated to illness status):
                         | Got ill | Did not get ill | Total
Ate potato salad         | 62.5    | 137.5           | 200
Did not eat potato salad | 62.5    | 137.5           | 200
Total                    | 125     | 275             | 400
How did we figure the expected frequencies in the cells?
                         | Got ill   | Did not get ill | Total
Ate potato salad         | (r1*c1)/N |                 | r1
Did not eat potato salad |           |                 | r2
Total                    | c1        | c2              | N
More precisely, the expected frequency for a cell is calculated by assuming that there is no information in the cell (the two factors are independent), so the only thing that matters is the counts in the margins (the marginal frequencies). Each cell receives the share of the sample that its row and column margins jointly imply. The formula is:
fE = (row total * column total) / N
Note that the effect of this equation is to apply the column's share of the sample (c/N) to the row total or, equivalently, the row's share of the sample (r/N) to the column total. For example, 125/400 = 31.25% of everyone got ill, so under independence we expect 31.25% of the 200 potato salad eaters, or 62.5 people, to be ill.
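The expected-frequency rule can be checked with a short Python sketch. The function name `expected_frequencies` is made up for this illustration; the margins are the ones from the picnic example (200 and 200 for the rows, 125 and 275 for the columns).

```python
def expected_frequencies(row_totals, col_totals):
    """Expected count per cell under independence: (row total * column total) / N."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must add up to the same N")
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: ate / did not eat potato salad. Columns: got ill / did not get ill.
expected = expected_frequencies([200, 200], [125, 275])
print(expected)  # [[62.5, 137.5], [62.5, 137.5]]
```

The result matches the "under chance" table: 62.5 expected ill and 137.5 expected not ill in each row.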
We found out from the County epidemiologist who went back and questioned all 400 people that people could be cross-classified as follows:
                         | Got ill | Did not get ill | Total
Ate potato salad         | 103     | 97              | 200
Did not eat potato salad | 22      | 178             | 200
Total                    | 125     | 275             | 400
The chi-square calculation then is:
Cell            | fO  | fE    | fO - fE | (fO - fE)^2 | (fO - fE)^2/fE
Ate-Ill         | 103 | 62.5  | 40.5    | 1640.25     | 26.24
Did not-Ill     | 22  | 62.5  | -40.5   | 1640.25     | 26.24
Ate-Not ill     | 97  | 137.5 | -40.5   | 1640.25     | 11.93
Did not-Not ill | 178 | 137.5 | 40.5    | 1640.25     | 11.93
Sum             | 400 | 400   | 0       |             | 76.34 = χ²
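As a cross-check on the hand calculation, the same sum can be sketched in Python from the observed counts alone. Variable names are illustrative. Carrying full precision gives about 76.35; the 76.34 in the table comes from adding the four terms after each was rounded to two decimals.

```python
# Observed counts. Rows: ate / did not eat potato salad.
# Columns: got ill / did not get ill.
observed = [[103, 97], [22, 178]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, f_o in enumerate(row):
        f_e = row_totals[i] * col_totals[j] / n  # expected count under independence
        chi_square += (f_o - f_e) ** 2 / f_e     # this cell's contribution

print(round(chi_square, 2))  # 76.35
```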
Now we need to use the chi-square table in the back of the book to figure out the P-value of obtaining a chi-square this large or larger under the condition that the null hypothesis is true.
The degrees of freedom (df) for our analysis in this instance are the number of df in the rows times the number of df in the columns: (rows - 1)*(columns - 1) = (2 - 1)*(2 - 1) = 1.
With a significance level of 5%, we need to exceed a critical value of chi-square of 3.84, which clearly we have, so we REJECT the null hypothesis and conclude that eating the potato salad appears to be associated with getting ill. We still cannot say that the potato salad is the true culprit. It may be that people who ate potato salad also had some other behavior in common that was the culprit.
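The table lookup can also be approximated with the Python standard library. This sketch relies on the fact that a chi-square variable with 1 df is the square of a standard normal variable, so it applies only to the 1 df case; the helper name `chi2_sf_df1` is made up for the illustration.

```python
import math
from statistics import NormalDist

def chi2_sf_df1(x):
    """P(chi-square with 1 df >= x), i.e. P(|Z| >= sqrt(x)) for a standard normal Z."""
    return math.erfc(math.sqrt(x / 2))

alpha = 0.05
# Critical value for 1 df: the square of the two-sided normal cutoff (about 1.96).
critical = NormalDist().inv_cdf(1 - alpha / 2) ** 2

chi_square = 76.34  # value from the calculation above

print(f"critical value: {critical:.2f}")  # 3.84
print(f"reject null: {chi_square > critical}")
print(f"P-value: {chi2_sf_df1(chi_square):.3g}")
```

The P-value printed for 76.34 is far below 5%, which agrees with the decision to reject the null hypothesis.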
Results from the lottery experiment
Expected distribution:
          | Gave nothing | Gave some | Gave all
You WON!  |              |           |
You LOST! |              |           |
The class distribution from the lottery experiment was as follows:
          | Gave nothing | Gave some | Gave all
You WON!  |              |           |
You LOST! |              |           |
Chi-square calculations:
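Cell            | fO | fE | fO - fE | (fO - fE)^2 | (fO - fE)^2/fE
Won, Nothing    |    |    |         |             |
Won, Something  |    |    |         |             |
Won, All        |    |    |         |             |
Lost, Nothing   |    |    |         |             |
Lost, Something |    |    |         |             |
Lost, All       |    |    |         |             |
χ² =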
Our degrees of freedom = (r - 1)*(c - 1) = (2 - 1)*(3 - 1) = 2
Our critical χ² = 5.99
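The degrees of freedom and the critical value for the 2 x 3 lottery table can be reproduced with a small Python sketch. It assumes the 5% significance level, which is the level the 5.99 corresponds to, and uses the closed form that holds only for 2 df, where the upper-tail probability is exp(-x/2). Both function names are illustrative.

```python
import math

def degrees_of_freedom(n_rows, n_cols):
    """df for a cross-classification: (rows - 1) * (columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)

def chi2_critical_df2(alpha):
    """Critical value for 2 df: solve exp(-x/2) = alpha for x."""
    return -2 * math.log(alpha)

df = degrees_of_freedom(2, 3)       # won/lost by gave nothing/some/all
critical = chi2_critical_df2(0.05)  # assumes the 5% level

print(df, round(critical, 2))  # 2 5.99
```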