© 2004, S. D. Cochran. All rights reserved.

CHI-SQUARE CONTINUED

  1. Chi-square tests are commonly used to evaluate the cross-classification of objects.

Example: In public health, when people fall ill at picnics from eating food that has gone bad, especially food made with mayonnaise like potato salad, epidemiologists often try to look for evidence of an infectious agent by examining cross-classification frequencies. You can imagine the following box:

                            Illness status
Food eaten                  Got ill       Did not get ill
Ate potato salad
Did not eat potato salad

Each person from the picnic is put in one, and only one, cell, depending upon their classification by the two questions: Did they get ill? Did they eat potato salad?

  1. Chi-square tests can be used to evaluate the independence of one variable from another. Here the two variables are illness status and food eaten. Each variable has two levels or categories. Each person can be classified into exactly one of the two levels of each variable, and therefore into exactly one cell of the cross-classification.

  2. In the picnic example, a chi-square test would evaluate the evidence for an association between eating potato salad and getting ill.

  3. Of the two variables above, which is independent? Which is dependent? Which is causal? Which is an outcome? What is the research hypothesis?

  4. If we knew nothing about what went on at the picnic except that 125 of the 400 people got ill and that those who were there think about half of the people ate the potato salad, what would we expect the cross-classifications to look like if potato salad is the culprit? If it is not the culprit?

If potato salad caused all cases of illness:

                            Got ill       Did not get ill   Total
Ate potato salad            125           75                200
Did not eat potato salad    0             200               200
Total                       125           275               400

Under chance (where potato salad is unrelated to illness status):

                            Got ill       Did not get ill   Total
Ate potato salad            62.5          137.5             200
Did not eat potato salad    62.5          137.5             200
Total                       125           275               400

How did we figure the expected frequencies in the cells? From the marginal totals:

                            Got ill       Did not get ill   Total
Ate potato salad            (r1*c1)/N                       r1
Did not eat potato salad                                    r2
Total                       c1            c2                N

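More precisely, the expected frequency for a cell is calculated by assuming that the cell itself carries no information (the two factors are independent), so the only thing that matters is the counts in the margins (the marginal frequencies). Under independence, the proportion of the sample expected in a cell is the row proportion times the column proportion, and multiplying that proportion by N gives the expected count. The formula is:

fE = (row total * column total) / N

Note that the effect of this equation is to split each row total across the columns in the same proportions as the column totals, and each column total across the rows in the same proportions as the row totals. For example, 125/400 = 31.25% of the sample got ill, so we expect 31.25% of the 200 people who ate potato salad, or 62.5, to fall in the Ate-Ill cell.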
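
As a rough illustration of the margin-based calculation above, (r1*c1)/N applied to every cell, the following Python sketch computes expected counts from the row and column totals alone. The function name `expected_counts` is a hypothetical helper introduced here for illustration; it is not part of the course materials.

```python
def expected_counts(row_totals, col_totals):
    """Expected cell counts under independence: (row total * column total) / N."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must add up to the same N")
    return [[r * c / n for c in col_totals] for r in row_totals]


# Picnic margins: 200 ate / 200 did not eat; 125 got ill / 275 did not.
expected = expected_counts([200, 200], [125, 275])
for row in expected:
    print(row)
```

With the picnic margins this should reproduce the "under chance" table: 62.5 and 137.5 in each row.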

  1. We found out from the County epidemiologist, who went back and questioned all 400 people, that they could be cross-classified as follows:

                            Got ill       Did not get ill   Total
Ate potato salad            103           97                200
Did not eat potato salad    22            178               200
Total                       125           275               400

  1. The chi-square calculation then is:

Cell               fO      fE      fO - fE   (fO - fE)²   (fO - fE)²/fE
Ate-Ill            103     62.5    40.5      1640.25      26.24
Did not-Ill        22      62.5    -40.5     1640.25      26.24
Ate-Not ill        97      137.5   -40.5     1640.25      11.93
Did not-Not ill    178     137.5   40.5      1640.25      11.93
Sum                400     400     0                      76.34 = χ²
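
The column-by-column calculation above can be sketched in Python as a check on the arithmetic. This is a minimal sketch, assuming the observed counts are supplied as a list of rows; `chi_square` is a hypothetical helper name, not something from the course.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n  # expected count under independence
            total += (f_o - f_e) ** 2 / f_e
    return total


# Rows: ate / did not eat potato salad; columns: got ill / did not get ill.
picnic = [[103, 97], [22, 178]]
chi2 = chi_square(picnic)
print(round(chi2, 2))
```

Because the code adds the unrounded contributions, it gives about 76.35; the 76.34 in the hand calculation comes from adding contributions that were first rounded to two decimals. The difference has no bearing on the conclusion.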

  1. Now we need to use the chi-square table in the back of the book to find the P-value, that is, the probability of obtaining a chi-square this large or larger if the null hypothesis is true.

  1. The degrees of freedom (df) for our analysis in this instance is the number of df in the rows times the number of df in the columns--(rows - 1)*(columns - 1) = (2 - 1)*(2 - 1) = 1.

  2. With a significance level of 5%, we need to exceed a critical chi-square value of 3.84, which clearly we have, so we REJECT the null hypothesis and conclude that eating the potato salad seems to be associated with getting ill. We still cannot say that the potato salad is the true culprit. It may be that people who ate potato salad also had some other behavior in common that was the culprit.
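
The table lookup can also be checked numerically. The sketch below relies on a property that holds only for 1 degree of freedom: the chi-square statistic is then a squared standard normal, so its upper-tail probability can be written with the complementary error function. The helper name `p_value_df1` is hypothetical, and the function should not be used for other df.

```python
import math


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom.

    With df = 1 the statistic is a squared standard normal, so
    P(X >= chi2) = erfc(sqrt(chi2 / 2)). Not valid for other df.
    """
    return math.erfc(math.sqrt(chi2 / 2))


print(p_value_df1(3.84))   # close to 0.05, matching the tabled critical value
print(p_value_df1(76.34))  # far below 0.05
```

A statistic at the critical value of 3.84 gives a probability of about 5%, while the picnic statistic gives a probability that is vanishingly small, which is why the null hypothesis is rejected.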

  1. Results from the lottery experiment

Expected distribution:

              Gave nothing   Gave some   Gave all
You WON!
You LOST!

The class distribution from the lottery experiment was as follows:

              Gave nothing   Gave some   Gave all
You WON!
You LOST!

Chi-square calculations:

Outcome   Amount given   fO    fE    fO - fE   (fO - fE)²   (fO - fE)²/fE
Won       Nothing
          Something
          All
Lost      Nothing
          Something
          All

χ² =

Our degrees of freedom = (r - 1)*(c - 1) = (2 - 1)*(3 - 1) = 2

Our critical χ² = 5.99
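
The degrees of freedom and the critical value for the 2 x 3 lottery table can be checked with a short Python sketch. It assumes the same 5% significance level used in the picnic example, and it uses a shortcut that is valid only for 2 degrees of freedom, where the upper-tail probability of chi-square is exp(-x/2). Both helper names are hypothetical.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """df for a test of independence: (rows - 1) * (columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def critical_value_df2(alpha):
    """Critical chi-square for df = 2 only, where P(X >= x) = exp(-x / 2)."""
    return -2.0 * math.log(alpha)


df = degrees_of_freedom(2, 3)        # won/lost by gave nothing/some/all
critical = critical_value_df2(0.05)  # assumed 5% significance level
print(df, round(critical, 2))
```

This should agree with the values above: 2 degrees of freedom and a critical value of 5.99. For other df, the book's chi-square table is still needed.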