© 2000, S. D. Cochran. All rights reserved.

CORRELATION CONTINUED

A. Handy properties about the Pearson r

1. It is not affected by linear changes in scale--you can add a constant (say, 1) to every value of x, or multiply every value of x by a positive constant, and r will remain the same

2. The correlation between x and y is the same as the correlation between y and x

3. r is a pure number without units

4. Remember r is an estimate of the extent to which the relative standings of individual elements are the same on each of the two variables.
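
The first two properties can be checked numerically. The following is a minimal sketch, not part of the original notes: the data values and the helper name pearson_r are invented for illustration, and r is computed from deviations about the means.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r computed from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r_xy = pearson_r(x, y)
r_shifted = pearson_r([a + 1 for a in x], y)      # add 1 to every value of x
r_rescaled = pearson_r([2.54 * a for a in x], y)  # change the units of x
r_yx = pearson_r(y, x)                            # swap the two variables
r_flipped = pearson_r([-a for a in x], y)         # multiply x by -1

print(r_xy, r_shifted, r_rescaled, r_yx, r_flipped)
```

The first four values agree (up to floating-point rounding). The last has the same size but the opposite sign, which is why the scale property is stated for positive multipliers.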

B. Problems with r related to joint distributions

1. Nonlinear association--the Pearson r is an estimate of linear association. You can have a perfect relationship between x and y (that is, if you know x, you know y) and still have an r of zero

2. Outliers

a. Extreme scores can have undue impact on the value of r

b. But every joint distribution has points on its edge, and it is hard to decide what to do with these points

3. If the joint distribution does not conform to the football (roughly elliptical) shape, the Pearson r may not be the appropriate statistic to use to summarize the distribution
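
Two of these problems are easy to reproduce with small made-up data sets. The sketch below is an illustration under assumed values, not data from the notes: the first part uses y = x squared on x values symmetric about zero, and the second adds one extreme point to a cloud with almost no linear trend. The helper pearson_r is a hypothetical name.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r computed from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Nonlinear association: y is completely determined by x, yet r is zero
x_curve = [-3, -2, -1, 0, 1, 2, 3]
y_curve = [a * a for a in x_curve]
r_curve = pearson_r(x_curve, y_curve)

# Outlier: a hypothetical cloud with little linear trend ...
x_cloud = [1, 2, 3, 4, 5, 6]
y_cloud = [3, 5, 2, 6, 1, 4]
r_cloud = pearson_r(x_cloud, y_cloud)

# ... and the same cloud with a single extreme score added
r_outlier = pearson_r(x_cloud + [20], y_cloud + [20])

print(r_curve, r_cloud, r_outlier)
```

In this sketch a single added point moves r from near zero to a strong positive value, which is the "undue impact" described above.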

C. Problems with r related to using previously summarized scores

1. Remember that r reflects spread (how tightly the points in the joint distribution cluster around a line), just as standard deviation and standard error are measures of spread

2. When r is calculated using rates or averages (both of which summarize points with lots of spread into an estimate of the center of the spread), then the amount of spread in the joint distribution is underestimated. The effect is to inflate the size of r.

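3. Ecological studies, which correlate group-level summaries (for example, rates or averages by region) rather than individual scores, are a common place where this occurs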
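
The inflation can be seen in a small constructed example. This is a sketch with invented numbers, not data from any study: three hypothetical groups have centers that fall on a line, while the individuals within each group are scattered around their center with no association. The names pearson_r, centers, and offsets are assumptions made for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r computed from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical groups: each individual is a group center plus a scatter offset
centers = [(2, 2), (4, 4), (6, 6)]
offsets = [(-2, -2), (-2, 2), (2, -2), (2, 2)]
groups = [[(cx + dx, cy + dy) for dx, dy in offsets] for cx, cy in centers]

# r computed on the individual scores
individuals = [p for g in groups for p in g]
r_individuals = pearson_r([p[0] for p in individuals],
                          [p[1] for p in individuals])

# r computed on the group averages, which discards within-group spread
mean_x = [sum(p[0] for p in g) / len(g) for g in groups]
mean_y = [sum(p[1] for p in g) / len(g) for g in groups]
r_means = pearson_r(mean_x, mean_y)

print(r_individuals, r_means)
```

With these made-up values the individual-level r is moderate, while the r based on averages is essentially perfect, because averaging removed the within-group spread.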

D. Problems with using r to imply causality

1. r only says something about the joint distribution of two variables--it provides information on association. It cannot provide any information on causality.

2. Two variables can be associated due to confounding with a third unknown, unmeasured variable

3. Also, we might think one variable causes the other when in fact each causes the other (reciprocal causation) or the direction of causation is reversed.

Example: The chicken and the egg
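
Confounding with a third variable can also be sketched numerically. The example below is a constructed illustration, assuming the third variable z could actually be measured (in practice it is often unknown). Here z drives both x and y, and x and y have no association once z is held fixed. All names and values are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r computed from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical third variable z; x and y are each z plus unrelated noise
noise = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
z_levels = [0, 10, 20]
strata = {z: [(z + ex, z + ey) for ex, ey in noise] for z in z_levels}

# Ignoring z: x and y look strongly associated
pairs = [p for pts in strata.values() for p in pts]
r_overall = pearson_r([p[0] for p in pairs], [p[1] for p in pairs])

# Holding z fixed: the association between x and y disappears
r_within = {z: pearson_r([p[0] for p in pts], [p[1] for p in pts])
            for z, pts in strata.items()}

print(r_overall, r_within)
```

The large overall r says nothing about x causing y or y causing x. In this sketch the association is produced entirely by z.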