© 2004, S. D. Cochran. All rights reserved.

CORRELATION CONTINUED

  1. Handy properties about the Pearson r

    1. It is not affected by changes of location or scale--you can add the same number to every value of x, or multiply every value of x by the same positive number, and r will remain the same

    2. The correlation between x and y is the same as the correlation between y and x

    3. r is a pure number without units

    4. Remember, r is an estimate of the extent to which individual elements have the same relative standing (their position in standard units) on each of the two variables.
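The first three properties can be checked numerically. The sketch below is only an illustration: the data values are made up, and `pearson_r` is a helper written here for the purpose, not something taken from the course materials.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviations about the means (pure Python)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 3.2, 4.8, 5.1, 7.0]

r_xy = pearson_r(x, y)
r_shifted = pearson_r([v + 1 for v in x], y)      # add 1 to every x
r_rescaled = pearson_r([2.54 * v for v in x], y)  # change units, e.g. inches to cm
r_swapped = pearson_r(y, x)                       # correlate y with x instead
r_flipped = pearson_r([-v for v in x], y)         # a negative multiplier flips the sign

print(round(r_xy, 4), round(r_shifted, 4), round(r_rescaled, 4),
      round(r_swapped, 4), round(r_flipped, 4))
```

The first four printed values agree, because the units cancel when each variable is expressed as deviations from its mean divided by its spread. The last value has the same size but the opposite sign, which is why the scale property is stated for positive multipliers.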

  2. Problems with r related to joint distributions

    1. Nonlinear association--the Pearson r is an estimate of linear association only. You can have a perfect relationship between x and y (that is, if you know x, you know y exactly) and still have an r of zero

    2. Outliers

      1. Extreme scores can have undue impact on the value of r

      2. But all distributions have points that are on the edge of the joint distribution and it is hard to make decisions about what to do with these points

    3. If the joint distribution does not conform to the football shape (an oval cloud of points), the Pearson r may not be the appropriate statistic to use to summarize the distribution

    4. Restriction of range--two variables may be strongly related over their full range, but in a truncated range the relationship can disappear

      1. Example--Among all humans there is a strong association between age and height

      2. But among those age 21.1 to 21.3, the strong association is difficult to detect
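Three of these problems are easy to see with small made-up data sets. The sketch below is an illustration only: every number in it is hypothetical, including the crude growth model used for height (steady growth until age 18, then flat, plus random scatter), and `pearson_r` is a helper defined here.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviations about the means (pure Python)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Nonlinear association: y is completely determined by x, yet r is zero.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]
r_nonlinear = pearson_r(x, y)

# Outlier: five points with little association, then one extreme point added.
x_base = [1, 2, 3, 4, 5]
y_base = [3, 1, 4, 2, 3]
r_without_outlier = pearson_r(x_base, y_base)             # about 0.14
r_with_outlier = pearson_r(x_base + [20], y_base + [20])  # about 0.97

# Restriction of range: hypothetical ages (years) and heights (cm).
random.seed(0)
ages = [random.uniform(2, 25) for _ in range(5000)]
heights = [80 + 5.5 * min(a, 18) + random.gauss(0, 7) for a in ages]
r_full_range = pearson_r(ages, heights)

# Keep only the people aged 21.1 to 21.3 and recompute r.
narrow = [(a, h) for a, h in zip(ages, heights) if 21.1 <= a <= 21.3]
r_narrow_range = pearson_r([a for a, _ in narrow], [h for _, h in narrow])

print("nonlinear:", round(r_nonlinear, 3))
print("outlier:", round(r_without_outlier, 3), "->", round(r_with_outlier, 3))
print("range:", round(r_full_range, 3), "vs", round(r_narrow_range, 3))
```

In the simulated heights the relationship itself never changes. Only the slice of ages being examined changes, and that alone is enough to make the association hard to detect.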

  3. Problems with r related to using previously summarized scores

    1. Remember that r is a measure of spread (it reflects how tightly the points cluster around a line), just as standard deviation and standard error are measures of spread

    2. When r is calculated using rates or averages (both of which summarize points with lots of spread into an estimate of the center of the spread), then the amount of spread in the joint distribution is underestimated. The effect is to inflate the size of r

    3. Ecological studies are a common place where this occurs
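A small simulation shows how averaging inflates r. This is a sketch under invented assumptions: 10 hypothetical groups of 200 individuals each, with a lot of individual scatter around each group's center. None of these numbers come from a real study.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviations about the means (pure Python)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

random.seed(1)
n_groups, n_per_group = 10, 200   # hypothetical study design

individual_x, individual_y = [], []
group_avg_x, group_avg_y = [], []

for g in range(n_groups):
    # Both variables share the group's center g; individuals scatter widely around it.
    gx = [g + random.gauss(0, 5) for _ in range(n_per_group)]
    gy = [g + random.gauss(0, 5) for _ in range(n_per_group)]
    individual_x += gx
    individual_y += gy
    group_avg_x.append(sum(gx) / n_per_group)
    group_avg_y.append(sum(gy) / n_per_group)

r_individuals = pearson_r(individual_x, individual_y)   # all 2000 people
r_group_averages = pearson_r(group_avg_x, group_avg_y)  # only the 10 averages

print("r across individuals:   ", round(r_individuals, 3))
print("r across group averages:", round(r_group_averages, 3))
```

Averaging removes most of the individual scatter, so the 10 group averages line up far more tightly than the individuals do. The r computed from the averages therefore overstates how well x predicts y for any one person.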

  4. Problems with using r to imply causality

    1. r only says something about the joint distribution of two variables--it provides information on association. It cannot provide any information on causality

    2. Two variables can be associated due to confounding with a third unknown, unmeasured variable

    3. Also, we might think one variable causes the other when in fact causation is reciprocal or runs in the opposite direction

    4. Example: Chicken and the egg
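Confounding can be illustrated with simulated data. In the sketch below, a hypothetical third variable z drives both x and y, and neither x nor y has any effect on the other. The setup is invented for illustration. In a real study z would be unknown or unmeasured, so the final step, which removes z, would not be possible.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviations about the means (pure Python)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

random.seed(2)
n = 5000

# z is the confounder; x and y each equal z plus their own independent noise.
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [zi + random.gauss(0, 1) for zi in z]

r_xy = pearson_r(x, y)   # clearly positive, though x does not cause y

# Remove z's contribution from both variables (possible only in a simulation).
x_without_z = [xi - zi for xi, zi in zip(x, z)]
y_without_z = [yi - zi for yi, zi in zip(y, z)]
r_after_removing_z = pearson_r(x_without_z, y_without_z)

print("r between x and y:       ", round(r_xy, 3))
print("r once z is removed:     ", round(r_after_removing_z, 3))
```

The association between x and y is real, but it comes entirely from z. The value of r by itself gives no way to tell this situation apart from one in which x causes y.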

REVIEW EXAMPLES

  1. If you flip a fair coin 100 times, and then repeat this many, many times, what percent of the time would you expect to see heads come up only 38 times or less in a batch of 100 coin flips?

  2. The average UCLA student spends about $XXX.XX per quarter on textbooks, plus or minus about $XX.XX.

    1. How much would you expect that you will spend on textbooks over four quarters?

    2. Predict how much you could expect to spend about 68% of the time in any random collection of four quarters of your career here at UCLA.  It would be somewhere between ______ and  ______

    3. If you drew a random sample of 100 UCLA students, how much would you expect them to have spent, on average, on textbooks during Fall Quarter?

    4. If you wanted to be 95% confident, what would you estimate is the average amount of money spent on textbooks Fall Quarter by these 100 students? It would be somewhere between ______ and  ______

  3. Given the class distribution of Valentine's Day dating, predict the percent of UCLA students who will go out with their honey on Valentine's Day. 

    1. What would be a 95% Confidence Interval surrounding this estimate?

    2. What is a 95% Confidence Interval?
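As a check on the coin-flip question in review example 1, one possible approach is the normal approximation, compared against the exact binomial count. This is a sketch, not an official answer key. Using a continuity correction (38.5 rather than 38) is an assumption about how the problem is meant to be worked.

```python
from math import comb, sqrt
from statistics import NormalDist

n_flips, p_heads, cutoff = 100, 0.5, 38

# Expected number of heads and the standard error for the count of heads.
expected_heads = n_flips * p_heads                  # 50
se_heads = sqrt(n_flips * p_heads * (1 - p_heads))  # 5

# Normal approximation: area below 38.5 heads, expressed in standard units.
z = (cutoff + 0.5 - expected_heads) / se_heads      # -2.3
approx_chance = NormalDist().cdf(z)

# Exact chance: count every sequence of 100 flips with 0 to 38 heads.
exact_chance = sum(comb(n_flips, k) for k in range(cutoff + 1)) / 2 ** n_flips

print("normal approximation: about", round(100 * approx_chance, 1), "percent")
print("exact binomial:       about", round(100 * exact_chance, 1), "percent")
```

Both methods give roughly 1 percent, so a batch with 38 or fewer heads should turn up in about 1 of every 100 batches.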