AN EXAMPLE USING VERTICAL STRIPS

Suppose that at an elite college the students have an average IQ of 115 (S.D. = 15) and an average G.P.A. of 3.2 (S.D. = 0.5). It’s a tough school. Suppose also that the correlation between IQ and GPA is 0.4. (Why so low? Well, a lot of GPA depends on effort as well as talent, and in an environment where everyone is smarter than average, the correlation between the two indices tends to be smaller. Ask yourself "why?")

What percent of these students would you expect to have an IQ of 130 or more?
What percent of those with a G.P.A. of 3.7 do you expect to have an IQ of 130 or more?
Why are these two answers different?
What percent of those with a G.P.A. of 4.0 have an IQ of 120 or less?
Ralph is a student at this college. What do you expect his G.P.A. to be?
Ralph has an I.Q. of 130. What do you expect his G.P.A. to be?
Why are these two answers different? Which one is better (more likely to be accurate)?

Work out the answers before you peek below.

What percent of these students would you expect to have an IQ of 130 or more?

An I.Q. of 130 is (130 – 115)/15 = 1 standard deviation above the average for the students at this college. According to the normal table, the area between –1 and +1 standard deviations is 68.27% of the distribution. The right tail beyond +1 standard deviation (which contains those with an IQ of 130 or more) is then 100% – (50% + 68.27%/2) ≈ 15.87%. So we should expect about 16% of students to have an IQ of 130 or more.
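If you want to check a table lookup like this one, here is a minimal sketch in Python, assuming (as the text does) that IQ in the student body is normal with mean 115 and SD 15. It uses the standard library's statistics.NormalDist in place of the printed normal table; the variable names are my own.

```python
from statistics import NormalDist

# IQ distribution for the student body, assumed normal (mean 115, SD 15).
iq = NormalDist(mu=115, sigma=15)

# Share of students with an IQ of 130 or more: the right tail beyond z = 1.
pct_130_plus = 100 * (1 - iq.cdf(130))
print(f"{pct_130_plus:.2f}%")  # about 15.87%
```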

Notice that this question does not use the information about GPA.

What percent of those with a G.P.A. of 3.7 do you expect to have an IQ of 130 or more?

A G.P.A. of 3.7 is (3.7 – 3.2)/0.5 = 1 S.D. higher than average. These individuals should also have higher I.Q.’s, because the correlation between I.Q. and G.P.A. is positive.

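We know that the slope of the regression line is r*SD_y/SD_x (see p. 152 in your text). So for every increase of one SD_x in x, the predicted y goes up by r*SD_y; in standard units, a 1 SD increase in x goes with an r SD increase in the predicted y (it’s simple math here, but you should do it so that you understand where the formula comes from). Here x is GPA and y is IQ.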

So in this case, we would expect those with a GPA 1 S.D. higher than average to have an average IQ that is 0.4*1 = 0.4 SD higher than average.

That expected I.Q. would be the mean IQ for the whole student body plus 0.4 S.D. or 115 + 0.4 (15) = 121. So we would predict that the average I.Q. for people with a GPA of 3.7 is 121.
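The same steps (convert GPA to standard units, multiply by r, convert back to IQ points) can be sketched as a small Python function. The constant names and the helper predicted_iq are hypothetical names chosen for illustration, and the sketch assumes the regression line is the right summary of the average IQ in each vertical strip.

```python
# Parameters from the example.
MEAN_IQ, SD_IQ = 115, 15
MEAN_GPA, SD_GPA = 3.2, 0.5
R = 0.4

def predicted_iq(gpa):
    """Regression estimate of the average IQ among students with this GPA."""
    z_gpa = (gpa - MEAN_GPA) / SD_GPA   # GPA in standard units
    z_iq = R * z_gpa                    # r SDs of IQ per SD of GPA
    return MEAN_IQ + z_iq * SD_IQ       # back to IQ points

print(predicted_iq(3.7))  # about 121
```

The same function gives about 124.6 for a GPA of 4.0, which is used later on.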

But there is spread around this point. Not everyone with a 3.7 GPA has the same I.Q. The spread in this vertical strip is not the SD for all IQ’s but rather the RMS error, or the standard error of the estimate (the estimate is literally the point on the regression line that we are now using as the mean for the vertical strip). Try drawing things here to make clear in your head what’s going on.

We calculate the RMS error with the formula SD_y * sqrt(1 – r^2), where r is the correlation. Here it is 15 * sqrt(1 – 0.4^2) = 15 * sqrt(0.84) ≈ 13.75. The trick is to keep clear in your head what y is (the variable we are predicting, in this instance IQ) and what x is (the variable we use to reduce our uncertainty about the other one, in this instance GPA). Notice also that the spread here is smaller than the spread in the whole student body. Why? Notice too that I am quietly assuming the bivariate distribution is homoscedastic, that is, that the RMS error is a good estimate of the spread in every vertical strip across the range of the X variable. What should this bivariate normal distribution look like if I were to draw it? What would the distribution of the residuals look like? (see p. 176 in your text)
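As a quick arithmetic check of the RMS error formula, here is a short Python sketch; the names are my own.

```python
import math

SD_IQ = 15
R = 0.4

# RMS error of the regression estimate of IQ from GPA: SD_y * sqrt(1 - r^2).
rms_error = SD_IQ * math.sqrt(1 - R**2)
print(round(rms_error, 2))  # 13.75
```

The factor sqrt(1 – 0.4^2) is about 0.92, so the spread within a vertical strip is only about 8% smaller than the overall SD of 15: with a correlation of 0.4, knowing GPA removes only a little of the uncertainty about IQ.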

So an IQ of 130, when the IQs in this strip average 121 ± 13.75, is (130 – 121)/13.75 ≈ 0.65 SD above the average for those who have a GPA of 3.7. According to the normal table, the area between –0.65 and +0.65 SD is 48.43% of the distribution, so we would predict that among those with a GPA of 3.7, approximately 100% – (50% + 48.43%/2) ≈ 25.8% have an IQ of 130 or more.
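The whole vertical-strip calculation can be sketched in Python, assuming, as the text does, that the IQs in the strip are normal with the regression estimate as their mean and the RMS error as their SD (a homoscedastic bivariate normal). The variable names are illustrative only.

```python
from statistics import NormalDist
import math

MEAN_IQ, SD_IQ = 115, 15
MEAN_GPA, SD_GPA = 3.2, 0.5
R = 0.4

gpa = 3.7
strip_mean = MEAN_IQ + R * (gpa - MEAN_GPA) / SD_GPA * SD_IQ  # about 121
strip_sd = SD_IQ * math.sqrt(1 - R**2)                          # about 13.75

# Assumed normal distribution of IQ within the GPA = 3.7 strip.
strip = NormalDist(strip_mean, strip_sd)
pct = 100 * (1 - strip.cdf(130))
print(f"{pct:.1f}%")
```

This prints about 25.6% rather than 25.8% because it uses the unrounded z of about 0.655 instead of the table entry for 0.65.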

Why are these two answers different?

In the first instance we used the whole student body to estimate a percentage. In the second we used our knowledge of one variable to reduce our uncertainty about the other. Those with higher GPAs should, on average, have somewhat higher IQs, so we would expect a greater percentage of high-IQ people among those whose GPAs are above average.
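One way to see the difference between the two answers is to simulate a student body and actually cut out a vertical strip. This is a minimal sketch, assuming GPA and IQ follow a bivariate normal distribution with the stated means, SDs, and correlation; the sample size, the random seed, and the strip width of ±0.05 GPA points are arbitrary choices of mine, and the function names are hypothetical.

```python
import math
import random

MEAN_IQ, SD_IQ = 115, 15
MEAN_GPA, SD_GPA = 3.2, 0.5
R = 0.4

def simulate(n, seed=0):
    """Draw n (gpa, iq) pairs from a bivariate normal with correlation R."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rng.gauss(0, 1)
        # z_iq has SD 1 and correlation R with z1.
        z_iq = R * z1 + math.sqrt(1 - R**2) * z2
        pairs.append((MEAN_GPA + SD_GPA * z1, MEAN_IQ + SD_IQ * z_iq))
    return pairs

def pct_iq_130_plus(group):
    """Percent of (gpa, iq) pairs with an IQ of 130 or more."""
    return 100 * sum(iq >= 130 for _, iq in group) / len(group)

pairs = simulate(200_000)
everyone = pct_iq_130_plus(pairs)
strip = [p for p in pairs if abs(p[0] - 3.7) < 0.05]  # narrow strip around GPA 3.7
in_strip = pct_iq_130_plus(strip)
print(f"all students: {everyone:.1f}%   GPA near 3.7: {in_strip:.1f}%")
```

The overall percentage should come out near 16% and the strip percentage near 25 to 26%, matching the two worked answers up to simulation noise.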

Now, stop peeking. Go back and try the next few questions on your own. It’s the only way to grow neurons.

What percent of those with a G.P.A. of 4.0 have an IQ of 120 or less?

A G.P.A. of 4.0 is (4.0 – 3.2)/0.5 = 1.6 SDs above average. We would expect these students to have IQs that are 0.4*1.6 = 0.64 SDs above average, so the expected mean IQ for this group is 115 + 0.64*15 = 124.6. The spread around this estimate should be about 15 * sqrt(1 – 0.4^2) ≈ 13.75. An IQ of 120 is (120 – 124.6)/13.75 ≈ –0.33, that is, 0.33 SD below the expected mean of those with a GPA of 4.0. Looking in the normal table at z = 0.35 (the nearest entry in a table with steps of 0.05), the area between –0.35 and +0.35 is about 27.37%, so we would expect about 50% – 27.37%/2 ≈ 36.3% of people with a GPA of 4.0 to have an IQ of 120 or less. With z = 0.33 the central area is about 25.86%, which gives about 37%.
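Here is the same calculation for the GPA = 4.0 strip as a Python sketch, under the same assumptions as before (normal IQs within the strip, homoscedastic spread). Because it does not round z to a table entry, it lands at about 36.9%, between the two table-based figures.

```python
from statistics import NormalDist
import math

MEAN_IQ, SD_IQ = 115, 15
MEAN_GPA, SD_GPA = 3.2, 0.5
R = 0.4

gpa = 4.0
strip_mean = MEAN_IQ + R * (gpa - MEAN_GPA) / SD_GPA * SD_IQ  # about 124.6
strip_sd = SD_IQ * math.sqrt(1 - R**2)                          # about 13.75

# Left tail: share of the GPA = 4.0 strip with an IQ of 120 or less.
pct_120_or_less = 100 * NormalDist(strip_mean, strip_sd).cdf(120)
print(f"{pct_120_or_less:.1f}%")
```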

Ralph is a student at this college. What do you expect his G.P.A. to be?

In the absence of any information about Ralph, I expect his GPA to be 3.2 ± 0.5. I’ll be right about 68% of the time by stating this range.
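The "about 68% of the time" claim rests on treating GPA as roughly normal. A short Python sketch under that assumption (which is only an approximation here, since GPA cannot go above 4.0):

```python
from statistics import NormalDist

# GPA for the student body, assumed normal (mean 3.2, SD 0.5).
gpa = NormalDist(mu=3.2, sigma=0.5)

# Chance that a randomly chosen student falls within 3.2 +/- 0.5.
coverage = gpa.cdf(3.7) - gpa.cdf(2.7)
print(f"{100 * coverage:.2f}%")  # about 68.27%
```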

Ralph has an I.Q. of 130. What do you expect his G.P.A. to be?

Well, now, here is some important information about Ralph. We can use our knowledge of the relationship between GPA and IQ to come up with a better estimate. The regression line for predicting GPA from IQ is: estimated GPA = slope * IQ + intercept. In general, estimated GPA = b*IQ + a, where the slope is b = r*SD_gpa/SD_iq. Here b = (0.4)*(0.5)/15 ≈ 0.0133. The intercept is a = mean of GPA – b * mean of IQ, for reasons that are in the Overheads book and also on p. 192 of your text. So in this instance a = 3.2 – (0.0133*115) ≈ 1.67.

So for the case of Ralph, if his IQ is 130, we expect his GPA to be 0.0133*130 + 1.67 ≈ 3.40. (Rounding the slope to 0.013 before computing the intercept gives 3.39 instead.) As a check in standard units: an IQ of 130 is 1 SD above average, so the predicted GPA is 0.4 SD above average, 3.2 + 0.4*0.5 = 3.40. Now there is, as always, some uncertainty about this prediction, but that is not covered in this course.
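The regression of GPA on IQ can be sketched in Python as follows. Note that this is a different line from the one used earlier to predict IQ from GPA. The names slope, intercept, and predicted_gpa are illustrative, not from the text; keeping full precision avoids the rounding that turns 3.40 into 3.39.

```python
MEAN_IQ, SD_IQ = 115, 15
MEAN_GPA, SD_GPA = 3.2, 0.5
R = 0.4

# Regression of GPA on IQ: slope r * SD_gpa / SD_iq, through the point of averages.
slope = R * SD_GPA / SD_IQ              # about 0.0133 GPA points per IQ point
intercept = MEAN_GPA - slope * MEAN_IQ  # about 1.67

def predicted_gpa(iq):
    """Regression estimate of GPA for a student with this IQ."""
    return slope * iq + intercept

print(round(predicted_gpa(130), 2))  # 3.4
```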

Why are these two answers different? Which one is better (more likely to be accurate)?

The two answers (3.2 vs. about 3.4) differ because in the first instance we only knew that Ralph was a member of the student body, so we used the average of the group to predict his GPA. In the second instance we had two pieces of information about him: that he was a member of the student body and that he had a specific IQ. Using these two pieces of information (along with the means, SDs, and correlation of GPA and IQ) we could come up with a better prediction of Ralph’s GPA. The second one is more likely to be closer to the truth.
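To make "more likely to be closer to the truth" concrete, here is a simulation sketch that compares the two prediction rules on many simulated students. It assumes a bivariate normal distribution with the stated parameters; the seed and sample size are arbitrary choices of mine. Guessing the average GPA should give an RMS error near the SD of 0.5, while the regression line should give one near 0.5 * sqrt(1 – 0.4^2), about 0.46.

```python
import math
import random

MEAN_IQ, SD_IQ = 115, 15
MEAN_GPA, SD_GPA = 3.2, 0.5
R = 0.4

slope = R * SD_GPA / SD_IQ
intercept = MEAN_GPA - slope * MEAN_IQ

rng = random.Random(1)
n = 100_000
sq_err_mean = 0.0
sq_err_line = 0.0
for _ in range(n):
    z_iq = rng.gauss(0, 1)
    # z_gpa has SD 1 and correlation R with z_iq.
    z_gpa = R * z_iq + math.sqrt(1 - R**2) * rng.gauss(0, 1)
    iq, gpa = MEAN_IQ + SD_IQ * z_iq, MEAN_GPA + SD_GPA * z_gpa
    sq_err_mean += (gpa - MEAN_GPA) ** 2                  # rule 1: guess the average
    sq_err_line += (gpa - (slope * iq + intercept)) ** 2  # rule 2: use the regression line

rms_mean = math.sqrt(sq_err_mean / n)
rms_line = math.sqrt(sq_err_line / n)
print(round(rms_mean, 3), round(rms_line, 3))
```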