1.      Finishing 10.3 (from last time) USING THE Z SCORE AND REGRESSION INFORMATION

 

Recall that we used the correlation and a Z score (that is, the number of standard deviations the x-variable is away from its mean) to predict the y-variable.  Now we can do something similar to material from Chapter 5.

 

Suppose a city is in the highest 10% of poverty rates, so it is one of the poorest (do you recall what "percentile" means?).  What is its predicted murder rate?

 

If you are at the cutoff for the top 10% of ANYTHING that follows the normal curve, you have a Z score of about +1.3; that is, you are 1.3 standard deviations above average.
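The 1.3 comes from the normal curve. As a quick check, and assuming the variable is roughly normally distributed (an assumption, not something verified for the city data), Python's standard library can convert a percentile into a Z score:

```python
from statistics import NormalDist

# Z score that cuts off the top 10% of a standard normal curve,
# i.e. the 90th percentile. Assumes the variable is roughly normal.
z_top_10 = NormalDist().inv_cdf(0.90)
print(round(z_top_10, 2))  # about 1.28, which these notes round to 1.3
```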

 

If a city is in the top 10% of poverty, its poverty rate can be calculated using the information from last time:

 

14.016 + (1.3 * 4.287) = 19.589

 

where 14.016 is the average percentage poverty rate for all cities and 4.287 is the standard deviation.  So its poverty rate is 19.589%.  To find the corresponding murder rate, first compute:

 

1.3   * .63 = .819

 

This is the Z score multiplied by the correlation, which gives the number of standard deviations the y-variable is predicted to be away from its average.

 

Then multiply the result by the actual standard deviation of the y-variable:

 

.819 * 3.984 = 3.263

 

Finally, add this value (the predicted distance above average, measured in the y-variable's own units) to the average for the y-variable:

 

7.332 + 3.263 = 10.595

 

So if your city is in the top 10% of poverty, the predicted murder rate for your city is 10.595.
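The steps above can be sketched in a few lines of Python. This is only an illustration: the function name `predict_from_z` is made up for this sketch, and the numbers are the summary statistics quoted in these notes.

```python
# Summary statistics quoted in the notes for the city data.
mean_poverty, sd_poverty = 14.016, 4.287   # x-variable (percent)
mean_murder, sd_murder = 7.332, 3.984      # y-variable
r = 0.63                                   # correlation between x and y


def predict_from_z(z_x, r, mean_y, sd_y):
    """Predict y for a case that is z_x standard deviations from the mean of x."""
    z_y = r * z_x               # predicted Z score for y
    return mean_y + z_y * sd_y  # convert back to y's original units


z = 1.3  # top 10% of poverty, as in the notes
poverty_rate = mean_poverty + z * sd_poverty
predicted_murder = predict_from_z(z, r, mean_murder, sd_murder)

print(round(poverty_rate, 3))      # 19.589
print(round(predicted_murder, 3))  # 10.595
```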

 

2.      USING THE SLOPE AND THE INTERCEPT OF A LINE TO PREDICT

 

Previously in Chapter 10.3, we used the correlation, means, and standard deviations for prediction.  In 12.1, we will now use a line to predict the value of the y-variable from an x-variable.  This method is somewhat easier than the ones described in 10.3:

A. Idea: given a set of data, we might try to find the line that best summarizes the relationship between X and Y. This line will tell us how much Y changes with a change in X. Note that regression requires us to have explanatory and response variables.

B. Math fact: straight lines can be represented in the form

y = slope*x + intercept  (see top of page 205)

The slope tells how much y tends to be different when x changes by one unit; the intercept tells what we expect to get for y when x=0.

The method of drawing a line through a scattering of points tries to make the line as close as possible to the points in the VERTICAL direction; specifically, it makes the sum of the squared vertical distances as small as possible. This line is called the "least squares line," and the method is called "ordinary least squares."
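To see the "as close as possible in the vertical direction" idea in action, here is a small sketch. The five data points are hypothetical (made up for illustration, not from the city data), and the slope and intercept are computed with the formulas given in part C below. Nudging the line away from the least squares line always increases the sum of squared vertical distances.

```python
from math import sqrt

# Hypothetical (x, y) points, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 6.0]


def sum_squared_vertical(slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n * sd_x * sd_y)

# Least squares line from the formulas in part C.
ls_slope = r * sd_y / sd_x
ls_intercept = mean_y - ls_slope * mean_x
best = sum_squared_vertical(ls_slope, ls_intercept)

# Nudging the slope or intercept in either direction makes the fit worse.
for d_slope, d_int in [(0.1, 0), (-0.1, 0), (0, 0.5), (0, -0.5)]:
    worse = sum_squared_vertical(ls_slope + d_slope, ls_intercept + d_int)
    print(round(worse, 3), ">", round(best, 3))
```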

C.  Formula: the coefficients for the least squares line (using X to predict Y) are

 
               slope = r * (SD of y) / (SD of x)

intercept = average of y's - (slope) * (average of x's)

* Note: you must solve for the slope first to calculate intercept using these formulas.
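As a sketch of these two formulas in code (the function name `least_squares_line` is hypothetical, chosen for this illustration), note how the intercept cannot be computed until the slope is known. The inputs are the city summary statistics used throughout these notes.

```python
def least_squares_line(r, mean_x, sd_x, mean_y, sd_y):
    """Return (slope, intercept) of the least squares line for predicting y from x."""
    slope = r * sd_y / sd_x               # slope must be found first
    intercept = mean_y - slope * mean_x   # the intercept depends on the slope
    return slope, intercept


# City data from these notes: x = poverty rate, y = murder rate.
slope, intercept = least_squares_line(0.63, 14.016, 4.287, 7.332, 3.984)
print(round(slope, 4), round(intercept, 3))  # 0.5855 -0.874
```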

 

3.         Example

Using our data on cities again: the average murder rate was 7.332 and its standard deviation was 3.984; the average poverty rate was 14.016% and its standard deviation was 4.287%.  The correlation between the two is .63.

 

To find the regression equation for predicting murder rate from poverty rate:

 

First calculate the slope

 

 

slope = (0.63 * 3.984) / 4.287 = .5855

 

then the intercept

 

intercept = 7.332 - (.5855)*14.016 = -0.874

 

And put it all together

 

predicted murder rate = (.5855) * poverty rate + (-0.874)

 

Predict the murder rate of a city with a poverty rate of 12%.

predicted murder rate = .5855 * 12 + (-0.874) = 6.15

 

Predict the murder rate of a city with 0 poverty rate.

predicted murder rate = .5855 * 0 + (-0.874) = -0.874
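The two predictions above can be reproduced with a short sketch. The function name `predict_murder_rate` is made up for this illustration, and the slope and intercept are the rounded values from these notes.

```python
# Slope and intercept as computed in the notes (rounded).
SLOPE = 0.5855
INTERCEPT = -0.874


def predict_murder_rate(poverty_rate):
    """Predicted murder rate for a city with the given poverty rate (percent)."""
    return SLOPE * poverty_rate + INTERCEPT


print(round(predict_murder_rate(12), 2))  # 6.15
print(round(predict_murder_rate(0), 3))   # -0.874
```

The rounded coefficients are used on purpose so the output matches the hand arithmetic; carrying more decimal places would change the later digits slightly.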

 

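Beware of extrapolation: predicting outside the range of the data.  (The impossible negative murder rate predicted for a city with 0 poverty is one example.)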

Predict the murder rate of a city with a poverty rate of 50%.  Is this reasonable? (The line gives 28.40, but no, this is not reasonable: the sample of cities had poverty rates ranging from 8% to 26.4%.)

(Hint: Check the scatter diagram or the data to make sure you stay within the range.)
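One way to build that check into a calculation is sketched below. The function name `predict_with_range_check` is hypothetical; the range limits (8% to 26.4%) and the rounded slope and intercept come from these notes.

```python
# Observed poverty range in the sample, as stated in the notes (percent).
MIN_POVERTY, MAX_POVERTY = 8.0, 26.4
SLOPE, INTERCEPT = 0.5855, -0.874


def predict_with_range_check(poverty_rate):
    """Return (prediction, within_range); within_range is False when extrapolating."""
    within_range = MIN_POVERTY <= poverty_rate <= MAX_POVERTY
    return SLOPE * poverty_rate + INTERCEPT, within_range


for rate in (12, 50):
    prediction, ok = predict_with_range_check(rate)
    note = "within the data" if ok else "extrapolation, do not trust"
    print(f"poverty {rate}%: predicted murder rate {prediction:.2f} ({note})")
```

The function still returns a number when extrapolating; it only flags it, so the decision about whether to use the prediction stays with the reader.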

 

Beware of trying to predict the x variable.  You can't do this with the same line; you would need a new regression line to predict the x variable from information about the y variable.
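To illustrate why, the sketch below builds the second regression line by swapping the roles of x and y in the slope and intercept formulas, then compares it with the (incorrect) shortcut of solving the first line for x. The variable names are chosen for this illustration; the numbers are the city summary statistics from these notes.

```python
r = 0.63
mean_poverty, sd_poverty = 14.016, 4.287
mean_murder, sd_murder = 7.332, 3.984

# Line for predicting murder rate (y) from poverty rate (x).
slope_y_on_x = r * sd_murder / sd_poverty
intercept_y_on_x = mean_murder - slope_y_on_x * mean_poverty

# A separate line for predicting poverty rate (x) from murder rate (y):
# the roles of the two variables are swapped in the formulas.
slope_x_on_y = r * sd_poverty / sd_murder
intercept_x_on_y = mean_poverty - slope_x_on_y * mean_murder

murder_rate = 10.595  # the predicted murder rate found in section 1
correct = slope_x_on_y * murder_rate + intercept_x_on_y
wrong = (murder_rate - intercept_y_on_x) / slope_y_on_x  # solving the first line for x

print(round(correct, 2), round(wrong, 2))  # about 16.23 and 19.59
```

The two answers differ because the correlation is less than 1: each line pulls its prediction toward the average of the variable being predicted.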