1. Finishing 10.3 (from last time): USING THE Z SCORE AND REGRESSION INFORMATION
Recall that we used the correlation and a Z score (that is, the number of standard deviations the x-variable is away from its mean) to predict the y-variable. Now we can do something similar to material from Chapter 5.
Suppose a city is in the top 10% of poverty rates (so it is one of the poorest cities; do you recall what "percentile" means?). What is its predicted murder rate?
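If you are at the cutoff for the top 10% of ANYTHING (the 90th percentile), you have a Z-score of about +1.3 (1.28 from the normal table), assuming the variable is roughly normally distributed; that is, you are about 1.3 standard deviations above average.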
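Instead of a printed normal table, the percentile-to-Z lookup can be sketched with Python's standard-library `statistics.NormalDist`, again assuming the variable is roughly normal. The helper name `z_for_percentile` is hypothetical, made up for this illustration.

```python
from statistics import NormalDist

def z_for_percentile(p):
    """Z-score with a proportion p of the standard normal curve below it."""
    return NormalDist().inv_cdf(p)

# The cutoff for the top 10% is the 90th percentile.
z_top10 = z_for_percentile(0.90)
print(round(z_top10, 2))  # 1.28
```

Rounded to one decimal place this is the 1.3 used in the notes.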
If a city is at the cutoff for the top 10% of poverty, its poverty rate can be calculated using the information from last time:
14.016 + (1.3 * 4.287) = 19.589
where 14.016 is the average poverty rate (in percent) for all cities and 4.287 is its standard deviation. So its poverty rate is 19.589%. The corresponding predicted murder rate is found as follows:
1.3 * .63 = .819
This is the Z score multiplied by the correlation; it gives the predicted number of standard deviations the y-variable is above its average.
Then multiply the result by the actual standard deviation of the y-variable:
.819 * 3.984 = 3.263
Finally, add this value (the predicted distance above average, now in the units of the y-variable) to the average of the y-variable:
7.332 + 3.263 = 10.595
So if your city is at the cutoff for the top 10% of poverty, its predicted murder rate is 10.595.
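The Z-score steps above can be collected into a short Python sketch. The function name `predict_from_z` is hypothetical; the numbers are the city summary statistics from the notes, with the Z-score rounded to 1.3 as above.

```python
def predict_from_z(z, r, mean_y, sd_y):
    """Predict y for an x that lies z SDs above its mean (Z-score method)."""
    z_y = r * z                  # predicted number of SDs above the average of y
    return mean_y + z_y * sd_y   # convert back to the units of y

# City data from the notes: r = .63, murder rate average 7.332, SD 3.984.
z = 1.3                          # cutoff for the top 10% of poverty (rounded)
poverty = 14.016 + z * 4.287     # that city's poverty rate, in percent
murder = predict_from_z(z, r=0.63, mean_y=7.332, sd_y=3.984)
print(round(poverty, 3), round(murder, 3))  # 19.589 10.595
```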
2. USING THE SLOPE AND THE INTERCEPT OF A LINE TO PREDICT
Previously, in Section 10.3, we used the correlation, means, and standard deviations for prediction. In Section 12.1, we will now use a line to predict the value of the y-variable from an x-variable. This method is somewhat easier than the one described in 10.3:
A. Idea: given a set of data, we might try to find the line that best summarizes the relationship between X and Y. This line tells us how much Y changes with a change in X. Note that regression requires us to decide which variable is explanatory (X) and which is the response (Y).
B. Math fact: straight lines can be represented in the form
y = slope*x + intercept (see top of page 205)
The slope tells how much y tends to change when x increases by one unit; the intercept tells what we expect to get for y when x = 0.
The least squares method draws the line through a scatter of points that is as close as possible to the points in the VERTICAL direction: it makes the sum of the squared vertical distances from the points to the line as small as possible. This line is called the "least squares line," and the method is called "ordinary least squares."
C. Formula: the coefficients for the least squares line (using X to predict Y) are
slope = r * (SD of y) / (SD of x)
intercept = average of y's - (slope) * (average of x's)
* Note: you must solve for the slope first, because the intercept formula uses it.
3. Example
Using our data on cities again: the average murder rate was 7.332 and its standard deviation was 3.984; the average poverty rate was 14.016% and its standard deviation was 4.287%. The correlation between poverty rate and murder rate is .63.
To find the regression equation for predicting murder rate from poverty rate, first calculate the slope:
slope = (0.63 * 3.984) / 4.287 = .5855
Then calculate the intercept:
intercept = 7.332 - (.5855)*14.016 = -0.874
And put it all together
predicted murder rate = (.5855) x poverty rate + (-0.874)
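The same arithmetic in Python, as a sketch using the summary statistics from the example. As an extra check, the last line plugs in the poverty rate of 19.589% from Section 1; the line and the Z-score method should give the same prediction.

```python
# Summary statistics for the cities, as given in the notes.
r = 0.63
mean_poverty, sd_poverty = 14.016, 4.287   # poverty rate, in percent
mean_murder, sd_murder = 7.332, 3.984      # murder rate

slope = r * sd_murder / sd_poverty               # slope first
intercept = mean_murder - slope * mean_poverty   # then the intercept
print(f"predicted murder rate = {slope:.4f} x poverty rate + ({intercept:.3f})")
# predicted murder rate = 0.5855 x poverty rate + (-0.874)

# Cross-check with the Z-score method: a poverty rate of 19.589% gave 10.595.
print(round(slope * 19.589 + intercept, 3))  # 10.595
```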
Predict the murder rate of a city with a poverty rate of 12%.
predicted murder rate = .5855 x 12 + (-0.874) = 6.15
Predict the murder rate of a city with 0 poverty rate.
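predicted murder rate = .5855 x 0 + (-0.874) = -0.874 (this is just the intercept; a negative murder rate is impossible, because a 0% poverty rate lies far outside the range of the data)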
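A minimal sketch of plugging poverty rates into the equation. The name `predict_murder_rate` is hypothetical, and the default coefficients are the rounded values from the example.

```python
def predict_murder_rate(poverty_rate, slope=0.5855, intercept=-0.874):
    """Predicted murder rate from the least squares line in the example."""
    return slope * poverty_rate + intercept

print(round(predict_murder_rate(12), 2))  # 6.15
print(round(predict_murder_rate(0), 3))   # -0.874, i.e. the intercept
```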
Beware of extrapolation: predicting outside the range of the data.
Predict the murder rate of a city with a poverty rate of 50%; is this reasonable? (.5855 x 50 + (-0.874) = 28.40, but no, this is not reasonable: the sample of cities had poverty rates only from 8% to 26.4%.)
(Hint: check the scatter diagram or the data to make sure you stay within the range.)
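One way to build the range check into a prediction, as a sketch: the bounds 8% and 26.4% are the sample range quoted above, and `predict_in_range` is a hypothetical helper that refuses to extrapolate rather than returning a number.

```python
def predict_in_range(poverty_rate, lo=8.0, hi=26.4, slope=0.5855, intercept=-0.874):
    """Predict murder rate, refusing poverty rates outside the observed range."""
    if not lo <= poverty_rate <= hi:
        raise ValueError(f"poverty rate {poverty_rate}% is outside the data range "
                         f"{lo}%-{hi}%; this would be extrapolation")
    return slope * poverty_rate + intercept

print(round(predict_in_range(12), 2))  # 6.15
try:
    predict_in_range(50)               # 50% is outside 8%-26.4%
except ValueError as err:
    print(err)
```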
Beware of using this line to predict the x-variable from the y-variable. You can't do this; you would need a new regression line for predicting the x-variable from the y-variable, with the roles of x and y swapped in the formulas.
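To see why, here is a sketch comparing two ways of "predicting" a poverty rate from a murder rate of 10 (a value chosen only for illustration): solving the murder-from-poverty line backwards, and using the separate poverty-from-murder line, whose slope is r * (SD of x) / (SD of y). The variable names are hypothetical.

```python
r = 0.63
mean_x, sd_x = 14.016, 4.287   # poverty rate (%)
mean_y, sd_y = 7.332, 3.984    # murder rate

# Line for predicting y (murder) from x (poverty), as in the example.
slope_yx = r * sd_y / sd_x
intercept_yx = mean_y - slope_yx * mean_x

# Separate line for predicting x (poverty) from y (murder): roles swapped.
slope_xy = r * sd_x / sd_y
intercept_xy = mean_x - slope_xy * mean_y

y = 10.0                                       # illustrative murder rate
solved_back = (y - intercept_yx) / slope_yx    # inverting the y-on-x line (wrong)
regressed = slope_xy * y + intercept_xy        # the x-on-y regression line
print(round(solved_back, 2), round(regressed, 2))
```

The two answers differ (roughly 18.57% versus 15.82%). The x-on-y line gives the value closer to the average poverty rate of 14.016% because, like every regression line, it pulls its prediction toward the average.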