
DataDesk Hints

Computing a Regression
You might want to read the ActivStats lesson on this. It is fairly straightforward. Much of the information that falls out of the regression calculation will be explained in 110B.

Choose one variable as an X and one as a Y. Your first step should be to make a scatterplot and check that there appears to be a linear relationship between the two. (What to do if this is not the case is discussed in 110B.) You then have two choices. You can select "Regression of X vs. Y" from the menu that appears when you click the arrow button in the upper left corner of the scatterplot window, or you can choose Calc > Regression from the menu bar. Either one will open a window with the results of the regression. This quarter we will be concerned with only a few of these numbers. The two most important are labelled "Constant" and "X" (in place of "X", DataDesk shows the name of the x variable). These numbers are the intercept and slope, respectively, of the regression line. Near the top of the window is the R-squared value, the fraction of the variation in Y that the regression line accounts for. It is useful for judging how well the regression summarizes the linear relationship.
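If you would like to see where the "Constant", slope, and R-squared numbers come from, here is a minimal Python sketch of the usual least-squares formulas. It is not DataDesk's own code, the function name simple_regression is made up for this example, and the heights and weights are invented for illustration.

```python
# Illustrative data only: hypothetical heights (inches) and weights (pounds).
heights = [62.0, 64.0, 66.0, 68.0, 70.0, 72.0]
weights = [120.0, 130.0, 138.0, 152.0, 160.0, 171.0]


def simple_regression(x, y):
    """Return (constant, slope, r_squared) for the least-squares line
    y = constant + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    constant = mean_y - slope * mean_x
    # For one X variable, R-squared is the squared correlation of x and y.
    r_squared = sxy ** 2 / (sxx * syy)
    return constant, slope, r_squared


constant, slope, r_squared = simple_regression(heights, weights)
print(f"Constant (intercept): {constant:.2f}")
print(f"Slope:                {slope:.2f}")
print(f"R-squared:            {r_squared:.3f}")
```

The three printed numbers play the same roles as the "Constant", x-variable, and R-squared entries in the regression window.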

If you click on the arrow in the upper left corner of the regression window, you will have the option to make a scatterplot of the residuals vs. the predicted values. This is useful for checking some of the underlying assumptions of the regression model. For this quarter, the most important use of this is to make sure that the residuals show no "patterns" that might indicate that something non-linear is going on.
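To see what a "pattern" in the residuals can look like, here is a small Python sketch (again, not DataDesk, and the helper name fit_line is made up). The data are deliberately curved (y is x squared), so fitting a straight line leaves residuals that are positive at both ends and negative in the middle.

```python
def fit_line(x, y):
    """Return (constant, slope) of the least-squares line through the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up data with a curved (quadratic) relationship.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [xi ** 2 for xi in x]

constant, slope = fit_line(x, y)
predicted = [constant + slope * xi for xi in x]
# A residual is the observed value minus the predicted value.
residuals = [yi - pi for yi, pi in zip(y, predicted)]

for p, r in zip(predicted, residuals):
    print(f"predicted = {p:6.2f}   residual = {r:6.2f}")
```

If the relationship really were linear, the residuals would scatter above and below zero with no such systematic bend.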

Comparing two groups in regression

Often, we'll study the relationship between two continuous variables, say height and weight, but at the same time we're comparing two groups. For example, we might try to answer questions like: What is the relationship between height and weight? Is the relationship different for men than for women? You can try these directions out on the classdata.

DataDesk has several options for this. Before reading, you might want to watch ActivStats' excellent tutorial in Chapter 7-2 called "Case Study: Car Fuel Efficiencies". (In that study, four different groups are compared.)

Suppose we have observations on three variables. The first two, height and weight, are continuous, and the third, sex, is categorical. (Sex might consist of M's and F's, or 1's and 0's, or any other codes used to designate "male" or "female".)

If you want to know if knowing someone's height helps you predict their weight, you would make a scatterplot using height as the "x" variable and weight as the "y". Once you've done that using DataDesk, you can click on the "arrow" in the upper left corner of the scatterplot window. A menu will appear. One of the options is "add regression line." If you choose this, a regression line will be superimposed over the scatterplot.

This scatterplot shows both men and women together, and there is no way to tell which points refer to men and which to women. So double-click on the "sex" variable icon to get it to "open." Now, select Modify > Tools from the menu bar. A small "pad" of "tools" will appear. Select the "query" tool, which looks like a large question mark (?). If you then click the question mark over a point, it will display the value of the sex variable for that point. By clicking on several different points, you can build an idea of where the men are and where the women are on this plot.

But this is only a very crude method. Slightly more sophisticated is the following: Select Modify > Colors > Add > By Group from the menu bar. The scatterplot will now change colors, so that the men are one color, the women another. You can now tell, at a glance, the relationship of these two groups to each other. Now, click on the menu-arrow on the upper left corner of the scatterplot window. Select "Add color regression lines." Does it look like the relationship between height and weight is different for men than for women? Very different?
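The idea behind the separate colored regression lines is simply one regression per group. Here is a rough Python sketch of that idea, not DataDesk's implementation. The function names fit_line and fit_by_group and all of the data are made up for illustration.

```python
def fit_line(x, y):
    """Return (constant, slope) of the least-squares line through the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_by_group(groups, x, y):
    """Fit a separate regression line for each distinct group code."""
    by_group = {}
    for code, xi, yi in zip(groups, x, y):
        xs, ys = by_group.setdefault(code, ([], []))
        xs.append(xi)
        ys.append(yi)
    return {code: fit_line(xs, ys) for code, (xs, ys) in by_group.items()}


# Hypothetical class data: sex codes, heights (inches), weights (pounds).
sex = ["m", "m", "m", "f", "f", "f"]
height = [68.0, 70.0, 72.0, 62.0, 64.0, 66.0]
weight = [160.0, 170.0, 180.0, 115.0, 121.0, 127.0]

lines = fit_by_group(sex, height, weight)
for code, (constant, slope) in lines.items():
    print(f"{code}: weight = {constant:.1f} + {slope:.1f} * height")
```

Comparing the two slopes is one way to put a number on the question of whether the relationship looks different for the two groups.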

There is another method.

Select both Height and Weight as the "Y" variables. (Hold down the option key and click once on each variable's icon.) Select Sex as the "X" variable. (Hold down the shift key and click once on its icon.) Now, from the menu bar select Manip > Split into Variables by Group.

A new window will open up that contains an "m" icon and an "f" icon. The "m" icon, if you click twice, will open up to reveal that it contains two variables. m:height contains the men's heights. m:weight contains the men's weights. The "f" icon contains the women's heights and weights. Now, you can choose m:height as the "X" and m:weight as the "Y", and make one scatterplot. Then you can do another scatterplot for the women.
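For a picture of what the split produces, here is a short Python sketch that mimics the result. It is only an imitation of the idea, with a made-up function name (split_by_group) and invented data. The "m:height" style names follow the ones described above.

```python
def split_by_group(group, **variables):
    """Return a dict mapping 'code:name' to the values of each variable
    for the cases that have that group code."""
    split = {}
    for name, values in variables.items():
        for code, value in zip(group, values):
            split.setdefault(f"{code}:{name}", []).append(value)
    return split


# Hypothetical class data: one entry per person.
sex = ["m", "f", "f", "m", "f"]
height = [70, 64, 66, 72, 62]
weight = [165, 125, 135, 180, 118]

parts = split_by_group(sex, height=height, weight=weight)
for name in sorted(parts):
    print(name, parts[name])
```

Each new variable keeps the cases in their original order, so m:height and m:weight still line up person by person and can be plotted against each other.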