The Southern California Chapter of the American Statistical Association presents

The 20th Annual Workshop in Applied Statistics



Regression Graphics

Prof. Dennis Cook
School of Statistics
University of Minnesota


In simple regression, a 2D plot of the response versus the predictor displays all of the sample information, and it can be quite helpful for gaining insight into the data and for guiding the choice of a first model. Such direct graphical displays of all the data are generally not possible with many predictors. Informative displays are still possible, however, when we can find low-dimensional views (the only views that are possible in practice) that provide "sufficient" information about the regression. Here the regression is defined broadly as the study of the conditional distribution of the response given the predictors. In regression graphics we seek to facilitate visualization of the data by reducing the dimension of the predictor vector without loss of information on the regression, and without requiring a pre-specified parametric model. We call this sufficient dimension reduction, borrowing terminology from classical statistics. Sufficient dimension reduction leads naturally to the idea of a sufficient summary plot, which contains all of the information on the regression that is available from the sample.
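As one concrete illustration of sufficient dimension reduction, here is a minimal sketch of sliced inverse regression (SIR), an established estimator of a dimension-reduction subspace. SIR is not named in this announcement, and it is not presented as the workshop's own method. The sketch assumes NumPy, continuous predictors with a nonsingular covariance matrix, and the linearity condition on the predictors that SIR relies on (for example, roughly elliptically contoured X). The function name `sir_directions` and its parameters are hypothetical. The target is a p x d matrix B such that the response is independent of X given B^T X. A plot of y against X @ B (with d = 1 or 2) is then a candidate sufficient summary plot.

```python
import numpy as np


def sir_directions(X, y, n_slices=10, d=1):
    """Hypothetical helper: estimate d directions by sliced inverse regression.

    Assumes the linearity condition on X holds (e.g. roughly elliptical X);
    without it, SIR directions need not lie in the dimension-reduction subspace.
    Returns a (p, d) matrix whose columns have unit length.
    """
    n, p = X.shape
    # Standardize the predictors: Z = (X - mean) @ Sigma^{-1/2}
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(sigma)
    sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ sigma_inv_sqrt

    # Slice the observations on the ordered response
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)

    # Weighted covariance of the within-slice means of Z
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)

    # Leading eigenvectors of M, mapped back to the original X scale
    w, V = np.linalg.eigh(M)
    top = V[:, np.argsort(w)[::-1][:d]]
    B = sigma_inv_sqrt @ top
    return B / np.linalg.norm(B, axis=0)


if __name__ == "__main__":
    # Simulated example: y depends on five predictors only through one direction
    rng = np.random.default_rng(0)
    X = rng.standard_normal((2000, 5))
    beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
    y = (X @ beta) ** 3 + 0.5 * rng.standard_normal(2000)
    B = sir_directions(X, y, n_slices=10, d=1)
    # Absolute cosine between estimated and true direction; should be near 1
    print(abs(B[:, 0] @ beta))
```

A 2D plot of y against X @ B[:, 0] would serve as the estimated summary plot in this example. SIR can miss directions along which the response is symmetric (for example y = x1**2), which is one reason other methods for finding summary plots exist.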

Sufficient summary plots can be valuable for guiding the choice of a first model, for diagnosing problems in postulated models, and for gaining fresh insights about the regression, just as 2D plots are in simple regression. Seemingly complicated regressions can often be summarized adequately in a relatively simple summary plot. Sufficient summary plots allow visual solutions to many long-standing problems in regression. For example, they can be used to identify outliers and regression mixtures without the need to pre-specify a parametric model.

We will start with a brief history of regression graphics and a discussion of the various roles for graphics in statistics. We will next discuss the population foundations for sufficient summary plots, using examples to motivate the approach and illustrate its likely advantages in practice. We will then turn to existing methodology for finding summary plots in practice, and finally to current research in the area.

All of the new and existing methods to be discussed have been implemented in Arc (Cook and Weisberg 1999; www.stat.umn.edu/arc/).  Illustrations, including regressions with a binary response, will be incorporated throughout the program.


Friday, May 11, 2001
8:30am to 4pm
California State University, Long Beach

Schedule, Map, Registration form, and additional information available here.