Fri, April 16, 2010.

1. Hand in HW2, problem 8.2.
2. Get handout.
3. HW3.
4. Poisson regression.
5. Estimation for Poisson regression.
6. Simulated examples.

3. HW3 is similar to problem 5 from chapter 3 of "Extending the Linear Model in R" by Faraway. Here is the problem.

The dvisits data in the faraway package come from the Australian Health Survey of 1977-78 and consist of 5190 single adults, where the young and old have been oversampled. Run install.packages("faraway") if needed, then library(faraway) and data(dvisits) to get the data in R.

(a) Build a Poisson regression model with doctorco as the response and sex, age, agesq, income, levyplus, freepoor, freerepa, illness, actdays, hscore, chcond1, and chcond2 as possible predictor variables. Considering the deviance of this model, does this model fit the data?
(b) Plot the residuals against the fitted values. Why are there lines of observations on the plot?
(c) Use backward elimination with a critical p-value of 5% to reduce the model as much as possible. Report your model.
(d) What sort of person would be predicted to visit the doctor the most under your selected model?
(e) For the last person in the dataset, compute the predicted probability that they would visit the doctor 0, 1, or 2 times.
(f) Fit a comparable Gaussian linear model using ordinary linear regression and graphically compare the fits. Describe how they differ.

4. Poisson regression.

P(Y = y) = exp(-mu) mu^y / y!, for y = 0, 1, 2, ....
The mean is mu and the variance is mu.

The Poisson regression model is that the Y_i are independent Poisson with mean mu_i, where mu_i = exp(x_i^T beta) and
x_i^T beta = beta_0 + beta_1 x_1 + beta_2 x_2 + ... + beta_p x_p.

Note that this means that if x_i^T beta is large, then mu_i is large, so the variance of Y_i is large; and if x_i^T beta is small, then the variance of Y_i is small. In linear regression, the variance of Y_i is constant. In logistic regression, the Y_i are independent
binomials with mean mp, where p = exp(x_i^T beta) / (1 + exp(x_i^T beta)), so Var(Y_i) = mp(1-p), which again is not constant; it is largest when p is near 0.5. But in Poisson regression, the variance of Y_i really changes a lot with x_i^T beta.

Note that in Poisson and logistic regression, when the mean is small, often most of the Y_i will be 0 or 1. In those cases, standard residual plots may not be very useful, because you will see one line of points corresponding to Y = 0 and another line corresponding to Y = 1. See ch8 p278, for instance, or ch8 p281.

5. Estimation and deviance.

The log likelihood is
L(beta) = sum_i y_i x_i^T beta - sum_i exp(x_i^T beta) - sum_i log(y_i!).

Deviance = 2 sum_i [ y_i log(y_i / muhat_i) - (y_i - muhat_i) ] = G-statistic.

The deviance, D, is interpreted just as in logistic regression: it compares the given model to a saturated model, and if the model fits, D should be approximately chi^2 with n - (p+1) degrees of freedom. The difference in Ds between two nested models should be approximately chi^2 with degrees of freedom equal to the difference in their numbers of parameters, p1 - p2.

Pearson's Chi^2 = sum_i (y_i - muhat_i)^2 / muhat_i, and this too should be approximately chi^2 with n - (p+1) degrees of freedom.

The null deviance just shows how badly a null model fits, in which the Y_i are independent Poisson with constant mean muhat = exp(betahat_0). The residual deviance shows how well the fitted model does, in which the Y_i are independent Poisson with mean mu_i = exp(beta_0 + beta_1 x_1 + ... + beta_p x_p).

For both Poisson and logistic regression, you can do forward or backward elimination to choose your explanatory variables, as in linear regression. You might use the differences in deviances as your criterion for including or excluding variables.

6. Simulated examples.

n = 100
x1 = runif(n)
x2 = runif(n)
y = rpois(n, lambda = exp(1 + 0.6*x1 + 0.8*x2))
fit = glm(y ~ x1 + x2, family = poisson)
summary(fit)
plot(fit$fitted, fit$resid)
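As a sketch of how the pieces above fit together, here is the simulated example extended to check goodness of fit with the residual deviance and to compute predicted count probabilities, as asked in HW3 part (e). This is illustrative, not part of the homework solution; the names fit, pval, mu.hat, and p012 are my own, and set.seed is added only for reproducibility.

```r
# Simulate data as in section 6.
set.seed(1)
n <- 100
x1 <- runif(n)
x2 <- runif(n)
y <- rpois(n, lambda = exp(1 + 0.6 * x1 + 0.8 * x2))

# Fit the Poisson regression model.
fit <- glm(y ~ x1 + x2, family = poisson)

# Goodness of fit: if the model is adequate, the residual deviance is
# approximately chi^2 with n - (p+1) degrees of freedom. A small p-value
# would suggest lack of fit.
pval <- pchisq(deviance(fit), df = fit$df.residual, lower.tail = FALSE)

# Predicted probability of 0, 1, or 2 counts for the last observation:
# plug the fitted mean into the Poisson pmf.
mu.hat <- predict(fit, newdata = data.frame(x1 = x1[n], x2 = x2[n]),
                  type = "response")
p012 <- dpois(0:2, lambda = mu.hat)
```

The same dpois trick applies to the dvisits model: predict with type = "response" to get muhat for the person of interest, then evaluate the Poisson pmf at 0, 1, and 2.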