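Wednesday, April 14, 2010.

Homework due Friday is problem 2 in Chapter 8.

1. Logistic regression in R.
2. Overfitting in logistic regression.
3. Poisson regression.
4. Estimation.

1. Logistic regression in R.

The book's website is http://www.stat.tamu.edu/~sheather/book

The code below simulates 4000 binomial counts, each out of m = 1000 trials, with success probability ilogit(1 + 4 x1 + 8 x2), then fits a binomial GLM. The estimates in summary(fit) should be close to the true coefficients 1, 4 and 8. The inverse logit ilogit() comes from the faraway package.

    library(faraway)   # provides ilogit()
    x1 = runif(4000)
    x2 = rnorm(4000)
    # y[i] = number of successes out of 1000 trials
    y = rbinom(4000, size = rep(1000, 4000), prob = ilogit(1 + 4*x1 + 8*x2))
    # binomial response given as cbind(successes, failures)
    fit = glm(cbind(y, rep(1000, 4000) - y) ~ x1 + x2, family = binomial)
    summary(fit)
    # compare counts, true probabilities and observed proportions
    y[1:5]
    ilogit(1 + 4*x1 + 8*x2)[1:5]
    y[1:5]/1000

2. Overfitting in logistic regression.

    library(faraway)
    data(hormone)
    # plot estrogen against androgen, labelling each point by its orientation
    plot(estrogen ~ androgen, data = hormone, pch = as.character(orientation))
    # line on which the fitted linear predictor of mod1 (below) is zero
    abline(-84.5/90.2, 100.9/90.2)
    mod1 = glm(orientation ~ estrogen + androgen, data = hormone, family = binomial)
    summary(mod1)

It is easy to overfit when all the $m_i = 1$, that is, when each response is a single 0/1 outcome. Here the two orientation groups can be separated by a straight line, so the likelihood keeps increasing as the coefficients grow, and the estimates and standard errors in summary(mod1) are very large and not meaningful.

3. Poisson regression.

$P(Y = y) = e^{-\mu} \mu^y / y!$, for $y = 0, 1, 2, \ldots$ The mean is $\mu$ and the variance is also $\mu$.

It comes up naturally:
a) as an approximation to the binomial for small p and large n, with $\mu = np$;
b) as the count of events from a Poisson process, in which the numbers of events in disjoint intervals are independent;
c) when the times between events are independent and exponential (another characterization of a Poisson process).

In R, use glm(..., family = poisson).

The model is that the $Y_i$ are independent Poisson with means $\mu_i$, where $\eta = X\beta = g(\mu)$. Usually the link g is chosen to be the log function: $\eta = g(\mu) = \log \mu$. Then $\mu = g^{-1}(\eta)$, so $g^{-1}$ is the exponential function, and the model can be written as: the $Y_i$ are independent Poisson with mean $\mu_i = \exp(\eta_i) = \exp(x_i^T \beta)$.

4. Estimation.

The log likelihood is

$L(\beta) = \sum_i y_i x_i^T \beta - \sum_i \exp(x_i^T \beta) - \sum_i \log(y_i!)$.

The deviance is

$D = 2 \sum_i \left[ y_i \log(y_i / \hat\mu_i) - (y_i - \hat\mu_i) \right]$,

where $y_i \log(y_i / \hat\mu_i)$ is taken to be 0 when $y_i = 0$.

D is twice the difference in log likelihood between the saturated model and the given model, so if the model is correct and the means are not too small, D should be approximately $\chi^2$ with $n - p$ degrees of freedom, where p is the number of coefficients in $\beta$. The difference in deviances between two nested models with $p_1 > p_2$ coefficients should be approximately $\chi^2$ with $p_1 - p_2$ degrees of freedom when the smaller model is correct.

The Pearson statistic $X^2 = \sum_i (y_i - \hat\mu_i)^2 / \hat\mu_i$ should also be approximately $\chi^2$ with $n - p$ degrees of freedom.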
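As a check on the estimation formulas, here is a minimal R sketch on simulated data; the sample size, the coefficients 0.5 and 1.2 and the seed are arbitrary assumptions, not from the notes. It fits a Poisson log-linear model with glm and recomputes the log likelihood, the deviance and the Pearson statistic by hand, for comparison with what R reports.

```r
set.seed(1)                               # arbitrary seed (assumption)
n  <- 200
x1 <- runif(n)
y  <- rpois(n, lambda = exp(0.5 + 1.2 * x1))   # log link: mu = exp(eta)

fit <- glm(y ~ x1, family = poisson)
mu  <- fitted(fit)                        # mu-hat_i = exp(x_i^T beta-hat)
eta <- predict(fit, type = "link")        # eta-hat_i = x_i^T beta-hat

# Log likelihood: sum(y * eta) - sum(exp(eta)) - sum(log(y!))
logL <- sum(y * eta) - sum(exp(eta)) - sum(lfactorial(y))

# Deviance: 2 * sum(y log(y / mu) - (y - mu)), with y log(y / mu) = 0 when y = 0
ylogy <- ifelse(y == 0, 0, y * log(y / mu))
D     <- 2 * sum(ylogy - (y - mu))

# Pearson statistic: sum((y - mu)^2 / mu)
X2 <- sum((y - mu)^2 / mu)

# Hand computations next to R's own values
c(logL = logL, logLik = as.numeric(logLik(fit)))
c(D = D, deviance = deviance(fit))
c(X2 = X2, pearson = sum(residuals(fit, type = "pearson")^2))
df.residual(fit)                          # n - p degrees of freedom
```

Both D and $X^2$ would be compared with $\chi^2$ on df.residual(fit) degrees of freedom; with means this small the approximation is rough.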
The null deviance shows how well (usually how badly) the null model fits: the model in which the $Y_i$ are independent Poisson with a common mean $\hat\mu = \exp(\hat\beta_0)$. The residual deviance shows how well, or really how poorly, the model fits in which the $Y_i$ are independent Poisson with means $\mu_i = \exp(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1})$.
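The two deviances printed by summary() can be reproduced with a short R sketch, again on arbitrary simulated data (the coefficients, n and the seed are assumptions). In the intercept-only model the fitted common mean $\exp(\hat\beta_0)$ equals the sample mean of y, its deviance is the null deviance, and the drop in deviance to the full model is compared with $\chi^2$ on $p_1 - p_2 = 3 - 1 = 2$ degrees of freedom.

```r
set.seed(2)                               # arbitrary seed (assumption)
n  <- 200
x1 <- runif(n)
x2 <- rnorm(n)
y  <- rpois(n, lambda = exp(1 + 0.8 * x1 - 0.3 * x2))

full_fit <- glm(y ~ x1 + x2, family = poisson)   # p = 3 coefficients
null_fit <- glm(y ~ 1, family = poisson)         # intercept only

# Null model: every Y_i has the same fitted mean exp(beta0-hat) = mean(y)
exp(coef(null_fit))
mean(y)

# Null deviance of the full fit = residual deviance of the intercept-only fit
c(full_fit$null.deviance, deviance(null_fit))

# Drop in deviance between the nested models, referred to chi^2 on 2 df
dev_drop <- deviance(null_fit) - deviance(full_fit)
p_value  <- pchisq(dev_drop, df = 2, lower.tail = FALSE)
p_value
anova(null_fit, full_fit, test = "Chisq")  # same test via anova()
```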