Wednesday, April 14, 2010.
Homework due Friday is problem 2 in Chapter 8.

1. Logistic regression in R.2. Overfitting in logistic regression.3. Poisson regression.4. Estimation.1. Logistic regression in R.The book's website is http://www.stat.tamu.edu/~sheather/bookx1 = runif(4000)x2 = rnorm(4000)y = rbinom(4000, size=rep(1000,4000), prob = ilogit(1 + 4*x1 + 8*x2))x = glm(cbind(y,rep(1000,4000)-y) ~ x1 + x2, family=binomial)summary(x)y[1:5]ilogit(1 + 4*x1 + 8*x2)[1:5]y[1:5]/10002. Overfitting in logistic regression.library(faraway)
data(hormone)
plot(estrogen ~ androgen, data = hormone, pch=as.character(orientation))
abline(-84.5/90.2, 100.9/90.2)
mod1 = glm(orientation ~ estrogen + androgen, hormone, family = binomial)
summary(mod1)
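
A summary of a binary logistic fit can show very large coefficients and standard errors. Here is a minimal sketch of why, using made-up toy data (not the hormone data) in which the two classes are perfectly separated. The names x, y, fit and msgs are just for illustration.

```r
# Toy data, made up for illustration: y = 0 for x <= 3 and y = 1 for x >= 4,
# so a cutoff at x = 3.5 separates the two classes perfectly.
x = 1:6
y = c(0, 0, 0, 1, 1, 1)

# Fit the logistic regression, collecting any warnings glm() issues
# instead of letting them print.
msgs = character(0)
fit = withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

msgs                    # warnings from the fit
coef(summary(fit))      # very large estimates and standard errors
round(fitted(fit), 4)   # fitted probabilities essentially reproduce y
deviance(fit)           # residual deviance essentially 0
```

The fit looks perfect on the data used to fit it, but the coefficients are not well determined: the likelihood keeps increasing as the slope grows, so the estimates just reflect where the algorithm stopped.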

It is easy to overfit when all the m's = 1, that is, when each observation is a single binary trial.

3. Poisson regression.

P(Y = y) = exp(- mu) mu^y / y!, for y = 0, 1, 2, ....
The mean is mu and the variance is mu.
It comes up naturally:
a) similar to binomial, for small p and large n, with mu = np. 
b) Poisson processes and independence in disjoint intervals.
c) Interevent times are independent and exponential.

glm(family = poisson).

The model is that the Yi are independent Poisson with mean mu, where eta = X beta = g(mu). Usually people choose the link g to be the log function: eta = g(mu) = log(mu). Then mu = g^-1(eta), so g^-1 is the exponential function. The model can be written as: the Yi are independent Poisson with mean mu = exp(eta) = exp(X beta).

4. Estimation.

The log likelihood is
L(beta) = Σ yi xi^T beta - Σ exp(xi^T beta) - Σ log(yi!),
where each sum is over i = 1, ..., n.
Deviance = D = 2 Σ (yi log(yi / mu^_i) - (yi - mu^_i)).
D is twice the difference in log likelihood between the saturated model and the given model, and should be approximately chi^2 with n-p degrees of freedom. The difference in D between two nested models should be approximately chi^2 with degrees of freedom equal to p1 - p2, the difference in their numbers of parameters.
Pearson's Chi^2 = Σ (yi - mu^_i)^2 / mu^_i, and this should also be approximately chi^2 with n-p degrees of freedom.

The null deviance just shows you how badly a null model fits, where the Yi are independent Poisson with common mean mu^ = exp(beta^_0). The residual deviance shows you how well, or really how poorly, a model fits where the Yi are independent Poisson with mean mu = exp(beta0 + beta1 x1 + ... + betap-1 x_p-1).
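
The log likelihood, deviance, and Pearson Chi^2 formulas can be checked numerically against what glm reports. Below is a minimal sketch on simulated data. The sample size, the coefficients 0.5, 1.0 and 0.3, and all variable names are made up for illustration.

```r
# Simulate Poisson counts with a log link: mu = exp(X beta).
set.seed(1)
n  = 500
x1 = runif(n)
x2 = rnorm(n)
mu = exp(0.5 + 1.0*x1 + 0.3*x2)
y  = rpois(n, mu)

fit   = glm(y ~ x1 + x2, family = poisson)   # log link is the default
muhat = fitted(fit)                          # mu^_i
eta   = predict(fit, type = "link")          # xi^T beta^

# Log likelihood: sum yi xi^T beta - sum exp(xi^T beta) - sum log(yi!)
ll = sum(y*eta) - sum(exp(eta)) - sum(lgamma(y + 1))

# Deviance: 2 sum( yi log(yi/mu^_i) - (yi - mu^_i) ), taking 0*log(0) as 0
D = 2*sum(ifelse(y == 0, 0, y*log(y/muhat)) - (y - muhat))

# Pearson Chi^2: sum (yi - mu^_i)^2 / mu^_i
X2 = sum((y - muhat)^2/muhat)

c(ll, as.numeric(logLik(fit)))                  # the two should agree
c(D, deviance(fit))                             # the two should agree
c(X2, sum(residuals(fit, type = "pearson")^2))  # the two should agree
df.residual(fit)                                # n - p = 500 - 3

# Nested models: drop x2 and compare the change in deviance
# to chi^2 with p1 - p2 = 1 degree of freedom.
fit0 = glm(y ~ x1, family = poisson)
pchisq(deviance(fit0) - deviance(fit), df = 1, lower.tail = FALSE)
```

The null deviance and residual deviance printed by summary(fit) are the same quantity D computed for the intercept-only model and for the fitted model.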