Monday, April 12, 2010.

1. Residuals for binary data.
2. Leverage.
3. Link functions.
4. Interpreting logistic regression.

1. Residuals for binary data.

Suppose m_i = 1 for i = 1, 2, ..., n, and y_i = 0 or 1. For a saturated model, the estimate of theta_i will simply be 0 or 1; that is, theta^_i = y_i. The deviance is not a good measure of fit in such cases, and its distribution is not close to chi-square. However, the difference between two deviances will often be approximately chi-square. Standardized residual plots are not useful for binary data. See p. 281. Ignore 8.2.3.

Note that (M1) on p. 286 says theta(x) = 1 / [1 + exp(-X beta)]. This is the same as the logistic model theta(x) = exp(X beta) / [1 + exp(X beta)]: multiply the numerator and denominator by exp(-X beta) to get (M1).

Marginal model plot (p. 288). For any x, you have a fitted value from the model, y^, and an observed value, y. The observed value will be 0 or 1, and y^ = theta^(x) = exp(X beta^) / [1 + exp(X beta^)]. You can smooth the relationship between y and x, and the relationship between y^ and x, and see whether they are similar. If the model fits poorly, the two smoothed relationships will be different.

If y = 0 or 1, it is very easy to overfit, and to have the usual problems of high variance in beta^, artificially low SEs, and bad predictions.

2. Leverage.

The average leverage is (p+1)/n, where p+1 is the number of parameters in the logistic regression model. Points with leverage more than twice (p+1)/n are considered leverage points. High leverages can result in underestimation of the corresponding standard errors; in these cases, trust differences in deviance over individual Wald statistics. See p. 292.

3. Link functions.

Let eta = X beta be your linear predictor, and let p = theta be some function of eta, where theta has to be between 0 and 1 no matter what eta is. By convention, we actually think of eta = g(p), and g is the link function, but you could also write p = g^-1(eta).

Logit: eta = g(p) = log(p/(1-p)).
p = g^-1(eta) = exp(eta) / [1 + exp(eta)]. This is logistic regression.

Probit: eta = g(p) = Phi^-1(p), so p = g^-1(eta) = Phi(eta), where Phi is the standard normal cdf.

Complementary log-log: eta = g(p) = log(-log(1-p)), so p = g^-1(eta) = 1 - exp(-exp(eta)).

The model is that the Y_i are independent binomial(n_i, p_i) = binomial(n_i, g^-1(eta_i)) = binomial(n_i, g^-1(X_i beta)). The logistic regression model is that the Y_i are independent binomial(n_i, exp(X_i beta) / [1 + exp(X_i beta)]).

4. Interpreting logistic regression.

The odds of p are o = p/q, where q = 1-p. So if p = 1/5, then o = (1/5) / (4/5) = 1/4. In general, p = o/(1+o). The odds against p are 1/o, so if p = 20%, the odds against it are 4 to 1.

In logistic regression, p^ = exp(X beta^) / [1 + exp(X beta^)], so p^/q^ = p^/(1-p^) = {exp(X beta^) / [1 + exp(X beta^)]} / {1 / [1 + exp(X beta^)]} = exp(X beta^). So log(p^/q^) = X beta^; that is, log(odds) = X beta^. A unit increase in x1 means a predicted increase of beta1^ in the log odds. That is, the predicted odds increase by a FACTOR of exp(beta1^).

Note that for small p, p/q and p are similar, so for small p you can interpret odds and probabilities similarly. If p = 1/1000, then o = 1/999.

Different link functions will fit the data similarly but will have different curves away from the data. You have to watch out for extrapolation.
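The algebraic equivalence noted in section 1 between (M1) and the logistic form can be checked numerically. A minimal sketch (the eta values are arbitrary test points, not from the text):

```python
import math

def theta_m1(eta):
    # (M1) form: theta = 1 / (1 + exp(-eta)), where eta = X beta
    return 1.0 / (1.0 + math.exp(-eta))

def theta_logistic(eta):
    # logistic form: theta = exp(eta) / (1 + exp(eta))
    return math.exp(eta) / (1.0 + math.exp(eta))

# Multiplying the numerator and denominator of the logistic form by
# exp(-eta) gives (M1), so the two agree for every eta.
for eta in [-3.0, -0.5, 0.0, 0.5, 3.0]:
    assert abs(theta_m1(eta) - theta_logistic(eta)) < 1e-12
```

In floating point, the (M1) form is often preferred for large positive eta, since exp(eta) in the logistic form can overflow while exp(-eta) cannot.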
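The three inverse links from section 3 and the odds arithmetic from section 4 can be sketched as follows; the coefficients beta0 and beta1 are hypothetical values chosen for illustration, not estimates from any data set:

```python
import math
from statistics import NormalDist

def logit_inv(eta):
    # logit link: p = exp(eta) / (1 + exp(eta))
    return math.exp(eta) / (1.0 + math.exp(eta))

def probit_inv(eta):
    # probit link: p = Phi(eta), the standard normal cdf
    return NormalDist().cdf(eta)

def cloglog_inv(eta):
    # complementary log-log link: p = 1 - exp(-exp(eta))
    return 1.0 - math.exp(-math.exp(eta))

def odds(p):
    # odds o = p/q with q = 1 - p
    return p / (1.0 - p)

# p = 1/5 gives odds 1/4 (odds against: 4 to 1)
assert abs(odds(0.2) - 0.25) < 1e-12

# A unit increase in x1 multiplies the fitted odds by exp(beta1):
beta0, beta1 = -1.0, 0.7   # hypothetical coefficients
p_at_1 = logit_inv(beta0 + beta1 * 1.0)
p_at_2 = logit_inv(beta0 + beta1 * 2.0)
assert abs(odds(p_at_2) / odds(p_at_1) - math.exp(beta1)) < 1e-12

# All three inverse links map any eta into (0, 1), but they give
# different probabilities at the same eta, which is why fitted curves
# from different links diverge away from the data.
for eta in [-2.0, 0.0, 2.0]:
    print(eta, logit_inv(eta), probit_inv(eta), cloglog_inv(eta))
```

The multiplicative-odds check is exact for the logit link only; under probit or complementary log-log, a unit change in x1 has no constant odds-ratio interpretation.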