Wednesday, April 7, 2010.

1. Bernoulli.
2. Binomial variables and binomial regression.
3. Logistic function.
4. Odds.

1. Bernoulli random variables.
Z ~ Bernoulli(p).
Z is 1 with probability p, and 0 with probability q = 1-p.
If Z is Bernoulli(p), then E(Z) = p, and V(Z) = pq.

2. Binomial random variables and binomial regression.
Z ~ Bin(m, p).
Z is the sum of m iid Bernoulli(p) random variables. iid means independent and identically distributed.
e.g. the total number of males in 10 babies, the number of O-ring dislodgings on space shuttles, the number of people in a simple random sample who vote for a particular candidate or who like a particular restaurant.
P(Z = j) = choose(m,j) p^j q^(m-j), for j = 0, 1, ..., m.
Why? One possible way that Z could = j is if you have 1,1,...,1,0,0,...,0, with j 1's followed by m-j 0's. The probability of that, in that order, by independence, is p*p*...*p*q*q*...*q = p^j q^(m-j).
It could also be 1,0,1,1,0,1,0,0,...,1,0, where j are 1's and m-j are 0's, and the probability of that would be p^j q^(m-j) too.
How many such orderings are there? choose(m,j) = m! / [j! (m-j)!], because this is the number of ways to choose which j of the m trials are 1's. Adding up the probabilities of these orderings, which are mutually exclusive, gives the formula.
If Z is binomial(m,p), then E(Z) = mp, and V(Z) = mpq.
In binomial regression, the response can be Z_i = the number of "successes" out of m_i trials, or more commonly, the proportion Z_i/m_i, given some covariates X_i.
Restaurants example, p265. Z_i/m_i = the PROPORTION of French restaurants in New York that were listed in the Michelin guide.

3. Logistic function.
It's tempting to use the usual linear model, Y_i = beta_0 + beta_1 X_1 + ... + beta_p X_p + eps_i, so that E(Y) = X beta, or to let Y_i ~ binomial(m_i, X beta) with X beta playing the role of the success probability, but this has problems in the binomial context.
If Y = the number of successes or the fraction of successes, then Y MUST be non-negative for any choice of X, and a fraction or a probability can't exceed 1 either.
But for any choice of beta's with at least one nonzero slope, there will always be some possible X's where the prediction of Y would be negative, and others where X beta would be greater than 1.
Instead, people often model Y_i ~ binomial(m_i, logistic(X beta)).
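To see the problem concretely, here is a rough sketch in Python. It assumes the standard form of the logistic function, logistic(x) = exp(x) / (1 + exp(x)), and uses made-up coefficients beta0 and beta1 with a single covariate x. These numbers are for illustration only and are not from the restaurants example.

```python
import math

def logistic(x):
    """Standard logistic function exp(x) / (1 + exp(x)).

    Written in two branches so that math.exp never overflows
    for large positive or large negative x.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)

# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = 0.5, 0.2

for x in [-10, -5, 0, 5, 10]:
    eta = beta0 + beta1 * x  # the linear predictor X beta
    # eta itself is not a valid probability when it is < 0 or > 1,
    # but logistic(eta) always is.
    print(f"x = {x:4d}   X beta = {eta:5.2f}   logistic(X beta) = {logistic(eta):.4f}")
```

With these coefficients the linear predictor is negative at x = -10 and x = -5 and greater than 1 at x = 5 and x = 10, so it can't serve directly as a binomial probability, while the logistic transform of the same values stays between 0 and 1.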
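The logistic function is given at the bottom of p265: logistic(x) = exp(x) / (1 + exp(x)) = 1 / (1 + exp(-x)).
No matter what X beta is, logistic(X beta) will always be strictly between 0 and 1, so it is always a valid probability.

4. Odds.
Odds of, or odds in favor of, an event with probability p = p/(1-p).
Odds against are (1-p)/p.
e.g. if p = 0.75, the odds in favor are 0.75/0.25 = 3, i.e. 3 to 1, and the odds against are 1/3.
See p266.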
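The odds formulas above can be checked numerically with a small sketch like the one below. The function names are hypothetical helpers introduced here for illustration, and prob_from_odds is just the algebraic inverse of odds = p/(1-p), not something taken from p266.

```python
def odds_in_favor(p):
    """Odds in favor of an event with probability p: p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("need 0 <= p < 1 (odds are infinite at p = 1)")
    return p / (1 - p)

def odds_against(p):
    """Odds against an event with probability p: (1 - p) / p."""
    if not 0 < p <= 1:
        raise ValueError("need 0 < p <= 1 (odds against are infinite at p = 0)")
    return (1 - p) / p

def prob_from_odds(odds):
    """Solve odds = p / (1 - p) for p, giving p = odds / (1 + odds)."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)

for p in [0.1, 0.5, 0.75]:
    o = odds_in_favor(p)
    print(f"p = {p:.2f}   odds in favor = {o:.4f}   "
          f"odds against = {odds_against(p):.4f}   back to p = {prob_from_odds(o):.2f}")
```

Note that odds in favor and odds against are reciprocals of each other, and that odds of 1 correspond to p = 0.5.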