Fri, Apr 23, 2010.

1. Collect HW. No HW next week.
2. M-estimation, LAD regression, and Huber's method.
3. LTS.
4. R examples of kernel density estimation, kernel regression, B-splines, LAD, Huber, and LTS.

2. M-estimation, LAD regression, and Huber's method.

Instead of minimizing the sum of squared residuals Σ e_i^2, you could choose beta to minimize Σ rho(e_i), where rho is some other function. If you choose rho(x) = x^2, then you're back to least squares. If rho(x) = |x|, then this is LAD (least absolute deviation) regression, or L1 regression. With LAD, one big residual doesn't get squared, so it doesn't matter as much in determining beta. With Huber's method, rho(x) = x^2/2 if |x| <= c and rho(x) = c|x| - c^2/2 if |x| > c, where c is some robust estimate of sigma, like the median of |e_i|. It's halfway between least squares and LAD.

These methods are in the category of M-estimation. The estimates can often be computed by iterating weighted least squares (WLS), where the weight for observation i is w(eps_i) = rho'(eps_i)/eps_i. For LS, w(eps_i) is constant. For Huber's method, w(eps_i) = 1 if |eps_i| <= c, and w(eps_i) = c/|eps_i| otherwise. V(beta-hat) = sigma-hat^2 (X^T S X)^{-1}, using a robust estimate of sigma, where S = W is the estimated diagonal weight matrix from WLS.

3. Least trimmed squares (LTS).

Sort the residuals by absolute value and minimize the sum of squares of the q residuals that are smallest in absolute value. It's resistant: a few outliers won't ruin your estimate of beta. Faraway has a default choice for q, q = floor(n/2) + floor((p+1)/2). There are no simple standard error formulas. Now beta-hat cannot be calculated simply as (X^T X)^{-1} X^T Y; you have to try different betas. You can resample the residuals from the data to get bootstrap samples and estimate the variance of beta-hat.

4. R examples of kernel density estimation, kernel regression, B-splines, LAD, Huber, and LTS.
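Before the R examples in part 4, here is a small sketch of the IRLS computation for Huber's method described in part 2. It is in Python rather than R so it is fully self-contained; the data, the tuning constant c = 1.345, and the helper names are made up for illustration and this is not how MASS's rlm is implemented.

```python
# Sketch: Huber M-estimation of a straight-line fit via iteratively
# reweighted least squares (IRLS). Hypothetical data and constants.

def wls_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x with weights w."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def huber_fit(x, y, c=1.345, iters=50):
    """IRLS with Huber weights: w = 1 if |e| <= c*s, else c*s/|e|,
    where s is a robust scale estimate (median |e|, as in the notes)."""
    w = [1.0] * len(x)
    a = b = 0.0
    for _ in range(iters):
        a, b = wls_line(x, y, w)
        e = [yi - a - b * xi for xi, yi in zip(x, y)]
        s = sorted(abs(ei) for ei in e)[len(e) // 2] or 1e-12
        w = [1.0 if abs(ei) <= c * s else c * s / abs(ei) for ei in e]
    return a, b

# 20 points exactly on y = 3x, plus one gross outlier at (3, 30).
x = [i / 10 for i in range(20)] + [3.0]
y = [3 * xi for xi in x[:-1]] + [30.0]
a, b = huber_fit(x, y)
```

Each pass refits by WLS, then recomputes the weights from the new residuals; the outlier's weight shrinks toward zero, so the fitted slope ends up near 3 despite the point at (3, 30) that badly distorts ordinary least squares.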
## a) kernel density estimation

x1 = sort(c(runif(1000)*30, rnorm(300)*2+10))
hist(x1, nclass=50, prob=T, main="kernel density estimates of x1")
lines(x1, dunif(x1,min=0,max=30)*1000/1300 + dnorm(x1,mean=10,sd=2)*300/1300,
      lty=1, col=1)
lines(density(x1, bw=.3), lty=1, col=2)
z1 = density(x1, bw=bw.nrd0(x1))
lines(z1, lty=1, col=3)
z2 = density(x1, bw=bw.nrd(x1))
lines(z2, lty=1, col=4)
lines(density(x1, bw=10), lty=1, col=5)
legend(20, .08, c("true density","h=0.3","bw.nrd0","bw.nrd","h=10"),
       lty=rep(1,5), col=1:5)

## b) kernel regression

y = x1^3/2000 - x1^2/100 + x1/2 - sin(x1) + 1.2*rnorm(1300)
plot(x1, y, pch=".", main="kernel regression")
lines(x1, x1^3/2000 - x1^2/100 + x1/2 - sin(x1), col=2)
z2 = ksmooth(x1, y, bandwidth=bw.nrd(x1))
lines(z2, col=3)
lines(ksmooth(x1, y, bandwidth=30), col=4)
legend(10, 20, c("true mean of Y given X","bw.nrd","h=30"), lty=c(1,1,1), col=2:4)

## c) B-splines
## splines ships with base R, so no install.packages("splines") is needed.

library(splines)
z3 = lm(y ~ bs(x1, knots=seq(min(x1),max(x1),4)))  ## knots every 4 units
plot(x1, y, pch=".", main="b-splines")
lines(x1, x1^3/2000 - x1^2/100 + x1/2 - sin(x1), col=3)
lines(x1, z3$fit, col=4)
z3b = lm(y ~ bs(x1, knots=seq(min(x1),max(x1),8)))  ## knots every 8 units
lines(x1, z3b$fit, col=2)
legend(10, 20, c("true mean of Y given X","knots every 4","knots every 8"),
       lty=c(1,1,1), col=c(3,4,2))

## d) LAD regression

library(MASS)
x2 = sort(c(runif(40), 3))
y = 3*x2 + rnorm(41)/3
y[41] = 3
plot(x2, y, main="LAD regression and Huber's method")
abline(lm(y ~ x2), lty=2)
z4 = rlm(y ~ x2, scale.est="MAD")
abline(z4)

## e) Huber's method

z5 = rlm(y ~ x2, scale.est="Huber")
abline(z5, col=4)
legend(2, .1, c("Least squares","LAD","Huber"), lty=c(2,1,1), col=c(1,1,4))
z4$coef
z5$coef
plot(c(0.96,1), c(2.6,2.8), type="n",
     main="LAD regression and Huber's method zoomed in")
abline(z4)
abline(z5, col=4)
abline(lm(y ~ x2), lty=2)
legend(.97, 2.75, c("LAD","Huber"), lty=c(1,1), col=c(1,4))

## f) LTS regression

x2[41] = 8
y[c(1,11,21,31)] = rep(1.4, 4)
plot(x2, y, main="LTS regression")
abline(lm(y ~ x2), lty=2)
z6 = ltsreg(y ~ x2, .92)
abline(z6, col=4)
legend(4, 1, c("Least Squares","LTS"), lty=c(2,1), col=c(1,4))
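A footnote on part 3: since the LTS beta-hat has no closed form, it has to be found by search. Here is a tiny brute-force sketch, again in Python so it is self-contained; the data, the four planted outliers, and the exhaustive pair search are all illustrative. This is not how ltsreg/lqs actually works (they use random elemental subsets for speed), but it shows the trimmed criterion being minimized.

```python
# Sketch: least trimmed squares for a line y = a + b*x, by searching over
# "elemental" fits through every pair of points and keeping the candidate
# that minimizes the sum of the q smallest squared residuals.
from itertools import combinations

def lts_line(x, y, q):
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue                          # skip vertical candidate lines
        b = (y[j] - y[i]) / (x[j] - x[i])     # line through points i and j
        a = y[i] - b * x[i]
        r2 = sorted((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
        crit = sum(r2[:q])                    # trimmed sum of squares
        if best is None or crit < best[0]:
            best = (crit, a, b)
    return best[1], best[2]

# 20 points exactly on y = 2x + 1, plus 4 gross outliers at height 9.
x = [i / 5 for i in range(20)] + [1.0, 2.0, 3.0, 3.5]
y = [2 * xi + 1 for xi in x[:20]] + [9.0, 9.0, 9.0, 9.0]
n, p = len(x), 2                              # p parameters: intercept, slope
q = n // 2 + (p + 1) // 2                     # Faraway's default trimming, q = 13
a, b = lts_line(x, y, q)
```

Because 13 of the 24 points lie exactly on y = 2x + 1, the trimmed criterion is (essentially) zero for any line through two good points, and the search recovers a near 1 and b near 2; the four outliers simply fall outside the q retained residuals. The bootstrap variance estimate mentioned in the notes would rerun this search on resampled data.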