Wed, April 21, 2010.
1. More about kernel regression.
2. Choosing h.
3. B-spline regression.
4. M-estimation, LAD regression, and Huber's method.
5. LTS.

1. More about kernel regression.

In R, you can do ksmooth, or in splancs, see kernel2d and kernel3d.

95% confidence intervals for the kernel regression estimate m^_h(x) can be constructed using the formula

  m^_h(x) +/- 1.96 sqrt{ sigma^2(x) ||K||_2^2 / (n h g^(x)) },

where
  sigma^2(x) = sum (i = 1 to n) W_hi(x) (Y_i - m^_h(x))^2 / n,
  W_hi(x) = K((x - x_i)/h) / (h g^(x)),
  g^(x) = sum (i = 1 to n) K_h(x - x_i) / n,
  m^_h(x) is the Nadaraya-Watson estimate, i.e.
  m^_h(x) = [sum (i = 1 to n) K_h(x - x_i) Y_i] / [sum (i = 1 to n) K_h(x - x_i)],
  K_h(x) = K(x/h)/h, and
  ||K||_2^2 = integral from -infinity to infinity of K^2(u) du.
This formula comes from Hardle (1991).

2. Choosing h.

a) Minimizing mean integrated squared error. p373.
b) Plug-in methods. Start with an initial estimate of f, e.g. by assuming f is normal, and use that initial guess to estimate R(f'') and R(f'''). Using those estimates, find the h that minimizes the estimated mean integrated squared error. p374.
c) Silverman's rule of thumb. Silverman (1986). bw.nrd0(x) = 0.9 * min(sd, IQR/1.34) * n^(-1/5).
d) Scott (1992)'s rule of thumb. bw.nrd(x) = 1.06 * min(sd, IQR/1.34) * n^(-1/5).
e) Cross-validation: remove one observation i at a time and predict Y_i, or leave out a whole segment of x's at a time if you have overlapping x's.

3. B-spline regression.

The fitted curve is a sum of basis functions of X, where each basis function is a degree 3 polynomial in X between adjacent knots, each is continuous, each is continuous in its first 2 derivatives at the knots, and at any given x the basis functions sum to 1. Nonparametric estimates typically fit the data well, but they sometimes overfit, especially when you don't have much data.

4. M-estimation, LAD regression, and Huber's method.

Instead of minimizing the sum of squared residuals, sum e_i^2, you could choose beta to minimize sum rho(e_i), where rho is some other function. If you choose rho(x) = x^2, then you're back to least squares.
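The M-estimation idea can be sketched numerically: minimize sum rho(e_i) over beta for a chosen rho, and check that rho(x) = x^2 reproduces the ordinary least-squares fit. This is a minimal illustration (not from the notes), on simulated data, using scipy's general-purpose minimizer.

```python
# Sketch: M-estimation with rho(x) = x^2 recovers least squares.
# Simulated data and scipy.optimize are illustrative choices, not from the notes.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 50)  # true intercept 2, slope 0.5

X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept column

def m_objective(beta, rho):
    """Generic M-estimation criterion: sum of rho applied to residuals."""
    return np.sum(rho(y - X @ beta))

# rho(x) = x^2: numerical minimization of the sum of squared residuals
fit = minimize(m_objective, x0=[0.0, 0.0], args=(lambda e: e**2,))

# Closed-form least-squares solution for comparison
ols = np.linalg.lstsq(X, y, rcond=None)[0]

print(fit.x)  # should agree closely with ols
print(ols)
```

Swapping in a different rho (e.g. |x|, or Huber's function) in the same `m_objective` is all that changes for the other M-estimators discussed below, though non-smooth choices of rho need a derivative-free optimizer.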
If rho(x) = |x|, then this is LAD (least absolute deviations) regression, also called L1 regression. With LAD, one big residual doesn't get squared, so it doesn't matter as much in determining beta.
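The robustness of LAD to a single large residual can be seen in a small comparison. A hedged sketch (simulated data, with Nelder-Mead used for the non-smooth |e| objective; these choices are mine, not from the notes): one gross outlier pulls the least-squares slope away from the truth much more than the LAD slope.

```python
# Sketch: LAD vs. least squares with one gross outlier (illustrative data).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 40)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, 40)  # true slope 2
y[-1] += 50.0  # one gross outlier at the right endpoint

X = np.column_stack([np.ones_like(x), x])

def sad(beta):
    """Sum of absolute deviations: the LAD criterion, rho(e) = |e|."""
    return np.sum(np.abs(y - X @ beta))

# |e| is not differentiable at 0, so use a derivative-free method
lad = minimize(sad, x0=[0.0, 0.0], method="Nelder-Mead").x
ols = np.linalg.lstsq(X, y, rcond=None)[0]

print("LAD slope:", lad[1])
print("OLS slope:", ols[1])
# The squared outlier residual drags the OLS slope well above 2,
# while the LAD slope stays near the true value.
```

In R, the same fit is available via rq(y ~ x, tau = 0.5) in the quantreg package, since the median regression line minimizes the sum of absolute deviations.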