Discuss the first chapter of Advances in Kernel Methods: Support Vector Learning,
edited by Bernhard Schölkopf, Christopher J.C. Burges, and Alexander J. Smola.
1. The motivation is different from Bayesian classification. In the Bayesian
setting, we begin with the class-conditional distribution $f(x|y)$ and use
Bayes' theorem to derive optimal prediction rules (a worked display follows
this item). The key step is to estimate the distribution of the predictor $x$
for each group, and the density functions for different groups typically
overlap with each other.
In the context of the support vector machine, separability is the key
assumption. This means that when there are two groups, denoted by $Y=1$ and
$Y=-1$, the region on which $f(x|Y=1)>0$ is separable from the region on which
$f(x|Y=-1)>0$.
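One standard way to write the Bayes rule referenced above, assuming prior class probabilities $\pi_1$ and $\pi_{-1}$ (the priors are my notation here, not necessarily the chapter's):
$$P(Y=1 \mid x) = \frac{\pi_1 f(x \mid Y=1)}{\pi_1 f(x \mid Y=1) + \pi_{-1} f(x \mid Y=-1)},$$
so the Bayes-optimal rule predicts $Y=1$ exactly when $\pi_1 f(x \mid Y=1) > \pi_{-1} f(x \mid Y=-1)$. Because the densities overlap, this rule has a positive error rate, which is precisely what the separability assumption above rules out.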
2. Describe the optimal separating hyperplane. Derive the solution via the
Karush-Kuhn-Tucker (KKT) theorem. Explain how the dual optimization problem is
equivalent to finding the shortest distance between two convex hulls, one
generated by the points from each group (see the sketch and the numerical
check below).
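A minimal sketch of the derivation in point 2, in standard notation (the chapter's own notation may differ). For separable data $(x_i, y_i)$ with $y_i \in \{1, -1\}$, the optimal separating hyperplane $w^\top x + b = 0$ solves
$$\min_{w, b} \ \tfrac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i (w^\top x_i + b) \ge 1, \quad i = 1, \dots, n.$$
Introducing multipliers $\alpha_i \ge 0$, the KKT conditions give $w = \sum_i \alpha_i y_i x_i$, $\sum_i \alpha_i y_i = 0$, and complementary slackness $\alpha_i \left[ y_i (w^\top x_i + b) - 1 \right] = 0$, so only points on the margin (the support vectors) have $\alpha_i > 0$. Substituting back yields the dual
$$\max_{\alpha} \ \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^\top x_j \quad \text{subject to} \quad \alpha_i \ge 0, \ \sum_i \alpha_i y_i = 0.$$
After rescaling the multipliers, the dual can be rewritten as minimizing $\| \sum_{i: y_i = 1} c_i x_i - \sum_{j: y_j = -1} d_j x_j \|$ over convex weights $c$ and $d$, i.e., the shortest distance between the two convex hulls; the margin width $2 / \|w\|$ equals that distance.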
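The convex hull equivalence can be checked numerically. Below is a minimal sketch in Python (not from the chapter): it fits a near-hard-margin linear SVM on hypothetical toy data and compares the margin width $2/\|w\|$ with the shortest distance between the two convex hulls. The data, the seed, and the large $C$ value approximating the hard-margin limit are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.svm import SVC

# Hypothetical toy data (illustrative, not from the chapter):
# two well-separated Gaussian clouds in the plane.
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=[2.0, 2.0], scale=0.3, size=(20, 2))
X_neg = rng.normal(loc=[-2.0, -2.0], scale=0.3, size=(20, 2))
X = np.vstack([X_pos, X_neg])
y = np.hstack([np.ones(20), -np.ones(20)])

# Hard-margin SVM, approximated with a very large C.
svm = SVC(kernel="linear", C=1e6).fit(X, y)
w = svm.coef_.ravel()
margin_width = 2.0 / np.linalg.norm(w)  # distance between the two margin hyperplanes

# Shortest distance between the convex hulls of the two groups:
# minimize || sum_i a_i x_i^+ - sum_j b_j x_j^- || over simplex weights a, b.
def hull_gap(theta):
    a, b = theta[:20], theta[20:]
    return np.linalg.norm(a @ X_pos - b @ X_neg)

constraints = [
    {"type": "eq", "fun": lambda t: t[:20].sum() - 1.0},  # weights a sum to 1
    {"type": "eq", "fun": lambda t: t[20:].sum() - 1.0},  # weights b sum to 1
]
bounds = [(0.0, 1.0)] * 40          # weights are nonnegative
theta0 = np.full(40, 1.0 / 20)      # start at the hull centroids
result = minimize(hull_gap, theta0, bounds=bounds, constraints=constraints)

# The two quantities should agree up to solver tolerance.
print(f"margin width:  {margin_width:.4f}")
print(f"hull distance: {result.fun:.4f}")
```

On this toy example the printed margin width and hull distance coincide to several decimal places, which is the equivalence point 2 asks us to explain.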