Discuss the first chapter of Advances in Kernel Methods: Support Vector Learning,
edited by Bernhard Schölkopf, Christopher J.C. Burges, and Alexander J. Smola.
1. The motivation is different from Bayesian classification. In the Bayesian
setting, we begin with the class-conditional distribution $f(x|y)$ and use
Bayes' theorem to derive optimal prediction rules (a worked display follows
this item). The key step is to estimate the distribution of the predictor $x$
for each group, and the density functions for different groups typically
overlap with each other.
In the context of the support vector machine, separability is the key
assumption. This means that when there are two groups, denoted by $Y=1$ and
$Y=-1$, the region on which $f(x|Y=1)>0$ is separable from the region on which
$f(x|Y=-1)>0$.
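One standard way to write the Bayes rule referenced above, assuming prior class probabilities $\pi_1$ and $\pi_{-1}$ (the priors are my notation here, not necessarily the chapter's):
$$P(Y=1 \mid x) = \frac{\pi_1 f(x \mid Y=1)}{\pi_1 f(x \mid Y=1) + \pi_{-1} f(x \mid Y=-1)},$$
so the Bayes-optimal rule predicts $Y=1$ exactly when $\pi_1 f(x \mid Y=1) > \pi_{-1} f(x \mid Y=-1)$. Because the densities overlap, this rule has a positive error rate, which is precisely what the separability assumption above rules out.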
2. Describe the optimal separating hyperplane. Derive the solution via the
Karush-Kuhn-Tucker (KKT) theorem. Explain how the dual optimization problem is
equivalent to finding the shortest distance between two convex hulls, one
generated by the points from each group (see the sketch and the numerical
check below).
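A minimal sketch of the derivation in point 2, in standard notation (the chapter's own notation may differ). For separable data $(x_i, y_i)$ with $y_i \in \{1, -1\}$, the optimal separating hyperplane $w^\top x + b = 0$ solves
$$\min_{w, b} \ \tfrac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i (w^\top x_i + b) \ge 1, \quad i = 1, \dots, n.$$
Introducing multipliers $\alpha_i \ge 0$, the KKT conditions give $w = \sum_i \alpha_i y_i x_i$, $\sum_i \alpha_i y_i = 0$, and complementary slackness $\alpha_i \left[ y_i (w^\top x_i + b) - 1 \right] = 0$, so only points on the margin (the support vectors) have $\alpha_i > 0$. Substituting back yields the dual
$$\max_{\alpha} \ \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^\top x_j \quad \text{subject to} \quad \alpha_i \ge 0, \ \sum_i \alpha_i y_i = 0.$$
After rescaling the multipliers, the dual can be rewritten as minimizing $\| \sum_{i: y_i = 1} c_i x_i - \sum_{j: y_j = -1} d_j x_j \|$ over convex weights $c$ and $d$, i.e., the shortest distance between the two convex hulls; the margin width $2 / \|w\|$ equals that distance.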
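The convex hull equivalence can be checked numerically. Below is a minimal sketch in Python (not from the chapter): it fits a near-hard-margin linear SVM on hypothetical toy data and compares the margin width $2/\|w\|$ with the shortest distance between the two convex hulls. The data, the seed, and the large $C$ value approximating the hard-margin limit are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.svm import SVC

# Hypothetical toy data (illustrative, not from the chapter):
# two well-separated Gaussian clouds in the plane.
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=[2.0, 2.0], scale=0.3, size=(20, 2))
X_neg = rng.normal(loc=[-2.0, -2.0], scale=0.3, size=(20, 2))
X = np.vstack([X_pos, X_neg])
y = np.hstack([np.ones(20), -np.ones(20)])

# Hard-margin SVM, approximated with a very large C.
svm = SVC(kernel="linear", C=1e6).fit(X, y)
w = svm.coef_.ravel()
margin_width = 2.0 / np.linalg.norm(w)  # distance between the two margin hyperplanes

# Shortest distance between the convex hulls of the two groups:
# minimize || sum_i a_i x_i^+ - sum_j b_j x_j^- || over simplex weights a, b.
def hull_gap(theta):
    a, b = theta[:20], theta[20:]
    return np.linalg.norm(a @ X_pos - b @ X_neg)

constraints = [
    {"type": "eq", "fun": lambda t: t[:20].sum() - 1.0},  # weights a sum to 1
    {"type": "eq", "fun": lambda t: t[20:].sum() - 1.0},  # weights b sum to 1
]
bounds = [(0.0, 1.0)] * 40          # weights are nonnegative
theta0 = np.full(40, 1.0 / 20)      # start at the hull centroids
result = minimize(hull_gap, theta0, bounds=bounds, constraints=constraints)

# The two quantities should agree up to solver tolerance.
print(f"margin width:  {margin_width:.4f}")
print(f"hull distance: {result.fun:.4f}")
```

On this toy example the printed margin width and hull distance coincide to several decimal places, which is the equivalence point 2 asks us to explain.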