Monday, Week 6, May 8, 2000

Discuss the first chapter from Advances in Kernel Methods: Support Vector Learning,
   edited by Bernhard Schölkopf, Christopher J.C. Burges, and Alexander J. Smola.

1. Motivation is different from Bayesian classification. In the Bayesian setting, we begin with the conditional
    distribution $f(x|y)$ and use Bayes' theorem to find optimal prediction rules. The key is to estimate
    the distribution of the predictor $x$ for each group. The density functions for different groups typically overlap with
    each other.

   In the context of the support vector machine, separability is the key assumption. This means that when there are two groups,
     denoted by $Y=1$ and $Y=-1$,
     the region where $f(x|Y=1)>0$ can be separated from the region where $f(x|Y=-1)>0$.
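
   To make the contrast concrete (standard notation, filling in a step the notes only allude to): write $\pi_y = P(Y=y)$ for the prior probabilities. The Bayes rule assigns $x$ to the group with the larger posterior,

$$
\hat{y}(x) = \mathrm{sign}\big( \pi_1 f(x|Y=1) - \pi_{-1} f(x|Y=-1) \big),
$$

   so the densities $f(x|y)$ must be estimated, and the decision boundary falls where the two weighted densities cross. Under separability the supports are disjoint, so no density estimation is required: every separating hyperplane attains zero training error, and the support vector machine singles out the one with the largest margin.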
 

2. Describe the optimal separating hyperplane. Derive the solution via the Karush-Kuhn-Tucker (KKT) conditions.
     Explain how the dual optimization problem is equivalent to finding the shortest distance between two convex hulls, one generated by
      the points from each group (a sketch of the derivation and a numerical check follow below).
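
   Sketch of the derivation, in standard notation (the usual hard-margin argument, filling in steps the outline only names, not quoted from the chapter). For separable training data $(x_i, y_i)$, $i=1,\dots,n$, with $y_i \in \{1,-1\}$, the optimal separating hyperplane $w^\top x + b = 0$ solves

$$
\min_{w,b} \ \tfrac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i(w^\top x_i + b) \ge 1, \quad i=1,\dots,n.
$$

   Introducing multipliers $\alpha_i \ge 0$ gives the Lagrangian

$$
L(w,b,\alpha) = \tfrac{1}{2}\|w\|^2 - \sum_{i=1}^n \alpha_i \big[ y_i(w^\top x_i + b) - 1 \big].
$$

   The KKT stationarity conditions $\partial L/\partial w = 0$ and $\partial L/\partial b = 0$ give $w = \sum_i \alpha_i y_i x_i$ and $\sum_i \alpha_i y_i = 0$; substituting these back yields the dual

$$
\max_{\alpha} \ \sum_{i=1}^n \alpha_i - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^\top x_j
\quad \text{subject to} \quad \alpha_i \ge 0, \ \sum_{i=1}^n \alpha_i y_i = 0.
$$

   By complementary slackness, $\alpha_i[y_i(w^\top x_i + b) - 1] = 0$, so only points on the margin (the support vectors) have $\alpha_i > 0$. After rescaling the $\alpha_i$ to sum to one within each group, the dual becomes: find convex combinations $p = \sum_{i: y_i=1} u_i x_i$ and $q = \sum_{j: y_j=-1} v_j x_j$ minimizing $\|p - q\|$. The optimal hyperplane perpendicularly bisects the segment joining the closest points of the two convex hulls, and the margin width $2/\|w\|$ equals that shortest distance.

   A minimal numerical check of this equivalence in Python (a sketch assuming scikit-learn and SciPy are available; the toy data set and the large value of C used to approximate a hard margin are illustrative choices, not from the chapter):

   import numpy as np
   from scipy.optimize import minimize
   from sklearn.svm import SVC

   # Two well-separated Gaussian clusters (illustrative toy data).
   rng = np.random.default_rng(0)
   X_pos = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(20, 2))    # group Y = +1
   X_neg = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(20, 2))  # group Y = -1
   X = np.vstack([X_pos, X_neg])
   y = np.hstack([np.ones(20), -np.ones(20)])

   # Hard-margin SVM approximated by a very large C; margin width is 2/||w||.
   clf = SVC(kernel="linear", C=1e6).fit(X, y)
   w = clf.coef_.ravel()
   margin_width = 2.0 / np.linalg.norm(w)

   # Shortest distance between the two convex hulls:
   #   minimize || sum_i u_i x_i(+)  -  sum_j v_j x_j(-) ||^2
   #   over u, v >= 0 with sum(u) = 1 and sum(v) = 1.
   n_pos, n_neg = len(X_pos), len(X_neg)

   def hull_gap_sq(t):
       u, v = t[:n_pos], t[n_pos:]
       diff = u @ X_pos - v @ X_neg
       return diff @ diff

   cons = [{"type": "eq", "fun": lambda t: t[:n_pos].sum() - 1.0},
           {"type": "eq", "fun": lambda t: t[n_pos:].sum() - 1.0}]
   t0 = np.hstack([np.full(n_pos, 1.0 / n_pos), np.full(n_neg, 1.0 / n_neg)])
   res = minimize(hull_gap_sq, t0, method="SLSQP",
                  bounds=[(0.0, 1.0)] * (n_pos + n_neg), constraints=cons)
   hull_distance = float(np.sqrt(res.fun))

   print(f"SVM margin width 2/||w||: {margin_width:.4f}")
   print(f"convex hull distance:     {hull_distance:.4f}")   # the two should agree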