
Support Vector Machine Imperfect Separation And KKT Condition

Imperfect Separation And KKT Condition

What if some data points fall inside the safeguard zone? When studying SVM, this is a condition that can never be ignored. This article will walk you through the noisy case and lead you to the KKT conditions.

Imperfect Separation With Noise

There exist conditions under which we do not strictly enforce that no data points lie between $H_1$ and $H_2$. We can extend SVM to allow some noise (data points) inside the safeguard zone; thus, we want to penalize the data points that cross the boundaries ($H_1$, $H_2$).
[1]In this way, by introducing non-negative slack variables $\xi_i\geq0$, such that:
➀$w^t\cdot x_i-b\geq1-\xi_i$, for $y_i=1$
➁$w^t\cdot x_i-b\leq-1+\xi_i$, for $y_i=-1$
Both cases combine into the single constraint $y_i\cdot(w^t\cdot x_i-b)\geq1-\xi_i$. This shrinks the effective distance between $H_1$ and $H_2$, thus allowing some noise within the original margin.

[2]In addition to $\xi$, we also introduce the penalty $C$, where $C$ is finite, into the objective function:
$\underset{w,\xi_i}{min}\left[\frac12\cdot w^t\cdot w+C\cdot(\sum_{i=1}^n\xi_i)^m\right]$; usually $m=1$, which leads us to formulate our problem as:
$\underset{w,\xi_i}{min}\left[\frac12\cdot w^t\cdot w+C\cdot\sum_{i=1}^n\xi_i\right]$,

subject to

$y_i\cdot(w^t\cdot x_i-b)+\xi_i-1\geq0$, with $\xi_i\geq0$ for all $i$
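As a quick numeric illustration, below is a minimal Python sketch (the helper name `soft_margin_objective` is my own, and numpy is assumed) that evaluates this objective with $m=1$, taking each $\xi_i$ at its smallest feasible value $max(0,1-y_i\cdot(w^t\cdot x_i-b))$:

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """(1/2)w'w + C*sum(xi), with each xi_i at its smallest feasible
    value xi_i = max(0, 1 - y_i*(w'x_i - b))."""
    margins = y * (X @ w - b)             # y_i*(w'x_i - b) for every i
    xi = np.maximum(0.0, 1.0 - margins)   # slack needed by each point
    return 0.5 * (w @ w) + C * xi.sum()

# toy usage: the last two points sit inside the margin, forcing xi_i > 0
X = np.array([[2.0, 1.0], [-1.5, -0.5], [0.2, 0.1], [-0.1, -0.2]])
y = np.array([1.0, -1.0, 1.0, -1.0])
print(soft_margin_objective(np.array([1.0, 0.5]), 0.0, X, y, C=10.0))
```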

[3]Next, build the Lagrangian by introducing multipliers $\alpha_1$, $\alpha_2$,…, $\alpha_n$ and $\mu_1$, $\mu_2$,…, $\mu_n$, then:
\(\begin{array}{l}L(w,b,\xi,\alpha,\mu)\\=\frac12\cdot w^t\cdot w+C\cdot\sum_{i=1}^n\xi_i\\\;\;\;\;-\sum_{i=1}^n\alpha_i\cdot(y_i\cdot(w^t\cdot x_i-b)+\xi_i-1)\\\;\;\;\;-\sum_{i=1}^n\mu_i\cdot\xi_i\end{array}\)
; where the regularization terms are explained below:
➀the term $y_i\cdot(w^t\cdot x_i-b)+\xi_i-1$ is regularized by $\alpha_i$.
➁$\mu_i$ is for the regularization of the term $\xi_i$.

The Lagrangian expression turns out to be:
\(\begin{array}{l}L(w,b,\xi,\alpha,\mu)\\=\frac12\cdot w^t\cdot w+\sum_{i=1}^n(C-\alpha_i-\mu_i)\xi_i\\\;\;\;\;-\sum_{i=1}^n(\alpha_i\cdot y_i\cdot x_i^t)\cdot w\\\;\;\;\;+\sum_{i=1}^n\alpha_i\cdot y_i\cdot b\\\;\;\;\;+\sum_{i=1}^n\alpha_i\end{array}\)
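To sanity-check this regrouping, here is a small numeric sketch (assuming numpy); it evaluates both forms of the Lagrangian at arbitrary random inputs and asserts they agree:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)
w, b, C = rng.normal(size=d), rng.normal(), 2.0
xi, alpha, mu = rng.uniform(size=n), rng.uniform(size=n), rng.uniform(size=n)

# original form:
# (1/2)w'w + C*sum(xi) - sum(alpha_i*(y_i*(w'x_i-b)+xi_i-1)) - sum(mu_i*xi_i)
L1 = (0.5 * (w @ w) + C * xi.sum()
      - np.sum(alpha * (y * (X @ w - b) + xi - 1.0))
      - np.sum(mu * xi))

# regrouped form:
# (1/2)w'w + sum((C-alpha_i-mu_i)*xi_i) - sum(alpha_i*y_i*x_i')w
#   + sum(alpha_i*y_i)*b + sum(alpha_i)
L2 = (0.5 * (w @ w) + np.sum((C - alpha - mu) * xi)
      - (alpha * y) @ X @ w
      + np.sum(alpha * y) * b
      + alpha.sum())

assert np.isclose(L1, L2)   # the two forms agree for arbitrary inputs
```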

[4]At the optimum, the gradient of $L$ with respect to the primal variables $w$, $b$, $\xi$ must vanish; that is, for all $i$:
➀$\frac{\partial L}{\partial w}=0$, ➁$\frac{\partial L}{\partial b}=0$, ➂$\frac{\partial L}{\partial \xi}=0$
➀yields $w$ as in the below expression:
\(\begin{array}{l}w^t-\sum_{i=1}^n(\alpha_i\cdot y_i\cdot x_i^t)=0\\\Leftrightarrow w=\sum_{i=1}^n(\alpha_i\cdot y_i\cdot x_i)\end{array}\)
➁implies $\sum_{i=1}^n\alpha_i\cdot y_i=0$
➂gives $C-\alpha_i-\mu_i=0$ for each $i$, since the derivative is taken with respect to each $\xi_i$ separately.
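As a quick guard against algebra slips, the sketch below (all names illustrative, numpy assumed) compares these three analytic gradient formulas against central finite differences of $L$:

```python
import numpy as np

def L(w, b, xi, X, y, alpha, mu, C):
    # the Lagrangian from the text
    return (0.5 * (w @ w) + C * xi.sum()
            - np.sum(alpha * (y * (X @ w - b) + xi - 1.0))
            - np.sum(mu * xi))

rng = np.random.default_rng(1)
n, d = 4, 2
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)
w, b, xi = rng.normal(size=d), rng.normal(), rng.uniform(size=n)
alpha, mu, C = rng.uniform(size=n), rng.uniform(size=n), 2.0
eps = 1e-6

# dL/dw = w - sum_i alpha_i y_i x_i   (checked on the first coordinate)
e = np.zeros(d); e[0] = eps
num = (L(w + e, b, xi, X, y, alpha, mu, C) - L(w - e, b, xi, X, y, alpha, mu, C)) / (2 * eps)
assert np.isclose(num, (w - (alpha * y) @ X)[0], atol=1e-4)

# dL/db = sum_i alpha_i y_i
num = (L(w, b + eps, xi, X, y, alpha, mu, C) - L(w, b - eps, xi, X, y, alpha, mu, C)) / (2 * eps)
assert np.isclose(num, np.sum(alpha * y), atol=1e-4)

# dL/dxi_i = C - alpha_i - mu_i, checked for i = 0
e = np.zeros(n); e[0] = eps
num = (L(w, b, xi + e, X, y, alpha, mu, C) - L(w, b, xi - e, X, y, alpha, mu, C)) / (2 * eps)
assert np.isclose(num, C - alpha[0] - mu[0], atol=1e-4)
```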

[5]Note that $\alpha_i\geq0$ and $\mu_i\geq0$; therefore, we have $0\leq\alpha_i\leq C$:
$\alpha_i$ is now upper bounded by $C$.

[6]Substituting ➀, ➁, ➂ into $L(w,b,\xi,\alpha,\mu)$, we have:
\(\begin{array}{l}L(w,b,\xi,\alpha,\mu)\\=\frac12\cdot w^t\cdot w+\sum_{i=1}^n0\cdot\xi_i\\\;\;\;\;-w^t\cdot w+0\cdot b\\\;\;\;\;+\sum_{i=1}^n\alpha_i\\=\sum_{i=1}^n\alpha_i-\frac12\cdot w^t\cdot w\\=\sum_{i=1}^n\alpha_i-\frac12\cdot\sum_{i,j=1}^n\alpha_i\cdot\alpha_j\cdot y_i\cdot y_j\cdot x_i^t\cdot x_j\end{array}\)
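The last equality can be verified numerically: with $w=\sum_{i=1}^n\alpha_i\cdot y_i\cdot x_i$ and Gram matrix $K_{ij}=x_i^t\cdot x_j$, the two expressions agree for any $\alpha$. A minimal sketch, assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 6, 3
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)
alpha = rng.uniform(size=n)

w = (alpha * y) @ X                     # w = sum_i alpha_i y_i x_i
K = X @ X.T                             # Gram matrix, K[i, j] = x_i'x_j
dual1 = alpha.sum() - 0.5 * (w @ w)
dual2 = alpha.sum() - 0.5 * (alpha * y) @ K @ (alpha * y)
assert np.isclose(dual1, dual2)
```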

[7]To find the optimal $\alpha_i$ values, we maximize $L(w,b,\xi,\alpha,\mu)$, formulating the dual problem as:
\(\begin{array}{l}\underset\alpha{max}\,L(w,b,\xi,\alpha,\mu)\\=\sum_{i=1}^n\alpha_i-\frac12\cdot\sum_{i,j=1}^n\alpha_i\cdot\alpha_j\cdot y_i\cdot y_j\cdot x_i^t\cdot x_j\end{array}\)
, subject to:
➀$\sum_{i=1}^n\alpha_i\cdot y_i=0$
➁$0\leq\alpha_i\leq C$ for all $i$
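One way to solve this box-constrained dual for small problems is an off-the-shelf optimizer; below is a sketch using scipy's SLSQP (the helper `solve_dual` and its tolerances are my own assumptions, not a production solver such as SMO). The threshold $b$ is recovered from a non-boundary support vector, where $y_i\cdot(w^t\cdot x_i-b)=1$ (see case 2 in the KKT section below):

```python
import numpy as np
from scipy.optimize import minimize

def solve_dual(X, y, C):
    """Maximize sum(alpha) - 1/2 alpha'(yy' * K)alpha
    s.t. sum(alpha_i y_i) = 0 and 0 <= alpha_i <= C."""
    n = len(y)
    Q = (y[:, None] * y[None, :]) * (X @ X.T)   # Q[i,j] = y_i y_j x_i'x_j
    fun = lambda a: 0.5 * a @ Q @ a - a.sum()   # negated dual objective
    jac = lambda a: Q @ a - np.ones(n)
    res = minimize(fun, np.zeros(n), jac=jac, method="SLSQP",
                   bounds=[(0.0, C)] * n,
                   constraints=[{"type": "eq", "fun": lambda a: a @ y}])
    alpha = res.x
    w = (alpha * y) @ X
    # recover b from non-boundary support vectors: y_i*(w'x_i - b) = 1
    sv = np.where((alpha > 1e-6) & (alpha < C - 1e-6))[0]
    b = np.mean(X[sv] @ w - y[sv]) if len(sv) else 0.0
    return alpha, w, b

# toy, slightly overlapping data
X = np.array([[2.0, 2.0], [1.5, 1.8], [-2.0, -1.0], [-1.2, -1.5], [0.1, -0.1]])
y = np.array([1.0, 1.0, -1.0, -1.0, 1.0])
alpha, w, b = solve_dual(X, y, C=1.0)
print(alpha, w, b)
```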

Introduction To The KKT Condition

[1]The KKT optimality conditions of the formula in our problem are:
➀the gradient of $L(w,b,\xi,\alpha,\mu)$ with respect to $w$, $b$, $\xi$ vanishes: $\frac{\partial L}{\partial w}=0$, $\frac{\partial L}{\partial b}=0$, $\frac{\partial L}{\partial \xi}=0$
➁$\alpha_i\cdot(y_i\cdot(w^t\cdot x_i-b)+\xi_i-1)=0$
➂$\mu_i\cdot\xi_i=0$
where ➁ and ➂ are the complementary slackness conditions, which together guarantee that $\xi_i$ has the smallest possible impact on $y_i\cdot(w^t\cdot x_i-b)-1$.

[2]By the KKT conditions, there exist 3 cases to be evaluated under the constraints below:
➀$\frac{\partial L}{\partial \xi}=C-\alpha_i-\mu_i=0$
➁$\alpha_i\cdot(y_i\cdot(w^t\cdot x_i-b)+\xi_i-1)=0$
This paragraph guides you through the deduction process.

[3]Remember that we have $0\leq\alpha_i\leq C$ for all $i$.
[case 1]$\alpha_i=0$, the boundary case ($\alpha_i$ at its lower bound):
$\mu_i=C-\alpha_i=C>0$, by $\mu_i\cdot\xi_i=0$, we have $\xi_i=0$, thus,

$y_i\cdot(w^t\cdot x_i-b)-1\geq0$

just holds.

[case 2]$0<\alpha_i<C$, the non-boundary case.
Then $\mu_i=C-\alpha_i>0$; by $\mu_i\cdot\xi_i=0$, again we have $\xi_i=0$, so
$y_i\cdot(w^t\cdot x_i-b)+\xi_i-1=0$,
therefore, $y_i\cdot(w^t\cdot x_i-b)-1=0$.

[case 3]$\alpha_i=C$, also a boundary case ($\alpha_i$ at its upper bound). Then $\mu_i=C-\alpha_i=0$, so $\mu_i\cdot\xi_i=0$ imposes no constraint and we only know $\xi_i\geq0$; since $\alpha_i>0$, ➁ forces
$y_i\cdot(w^t\cdot x_i-b)+\xi_i-1=0$,
hence $y_i\cdot(w^t\cdot x_i-b)-1\leq0$ just holds; more precisely, it should be:
$y_i\cdot(w^t\cdot x_i-b)-1=-\xi_i\leq0$…by mjtsai

[4]Continue the deduction on the condition term regularized by $\alpha_i$ in the original objective function; since $y_i\in\{-1,1\}$, we have $y_i^2=1$, so:
$y_i\cdot(w^t\cdot x_i-b)-1$
$=y_i\cdot(w^t\cdot x_i-b)-y_i^2$
$=y_i\cdot((w^t\cdot x_i-b)-y_i)$
$=y_i\cdot E_i$…where $E_i=u_i-y_i$ and $u_i=w^t\cdot x_i-b$
$=R_i$

Let’s summarize the KKT conditions:
➀$\alpha_i=0$, $R_i\geq0$
➁$0<\alpha_i<C$, $R_i\approx0$
➂$\alpha_i=C$, $R_i\leq0$

[5]Below lists the KKT violating cases:
➀$\alpha_i<C$ and $R_i<0$
➁$\alpha_i>0$ and $R_i>0$
Note that $\alpha_i=0$ with $R_i>0$ satisfies summary ➀ above, and is therefore not a violation…by mjtsai
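Putting the summary and the violating cases together, a minimal KKT checker might look like the sketch below (the function names and tolerance `tol` are assumptions):

```python
import numpy as np

def R_values(w, b, X, y):
    """R_i = y_i * E_i, with E_i = (w'x_i - b) - y_i."""
    u = X @ w - b
    return y * (u - y)

def kkt_violations(alpha, R, C, tol=1e-3):
    """True where a point violates the KKT summary:
    alpha_i = 0      requires R_i >= 0
    0 < alpha_i < C  requires R_i ~= 0
    alpha_i = C      requires R_i <= 0."""
    return ((alpha < C) & (R < -tol)) | ((alpha > 0) & (R > tol))

# usage: given a trained (w, b) and multipliers alpha,
# mask = kkt_violations(alpha, R_values(w, b, X, y), C=1.0)
```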

Checking The KKT Without Using Threshold $b$

Because $L(w,b,\xi,\alpha,\mu)$ doesn't solve for $b$ directly, it is beneficial to check the KKT conditions without using the threshold $b$.
$R_i$…begin from here
$=y_i\cdot E_i$
$=y_i\cdot(w^t\cdot x_i-b-y_i)$
$=y_i\cdot(u_i-y_i)$
$=y_i\cdot(w^t\cdot x_i-y_i-b)$
$=y_i\cdot(F_i-b)$…$F_i=w^t\cdot x_i-y_i$
therefore, we have $E_i=F_i-b$; the difference between two error terms can then be evaluated without $b$:
$E_i-E_j=F_i-b-(F_j-b)=F_i-F_j$
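Finally, a small sketch (numpy assumed, helper name `F_values` hypothetical) confirming that error differences computed via $F$ are independent of the threshold $b$:

```python
import numpy as np

def F_values(w, X, y):
    # F_i = w'x_i - y_i does not involve the threshold b
    return X @ w - y

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 2))
y = rng.choice([-1.0, 1.0], size=5)
w, b = rng.normal(size=2), rng.normal()

E = (X @ w - b) - y            # E_i needs b ...
F = F_values(w, X, y)          # ... but F_i does not
# pairwise differences coincide, since b cancels in E_i - E_j
assert np.allclose(E[:, None] - E[None, :], F[:, None] - F[None, :])
```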