mjtsai1974's Dev Blog Welcome to mjt's AI world

Hoeffding Bounds

Prologue To The Hoeffding Bounds

The Hoeffding lemma and the Hoeffding inequality appear pervasively in papers and lectures on machine learning, statistics, and probability theory. The inequality bounds how far the average of independent bounded random variables can stray from its expectation, which in turn suggests the sample size needed to keep the estimation bias below a given tolerance.

The Mjtsai1974 Light Weight Upper Bound

Before stepping into the main topic, let's take a short break with a rather simple inequality that constrains the same kind of random variable as the Hoeffding lemma. It is just a trivial finding made during my proof of these Hoeffding bounds; I nicknamed it the Mjtsai1974 Light Weight Upper Bound.

Given a random variable $X\in\lbrack a,b\rbrack$ with $E\lbrack X\rbrack$=$0$, then for any $s>0$, we have $E\lbrack e^{s\cdot X}\rbrack\le e^{s\cdot(b-a)}$

proof::mjtsai1974

➀by given $X\in\lbrack a,b\rbrack$, take $\lambda$=$\frac {x-a}{b-a}$,
we just have $x$=$(1-\lambda)\cdot a$+$\lambda\cdot b$.
➁since $g(X)$=$e^{X}$ is convex, Jensen's inequality in its two-point weighting form, $g((1-\lambda)\cdot y_1+\lambda\cdot y_2)\le (1-\lambda)\cdot g(y_1)+\lambda\cdot g(y_2)$, gives
$e^{(1-\lambda)\cdot s\cdot a+\lambda\cdot s\cdot b}\le (1-\lambda)\cdot e^{s\cdot a}+\lambda\cdot e^{s\cdot b}$
$\Rightarrow E\lbrack e^{(1-\lambda)\cdot s\cdot a+\lambda\cdot s\cdot b}\rbrack\le E\lbrack (1-\lambda)\cdot e^{s\cdot a}+\lambda\cdot e^{s\cdot b}\rbrack$
➂that is to say, taking expectation on both sides over $X$ (with $\lambda$=$\frac {x-a}{b-a}$ a function of the sample $x$), we have
$E\lbrack e^{s\cdot X}\rbrack$
$\le E\lbrack (1-\lambda)\cdot e^{s\cdot a}+\lambda\cdot e^{s\cdot b}\rbrack$
=$E\lbrack (\lambda+(1-\lambda)\cdot e^{s\cdot a-s\cdot b})\cdot e^{s\cdot b}\rbrack$
=$E\lbrack (\frac {x-a}{b-a}+(\frac {b-x}{b-a})\cdot e^{s\cdot a-s\cdot b})\cdot e^{s\cdot b}\rbrack$
=$(\frac {-a}{b-a}+(\frac {b}{b-a})\cdot e^{s\cdot (a-b)})\cdot e^{s\cdot b}$…by $E\lbrack X\rbrack$=$0$
It now remains to find a clean upper bound for this last expression.
➃take $p$=$\frac {-a}{b-a}$, then $1-p$=$\frac {b}{b-a}$. From these two equalities, we can derive two expressions for $b$:
$b$=$\frac {a\cdot (p-1)}{p}$ and $b$=$(1-p)\cdot (b-a)$
Choose $b$=$(1-p)\cdot (b-a)$ to rewrite the exponent $s\cdot b$; the above inequality then becomes
$E\lbrack e^{s\cdot x}\rbrack$
$\le (\frac {-a}{b-a}+(\frac {b}{b-a})\cdot e^{s\cdot (a-b)})\cdot e^{s\cdot b}$
=$((1-p)\cdot e^{s\cdot (a-b)}+p)\cdot e^{s\cdot (1-p)\cdot (b-a)}$
=$((1-p)\cdot e^{s\cdot (a-b)}+p)\cdot e^{s\cdot (p-1)\cdot (a-b)}$
The reason for choosing this expression for $b$ is that it simplifies the inequality.
➄take $u$=$a-b$ and $\theta(u)$=$((1-p)\cdot e^{s\cdot u}+p)\cdot e^{s\cdot (p-1)\cdot u}$; note that $E\lbrack X\rbrack$=$0$ forces $a\le 0\le b$, hence $u\le 0$ and $0\le p\le 1$, then
$ln\theta(u)$
=$ln((1-p)\cdot e^{s\cdot u}+p)$+$s\cdot (p-1)\cdot u$
$\le s\cdot (p-1)\cdot u$…since $e^{s\cdot u}\le 1$ for $u\le 0$, thus $(1-p)\cdot e^{s\cdot u}+p\le 1$ and its logarithm is non-positive
=$s\cdot (1-p)\cdot (-u)$
$\le s\cdot (-u)$…since $0\le 1-p\le 1$ and $-u\ge 0$
=$s\cdot (b-a)$
, then $\theta(u)\le e^{s\cdot (b-a)}$
Therefore, $E\lbrack e^{s\cdot X}\rbrack\le e^{s\cdot(b-a)}$ is thus proved. This is a simple, light weight inequality; I denote it the Mjtsai1974 Light Weight Upper Bound.
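A quick numerical sanity check (a sketch of my own; the values of $a$, $b$ and the $s$ grid are arbitrary): the two-point zero-mean distribution on $\{a,b\}$ with $P(X=b)$=$p$=$\frac {-a}{b-a}$ attains the Jensen bound of ➂ with equality, and its MGF indeed stays below $e^{s\cdot (b-a)}$:

```python
import math

# Two-point zero-mean distribution on {a, b}: P(X=b) = -a/(b-a) makes
# E[X] = 0, and its MGF equals the Jensen upper bound of step ➂ exactly.
def mgf_two_point(a, b, s):
    p = -a / (b - a)                                  # P(X = b)
    return (1 - p) * math.exp(s * a) + p * math.exp(s * b)

a, b = -1.0, 2.0
ok = all(mgf_two_point(a, b, s) <= math.exp(s * (b - a)) for s in (0.1, 0.5, 1.0, 2.0))
print(ok)   # True
```

At $s$=$0$ the MGF is exactly $1$, so any valid upper bound must stay at or above $1$; this is why the exponent must carry $b-a\ge 0$ rather than $a-b$.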

The Hoeffding Lemma

Given a random variable $X\in\lbrack a,b\rbrack$ with $E\lbrack X\rbrack$=$0$, then for any $s>0$, we have $E\lbrack e^{s\cdot X}\rbrack\le e^{\frac {s^2\cdot(b-a)^2}{8}}$

proof::mjtsai1974

We inherit ➄ from the proof of the Mjtsai1974 Light Weight Upper Bound and begin from $ln\theta(u)$=$ln((1-p)\cdot e^{s\cdot u}+p)$+$s\cdot (p-1)\cdot u$, where $u$=$a-b$ and $\theta(u)$=$((1-p)\cdot e^{s\cdot u}+p)\cdot e^{s\cdot (p-1)\cdot u}$.
➀take $M(u)$=$ln\theta(u)$. By Taylor's theorem with the Lagrange remainder, expanding $M$ about $0$, there exists some $w$ between $u$ and $0$ (here $u\le 0$) such that
$M(u)$=$M(0)$+$\frac {M^{′}(0)}{1!}\cdot u$+$\frac {M^{″}(w)}{2!}\cdot u^2$
so once we show $M(0)$=$M^{′}(0)$=$0$, it remains only to bound $M^{″}$.
➁from above, we need the first and second derivatives; they are
$ln^{′}\theta(u)$=$(p+(1-p)\cdot e^{s\cdot u})^{-1}\cdot ((1-p)\cdot s\cdot e^{s\cdot u})$+$s\cdot(p-1)$
$ln^{″}\theta(u)$=$-1\cdot (p+(1-p)\cdot e^{s\cdot u})^{-2}\cdot ((1-p)\cdot s\cdot e^{s\cdot u})^{2}$
$\;\;\;\;\;\;\;\;+(p+(1-p)\cdot e^{s\cdot u})^{-1}\cdot ((1-p)\cdot s^2\cdot e^{s\cdot u})$
And, $ln\theta(0)$=$0$, $ln^{′}\theta(0)$=$0$,
$ln^{″}\theta(0)$=$-1\cdot((1-p)\cdot s)^2$+$(1-p)\cdot s^2$=$p\cdot (1-p)\cdot s^2$
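The three values just derived can be cross-checked numerically by finite differences (a sketch of my own; the parameters $p$, $s$, $h$ are arbitrary test choices):

```python
import math

# Finite-difference check of ln(theta) at u = 0: value 0, slope 0,
# curvature p(1-p)s^2, matching the closed-form derivatives above.
def ln_theta(u, p, s):
    return math.log((1 - p) * math.exp(s * u) + p) + s * (p - 1) * u

p, s, h = 0.3, 1.5, 1e-5
value = ln_theta(0.0, p, s)
slope = (ln_theta(h, p, s) - ln_theta(-h, p, s)) / (2 * h)
curve = (ln_theta(h, p, s) - 2 * value + ln_theta(-h, p, s)) / h**2
print(abs(value) < 1e-12, abs(slope) < 1e-6, abs(curve - p * (1 - p) * s**2) < 1e-3)
```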
➂since the remainder involves $M^{″}(w)$=$ln^{″}\theta(w)$ at an unknown $w$, we bound $ln^{″}\theta(u)$ over all $u$. Combining its two terms over a common denominator and substituting $z$=$e^{s\cdot u}$ (so $z>0$):
$ln^{″}\theta(u)$
=$\frac {p\cdot (1-p)\cdot s^2\cdot e^{s\cdot u}}{(p+(1-p)\cdot e^{s\cdot u})^{2}}$
=$s^2\cdot\frac {p\cdot (1-p)\cdot z}{(p+(1-p)\cdot z)^{2}}$
➃take $N(z)$=$\frac {p\cdot (1-p)\cdot z}{(p+(1-p)\cdot z)^{2}}$, so that $ln^{″}\theta(u)$=$s^2\cdot N(z)$. The problem reduces to finding the maximum of $N(z)$ over $z>0$: the formal procedure is to take its first derivative with respect to $z$, set it to $0$, and solve for the critical $z$ in terms of $p$.
$N′(z)$=$\frac {p^{2}\cdot (1-p)-p\cdot (1-p)^{2}\cdot z}{(p+(1-p)\cdot z)^{3}}$=$0$
After we solve it, we get $z$=$\frac {p}{1-p}$ is the critical point.
➄take $z$=$\frac {p}{1-p}$ in $N(z)$:
$N(z)$=$\frac {p\cdot (1-p)\cdot\frac {p}{1-p}}{(p+(1-p)\cdot\frac {p}{1-p})^{2}}$=$\frac {p^{2}}{4\cdot p^{2}}$=$\frac {1}{4}$
Since $N(z)\rightarrow 0$ as $z\rightarrow 0$ and as $z\rightarrow\infty$, this critical point is the global maximum; hence $N(z)\le\frac {1}{4}$ for all $z>0$.
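A quick numeric confirmation that $\frac {1}{4}$ really is the peak (a sketch of my own; the grid and the sample values of $p$ are arbitrary):

```python
# Check that N(z) = p(1-p)z / (p + (1-p)z)^2 never exceeds 1/4, with the
# peak attained at z = p/(1-p), for a few arbitrary values of p.
def N(p, z):
    return p * (1 - p) * z / (p + (1 - p) * z) ** 2

ok = True
for p in (0.2, 0.5, 0.8):
    peak = N(p, p / (1 - p))                     # should equal 1/4
    grid = max(N(p, 0.01 * k) for k in range(1, 2000))
    ok = ok and abs(peak - 0.25) < 1e-12 and grid <= 0.25 + 1e-12
print(ok)   # True
```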
Recall that $ln^{″}\theta(u)$=$s^2\cdot N(z)$ with $z$=$e^{s\cdot u}$, so $ln^{″}\theta(u)\le\frac {s^2}{4}$ for every $u$. Back in the Taylor expansion of ➀, with $M(0)$=$M^{′}(0)$=$0$,
$ln\theta(u)$
=$M(u)$
=$\frac {M^{″}(w)}{2!}\cdot u^{2}$…for some $w$ between $u$ and $0$
$\le \frac {1}{2!}\cdot s^{2}\cdot\frac {1}{4}\cdot u^{2}$
=$\frac {1}{8}\cdot s^{2}\cdot u^{2}$
Therefore, $\theta(u)\le e^{\frac {1}{8}\cdot s^{2}\cdot u^{2}}$
➅finally, combining with $E\lbrack e^{s\cdot X}\rbrack\le\theta(u)$ from ➂ of the previous proof, we have just proved that
$E\lbrack e^{s\cdot X}\rbrack\le \theta(u)\le e^{\frac {1}{8}\cdot s^{2}\cdot u^{2}}$=$e^{\frac {s^{2}\cdot (a-b)^{2}}{8}}$=$e^{\frac {s^{2}\cdot (b-a)^{2}}{8}}$
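As with the light weight bound, the lemma's conclusion can be sanity checked numerically on the two-point zero-mean distribution (a sketch of my own; $a$, $b$ and the $s$ grid are arbitrary choices):

```python
import math

# Verify E[e^{sX}] <= e^{s^2 (b-a)^2 / 8} for the zero-mean two-point
# distribution on {a, b} with P(X=b) = -a/(b-a), over a range of s > 0.
def mgf_two_point(a, b, s):
    p = -a / (b - a)                                  # P(X = b), makes E[X] = 0
    return (1 - p) * math.exp(s * a) + p * math.exp(s * b)

a, b = -2.0, 1.0
ok = all(
    mgf_two_point(a, b, s) <= math.exp(s**2 * (b - a)**2 / 8)
    for s in (0.1, 0.5, 1.0, 3.0)
)
print(ok)   # True
```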

The Hoeffding Inequality

Given that $X_1,X_2,\dots,X_n$ are independent random variables with $a_i\le X_i\le b_i$, then
$\;\;\;\;P(\frac {1}{n}\cdot\sum_{1}^{n}(X_i-E\lbrack X_i\rbrack)\ge\varepsilon)\le exp(\frac {-2\cdot n^2\cdot\varepsilon^2}{\sum_{1}^{n}(b_i-a_i)^2})$

Proof:
We prove it via the Hoeffding lemma.
➀take $Z_i$=$X_i-E\lbrack X_i\rbrack$ for all $i$; then $E\lbrack Z_i\rbrack$=$0$ holds and $Z_i\in\lbrack a_i-E\lbrack X_i\rbrack,b_i-E\lbrack X_i\rbrack\rbrack$, an interval of the same width $b_i-a_i$. Hence
$P(\frac {1}{n}\cdot\sum_{i=1}^{n}(X_i-E\lbrack X_i\rbrack)\ge\varepsilon)$=$P(\frac {1}{n}\cdot\sum_{i=1}^{n}(Z_i)\ge\varepsilon)$
$\Rightarrow P(\sum_{i=1}^{n}(X_i-E\lbrack X_i\rbrack)\ge n\cdot\varepsilon)$=$P(\sum_{i=1}^{n}(Z_i)\ge n\cdot\varepsilon)$
➁take $t$=$n\cdot\varepsilon$, then we have
$P(\sum_{i=1}^{n}(Z_i)\ge n\cdot\varepsilon)$
=$P(\sum_{i=1}^{n}(Z_i)\ge t)$
=$P(exp(\sum_{i=1}^{n}s\cdot Z_i)\ge exp(s\cdot t))$…for any $s>0$, since $exp$ is strictly increasing
$\le E\lbrack \prod_{i=1}^{n}exp(s\cdot Z_i)\rbrack\cdot exp(-s\cdot t)$…by Markov's inequality
=$\prod_{i=1}^{n} E\lbrack exp(s\cdot Z_i)\rbrack\cdot exp(-s\cdot t)$…by independence of the $Z_i$
$\le exp(-s\cdot t)\cdot\prod_{i=1}^{n} exp(\frac {s^2\cdot (b_i-a_i)^2}{8})$…by the Hoeffding lemma applied to each $Z_i$
=$exp(\sum_{i=1}^{n}\frac {s^2\cdot (b_i-a_i)^2}{8}-s\cdot t)$
➂the bound holds for every $s>0$, so we pick the $s$ minimizing the exponent $\sum_{i=1}^{n}\frac {s^2\cdot (b_i-a_i)^2}{8}-s\cdot t$. The formal procedure is to take its first derivative with respect to $s$, set it to $0$, and solve for $s$ in terms of $t$; $s$ was only an auxiliary parameter introduced during the proof, so we eliminate it here.
Solving gives $s$=$\frac {4\cdot t}{\sum_{i=1}^{n}(b_i-a_i)^2}$ as the critical point; substituting it back turns the exponent into $\frac {-2\cdot t^2}{\sum_{i=1}^{n}(b_i-a_i)^2}$.
Therefore, $P(\sum_{i=1}^{n}(Z_i)\ge t)\le exp(\frac {-2\cdot t^2}{\sum_{i=1}^{n}(b_i-a_i)^2})$
Finally, replacing $t$ with $n\cdot\varepsilon$, we have proved that
$P(\sum_{i=1}^{n}(Z_i)\ge n\cdot\varepsilon)\le exp(\frac {-2\cdot n^2\cdot\varepsilon^2}{\sum_{i=1}^{n}(b_i-a_i)^2})$
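A Monte Carlo illustration of the one-sided tail bound (a sketch of my own; the uniform distribution, $n$, $\varepsilon$, and the trial count are arbitrary choices). With $X_i\sim$ uniform$\lbrack 0,1\rbrack$, $a_i$=$0$, $b_i$=$1$, the bound becomes $exp(-2\cdot n\cdot\varepsilon^2)$:

```python
import math
import random

# Empirical tail P(sample mean - 1/2 >= eps) for n uniform[0,1] draws,
# versus the Hoeffding bound exp(-2 n eps^2) with a=0, b=1.
random.seed(0)
n, eps, trials = 100, 0.1, 20000
exceed = sum(
    1 for _ in range(trials)
    if sum(random.random() for _ in range(n)) / n - 0.5 >= eps
)
empirical = exceed / trials
bound = math.exp(-2 * n * eps**2)    # exp(-2), about 0.1353
print(empirical <= bound)            # True
```

The bound is loose for this distribution (the empirical tail is far smaller), which is expected: Hoeffding only uses the range of the $X_i$, not their variance.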

If all the $X_i$ share the same range $\lbrack a,b\rbrack$, then $\sum_{i=1}^{n}(b_i-a_i)^2$=$n\cdot (b-a)^2$ and the inequality becomes
$P(\sum_{i=1}^{n}(Z_i)\ge n\cdot\varepsilon)\le exp(\frac {-2\cdot n\cdot\varepsilon^2}{(b-a)^2})$

By symmetry, applying the same bound to $-Z_i$ and taking the union of the two tails, we have
$P(\vert\sum_{i=1}^{n}(Z_i)\vert\ge n\cdot\varepsilon)\le 2\cdot exp(\frac {-2\cdot n\cdot\varepsilon^2}{(b-a)^2})$
$\Rightarrow P(\vert\sum_{i=1}^{n}(X_i-E\lbrack X_i\rbrack)\vert\ge n\cdot\varepsilon)\le 2\cdot exp(\frac {-2\cdot n\cdot\varepsilon^2}{(b-a)^2})$
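Inverting the two-sided bound recovers the "suggested sampling size" mentioned in the prologue: requiring $2\cdot exp(\frac {-2\cdot n\cdot\varepsilon^2}{(b-a)^2})\le\delta$ gives $n\ge\frac {(b-a)^2\cdot ln(2/\delta)}{2\cdot\varepsilon^2}$. A small helper (the function name is my own):

```python
import math

# Smallest n with 2 * exp(-2 * n * eps^2 / (b-a)^2) <= delta, i.e. the
# sample size guaranteeing |sample mean - true mean| < eps w.p. >= 1-delta.
def hoeffding_sample_size(eps, delta, a=0.0, b=1.0):
    return math.ceil((b - a) ** 2 * math.log(2 / delta) / (2 * eps ** 2))

# For [0,1]-valued samples, accuracy 0.05 at 95% confidence needs:
print(hoeffding_sample_size(0.05, 0.05))   # 738
```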