Hoeffding Bounds
08 Mar 2018
Prologue To The Hoeffding Bounds
The Mjtsai1974 Light Weight Upper Bound
Before stepping into the major topic, let us take a tiny break with a rather simple inequality, constraining the same random variable as in the Hoeffding lemma. It’s just a trivial finding made during my proof of these Hoeffding bounds; I nicknamed it the Mjtsai1974 Light Weight Upper Bound.
Given a random variable $X\in\lbrack a,b\rbrack$ with $E\lbrack X\rbrack$=$0$, then for any $s>0$, we have $E\lbrack e^{s\cdot X}\rbrack\le e^{s\cdot(b-a)}$
proof::mjtsai1974➀given $X\in\lbrack a,b\rbrack$, take $\lambda$=$\frac {x-a}{b-a}$,
we just have $x$=$(1-\lambda)\cdot a$+$\lambda\cdot b$.
➁since $g(y)$=$e^{y}$ is a convex function, Jensen's inequality in its two-point form, $g((1-\lambda)\cdot u+\lambda\cdot v)\le (1-\lambda)\cdot g(u)+\lambda\cdot g(v)$, gives
$e^{(1-\lambda)\cdot s\cdot a+\lambda\cdot s\cdot b}\le (1-\lambda)\cdot e^{s\cdot a}+\lambda\cdot e^{s\cdot b}$
$\Rightarrow E\lbrack e^{(1-\lambda)\cdot s\cdot a+\lambda\cdot s\cdot b}\rbrack\le E\lbrack (1-\lambda)\cdot e^{s\cdot a}+\lambda\cdot e^{s\cdot b}\rbrack$
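To see this two-point convexity step numerically, below is a tiny sketch of my own (the sample points $x$=$-1$, $y$=$2$ are arbitrary assumptions for illustration):

```python
import numpy as np

# Tiny numeric illustration of the two-point convexity (Jensen) step:
# exp((1-l)*x + l*y) <= (1-l)*exp(x) + l*exp(y) for l in [0, 1].
x, y = -1.0, 2.0  # arbitrary sample points, for illustration only
for l in (0.0, 0.25, 0.5, 0.75, 1.0):
    lhs = np.exp((1 - l) * x + l * y)
    rhs = (1 - l) * np.exp(x) + l * np.exp(y)
    print(f"lambda={l:.2f}: {lhs:.4f} <= {rhs:.4f}")
```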
➂that is to say, for any $x\in X$, we have
$E\lbrack e^{s\cdot X}\rbrack$
$\le E\lbrack (1-\lambda)\cdot e^{s\cdot a}+\lambda\cdot e^{s\cdot b}\rbrack$
=$E\lbrack (\lambda+(1-\lambda)\cdot e^{s\cdot a-s\cdot b})\cdot e^{s\cdot b}\rbrack$
=$E\lbrack (\frac {x-a}{b-a}+(\frac {b-x}{b-a})\cdot e^{s\cdot a-s\cdot b})\cdot e^{s\cdot b}\rbrack$
=$(\frac {-a}{b-a}+(\frac {b}{b-a})\cdot e^{s\cdot (a-b)})\cdot e^{s\cdot b}$…by $E\lbrack X\rbrack$=$0$
, and what remains is to find a tight upper bound for this expression.
➃take $p$=$\frac {-a}{b-a}$, then $1-p$=$\frac {b}{b-a}$. From these two equalities, we can deduce two expressions for $b$:
$b$=$\frac {a\cdot (p-1)}{p}$ and $b$=$(1-p)\cdot (b-a)$
Choose $b$=$(1-p)\cdot (b-a)$ to express $b$, then the above inequality becomes
$E\lbrack e^{s\cdot X}\rbrack$
$\le (\frac {-a}{b-a}+(\frac {b}{b-a})\cdot e^{s\cdot (a-b)})\cdot e^{s\cdot b}$
=$((1-p)\cdot e^{s\cdot (a-b)}+p)\cdot e^{s\cdot (1-p)\cdot (b-a)}$
=$((1-p)\cdot e^{s\cdot (a-b)}+p)\cdot e^{s\cdot (p-1)\cdot (a-b)}$
The reason we choose this expression for $b$ is to simplify the inequality.
➄take $u$=$a-b\le 0$, $\theta(u)$=$((1-p)\cdot e^{s\cdot u}+p)\cdot e^{s\cdot (p-1)\cdot u}$, then
$ln\theta(u)$
=$ln((1-p)\cdot e^{s\cdot u}+p)$+$s\cdot (p-1)\cdot u$
$\le s\cdot (p-1)\cdot u$…since $u\le 0$ gives $e^{s\cdot u}\le 1$, hence $(1-p)\cdot e^{s\cdot u}+p\le 1$ and the log term is $\le 0$
$\le -s\cdot u$…since $p-1\ge -1$ and $u\le 0$
=$s\cdot (b-a)$
, then $\theta(u)\le e^{s\cdot (b-a)}$
Therefore, $E\lbrack e^{s\cdot X}\rbrack\le e^{s\cdot(b-a)}$ is thus proved. This is a simple, light weight inequality; I denote it the Mjtsai1974 Light Weight Upper Bound.
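As a quick sanity check (not part of the formal proof), the sketch below samples the two-point, zero-mean distribution on $\{a,b\}$ with $P(X$=$b)$=$\frac {-a}{b-a}$ and compares the empirical $E\lbrack e^{s\cdot X}\rbrack$ with $e^{s\cdot (b-a)}$; the values of $a$, $b$, $s$ are arbitrary assumptions for illustration:

```python
import numpy as np

# Monte Carlo sanity check of the light weight upper bound on the
# two-point, zero-mean distribution over {a, b}.
rng = np.random.default_rng(0)
a, b = -1.0, 2.0                       # arbitrary interval with a < 0 < b
p = -a / (b - a)                       # P(X = b), which forces E[X] = 0
x = rng.choice([a, b], size=1_000_000, p=[1 - p, p])
for s in (0.1, 0.5, 1.0, 2.0):
    mgf = np.exp(s * x).mean()         # empirical E[exp(s*X)]
    bound = np.exp(s * (b - a))        # the light weight upper bound
    print(f"s={s}: E[exp(sX)] ~ {mgf:.4f} <= {bound:.4f}")
```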
The Hoeffding Lemma
Given a random variable $X\in\lbrack a,b\rbrack$ with $E\lbrack X\rbrack$=$0$, then for any $s>0$, we have $E\lbrack e^{s\cdot X}\rbrack\le e^{\frac {s^2\cdot(b-a)^2}{8}}$
proof::mjtsai1974We inherit from ➄ in the proof of the Mjtsai1974 Light Weight Upper Bound, beginning from $ln\theta(u)$=$ln((1-p)\cdot e^{s\cdot u}+p)$+$s\cdot (p-1)\cdot u$, where $u$=$a-b$ and $\theta(u)$=$((1-p)\cdot e^{s\cdot u}+p)\cdot e^{s\cdot (p-1)\cdot u}$.
➀take $M(u)$=$ln\theta(u)$. By Taylor's theorem with the Lagrange remainder, expanding $M$ about $0$, there exists some $w\in\lbrack u,0\rbrack$ (recall $u\le 0$ in this case) such that
$M(u)$=$M(0)$+$\frac {M^{′}(0)}{1!}\cdot u$+$\frac {M^{″}(w)}{2!}\cdot u^2$
➁from above, we need the first and second derivatives; they are
$ln^{′}\theta(u)$=$(p+(1-p)\cdot e^{s\cdot u})^{-1}\cdot ((1-p)\cdot s\cdot e^{s\cdot u})$+$s\cdot(p-1)$
$ln^{″}\theta(u)$=$-1\cdot (p+(1-p)\cdot e^{s\cdot u})^{-2}\cdot ((1-p)\cdot s\cdot e^{s\cdot u})^{2}$
$\;\;\;\;\;\;\;\;+(p+(1-p)\cdot e^{s\cdot u})^{-1}\cdot ((1-p)\cdot s^2\cdot e^{s\cdot u})$
And, $ln\theta(0)$=$0$, $ln^{′}\theta(0)$=$0$,
$ln^{″}\theta(0)$=$-1\cdot((1-p)\cdot s)^2$+$(1-p)\cdot s^2$=$p\cdot (1-p)\cdot s^2$
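These values at $u$=$0$ can be double checked symbolically; here is a sketch of mine (sympy is an assumption of tooling, not part of the proof):

```python
import sympy as sp

# Verify ln(theta)(0) = 0, ln'(theta)(0) = 0 and
# ln''(theta)(0) = p*(1-p)*s^2 symbolically.
s, u, p = sp.symbols('s u p', real=True)
M = sp.log((1 - p) * sp.exp(s * u) + p) + s * (p - 1) * u
print(sp.simplify(M.subs(u, 0)))                 # -> 0
print(sp.simplify(sp.diff(M, u).subs(u, 0)))     # -> 0
print(sp.simplify(sp.diff(M, u, 2).subs(u, 0)))  # -> p*(1-p)*s**2, up to form
```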
➂therefore, we’d like to maximize the term $ln^{″}\theta(u)$ over all $u$;
take $z$=$e^{s\cdot u}>0$, then
$ln^{″}\theta(u)$…further simplified
=$\frac {p\cdot (1-p)\cdot s^2\cdot e^{s\cdot u}}{(p+(1-p)\cdot e^{s\cdot u})^{2}}$
=$s^2\cdot\frac {p\cdot (1-p)\cdot z}{(p+(1-p)\cdot z)^{2}}$
➃take $N(z)$=$\frac {p\cdot (1-p)\cdot z}{(p+(1-p)\cdot z)^{2}}$, so that $ln^{″}\theta(u)$=$s^2\cdot N(z)$; it is equivalent to ask for the maximum of $N(z)$. The formal procedure is to take its first derivative and set it to $0$, then solve for the critical $z$ expressed in terms of $p$, where $z$ is the input parameter in $N(z)$.
$N^{′}(z)$=$\frac {p^{2}\cdot (1-p)-p\cdot (1-p)^{2}\cdot z}{(p+(1-p)\cdot z)^{3}}$=$0$
After we solve it, we get that $z$=$\frac {p}{1-p}$ is the critical point; since $N(z)>0$ for $z>0$ and $N(z)\rightarrow 0$ as $z\rightarrow 0$ or $z\rightarrow\infty$, it is indeed the maximum.
➄take $z$=$\frac {p}{1-p}$ in $N(z)$
$N(z)$=$\frac {p\cdot (1-p)\cdot\frac {p}{1-p}}{(p+(1-p)\cdot\frac {p}{1-p})^{2}}$=$\frac {p^{2}}{4\cdot p^{2}}$=$\frac {1}{4}$
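The same maximization can be double checked symbolically; a sketch (sympy assumed):

```python
import sympy as sp

# Maximize N(z) = p*(1-p)*z / (p + (1-p)*z)^2 over z > 0.
p = sp.Symbol('p', positive=True)
z = sp.Symbol('z', positive=True)
N = p * (1 - p) * z / (p + (1 - p) * z) ** 2
crit = sp.solve(sp.Eq(sp.diff(N, z), 0), z)   # -> [p/(1 - p)]
print(crit)
print(sp.simplify(N.subs(z, crit[0])))        # -> 1/4
```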
Be recalled that $ln^{″}\theta(u)$=$s^2\cdot N(z)$; back to the Taylor expansion in ➀, since $M(0)$=$0$ and $M^{′}(0)$=$0$, we have
$ln\theta(u)$
=$M(u)$
=$\frac {M^{″}(w)}{2!}\cdot u^{2}$
$\le \frac {1}{2!}\cdot s^{2}\cdot\max_{z>0}N(z)\cdot u^{2}$
$\le \frac {1}{2!}\cdot s^{2}\cdot\frac {1}{4}\cdot u^{2}$
=$\frac {1}{8}\cdot s^{2}\cdot u^{2}$
Therefore, $\theta(u)\le e^{\frac {1}{8}\cdot s^{2}\cdot u^{2}}$
➅finally, we just proved that
$E\lbrack e^{s\cdot X}\rbrack\le \theta(u)\le e^{\frac {1}{8}\cdot s^{2}\cdot u^{2}}$=$e^{\frac {s^{2}\cdot (a-b)^{2}}{8}}$=$e^{\frac {s^{2}\cdot (b-a)^{2}}{8}}$
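On the extremal two-point distribution, the lemma can also be checked exactly; a sketch with arbitrary $a$, $b$ (the two sides turn out close for small $s$):

```python
import numpy as np

# Exact check of the Hoeffding lemma on the two-point, zero-mean
# distribution: E[exp(sX)] = (1-p)*exp(s*a) + p*exp(s*b).
a, b = -1.0, 2.0                # arbitrary interval with a < 0 < b
p = -a / (b - a)                # P(X = b), so that E[X] = 0
for s in (0.1, 0.5, 1.0):
    mgf = (1 - p) * np.exp(s * a) + p * np.exp(s * b)
    bound = np.exp(s ** 2 * (b - a) ** 2 / 8)
    print(f"s={s}: E[exp(sX)] = {mgf:.4f} <= {bound:.4f}")
```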
The Hoeffding Inequality
Given that $X_1$,$X_2$,…,$X_n$ are independent random variables with $a_i\le X_i\le b_i$, then
$\;\;\;\;P(\frac {1}{n}\cdot\sum_{i=1}^{n}(X_i-E\lbrack X_i\rbrack)\ge\varepsilon)\le exp(\frac {-2\cdot n^2\cdot\varepsilon^2}{\sum_{i=1}^{n}(b_i-a_i)^2})$
Proof:
We’d like to prove it by the Hoeffding lemma.
➀take $Z_i$=$X_i-E\lbrack X_i\rbrack$ for all $i$, then $Z_i\in\lbrack a_i-E\lbrack X_i\rbrack,b_i-E\lbrack X_i\rbrack\rbrack$, an interval of the same width $b_i-a_i$, and $E\lbrack Z_i\rbrack$=$0$ just holds.
Since $\frac {1}{n}\cdot\sum_{i=1}^{n}(X_i-E\lbrack X_i\rbrack)$=$\frac {1}{n}\cdot\sum_{i=1}^{n}Z_i$, we have
$P(\frac {1}{n}\cdot\sum_{i=1}^{n}(X_i-E\lbrack X_i\rbrack)\ge\varepsilon)$=$P(\frac {1}{n}\cdot\sum_{i=1}^{n}Z_i\ge\varepsilon)$
$\Rightarrow P(\sum_{i=1}^{n}(X_i-E\lbrack X_i\rbrack)\ge n\cdot\varepsilon)$=$P(\sum_{i=1}^{n}Z_i\ge n\cdot\varepsilon)$
➁take $t$=$n\cdot\varepsilon$, then we have
$P(\sum_{i=1}^{n}(Z_i)\ge n\cdot\varepsilon)$
=$P(\sum_{i=1}^{n}(Z_i)\ge t)$
=$P(exp(\sum_{i=1}^{n}s\cdot Z_i)\ge exp(s\cdot t))$…for any $s>0$, since $exp$ is strictly increasing
$\le E\lbrack \prod_{i=1}^{n}exp(s\cdot Z_i)\rbrack\cdot exp(-s\cdot t)$…by Markov’s inequality
=$\prod_{i=1}^{n} E\lbrack exp(s\cdot Z_i)\rbrack\cdot exp(-s\cdot t)$…by independence of the $Z_i$
$\le exp(-s\cdot t)\cdot\prod_{i=1}^{n} exp(\frac {s^2\cdot (b_i-a_i)^2}{8})$…by the Hoeffding lemma
=$exp(\sum_{i=1}^{n}\frac {s^2\cdot (b_i-a_i)^2}{8}-s\cdot t)$
➂it’s equivalent to find the minimum of this exponent over $s>0$; the formal procedure is to take the first derivative with respect to $s$, set it to $0$, and solve for $s$ in terms of $t$. Since $s$ is an auxiliary parameter introduced during the proof, we should eliminate it.
After that, we find that $s$=$\frac {4\cdot t}{\sum_{i=1}^{n}(b_i-a_i)^2}$ is the critical point.
Therefore, substituting it back, $P(\sum_{i=1}^{n}Z_i\ge t)\le exp(\frac {-2\cdot t^2}{\sum_{i=1}^{n}(b_i-a_i)^2})$
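The optimization in ➂ can be double checked symbolically; a sketch of mine, where $S$ abbreviates $\sum_{i=1}^{n}(b_i-a_i)^2$:

```python
import sympy as sp

# Minimize the exponent g(s) = s^2*S/8 - s*t over s > 0,
# where S abbreviates sum_i (b_i - a_i)^2.
s, t, S = sp.symbols('s t S', positive=True)
g = s ** 2 * S / 8 - s * t
s_star = sp.solve(sp.Eq(sp.diff(g, s), 0), s)[0]  # -> 4*t/S
print(s_star)
print(sp.simplify(g.subs(s, s_star)))             # -> -2*t**2/S
```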
Finally, replacing $t$ with $n\cdot\varepsilon$, we have it proved that
$P(\sum_{i=1}^{n}Z_i\ge n\cdot\varepsilon)\le exp(\frac {-2\cdot n^2\cdot\varepsilon^2}{\sum_{i=1}^{n}(b_i-a_i)^2})$
If every $X_i$ lies in the same interval $\lbrack a,b\rbrack$, then $b_i-a_i$=$b-a$ for all $i$ and the inequality becomes
$P(\sum_{i=1}^{n}Z_i\ge n\cdot\varepsilon)\le exp(\frac {-2\cdot n\cdot\varepsilon^2}{(b-a)^2})$
By symmetry (applying the same argument to $-Z_i$ and taking a union bound), we have it that
$P(\vert\sum_{i=1}^{n}(Z_i)\vert\ge n\cdot\varepsilon)\le$ 2$\cdot exp(\frac {-2\cdot n\cdot\varepsilon^2}{(b-a)^2})$
$\Rightarrow P(\vert\sum_{i=1}^{n}(X_i-E\lbrack X_i\rbrack)\vert\ge n\cdot\varepsilon)\le$ 2$\cdot exp(\frac {-2\cdot n\cdot\varepsilon^2}{(b-a)^2})$
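To close, a small empirical demonstration; a sketch with arbitrarily chosen $n$, $\varepsilon$, and fair Bernoulli $X_i$, so that $Z_i$=$X_i-\frac {1}{2}\in\lbrack -\frac {1}{2},\frac {1}{2}\rbrack$ and $b-a$=$1$:

```python
import numpy as np

# Compare the observed two-sided tail probability with the Hoeffding
# bound 2*exp(-2*n*eps^2/(b-a)^2) for n fair coin flips.
rng = np.random.default_rng(1)
n, eps, trials = 100, 0.1, 200_000
x = rng.integers(0, 2, size=(trials, n))       # fair Bernoulli X_i
dev = np.abs(x.sum(axis=1) - n * 0.5)          # |sum_i Z_i|
empirical = (dev >= n * eps).mean()
bound = 2 * np.exp(-2 * n * eps ** 2)          # (b - a)^2 = 1
print(f"empirical = {empirical:.4f} <= bound = {bound:.4f}")
```

With these arbitrary settings, the empirical tail comes out well below the bound, as the inequality guarantees.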