
Jensen's Inequality

Prologue To The Jensen's Inequality

The Jensen's inequality is an important inequality used in the proofs of many famous lemmas and theorems. It reveals that the equality $E\lbrack g(X)\rbrack$=$g(E\lbrack X\rbrack)$ rarely occurs for a nonlinear function $g$. Without actually computing the distribution of $g(X)$, we can easily relate $E\lbrack g(X)\rbrack$ to $g(E\lbrack X\rbrack)$.

Jensen's Inequality

Let $g$ be a convex function, and let $X$ be any random variable, then
$\;\;g(E\lbrack X\rbrack)\le E\lbrack g(X)\rbrack$

➀why focus on the convex function?
Recall that the second derivative of a convex function is nonnegative, that is, $g''(X)\ge 0$. The curve is in a bowl shape.
➁suppose $X$ is a random variable with 2 outcomes, taking the value $a$ with probability $\frac {4}{7}$ and the value $b$ with probability $\frac {3}{7}$.
You can treat the 2 outcomes as the inputs to $g$.

Convexity of $g$ forces every line segment connecting 2 points on the curve to lie above the part of the curve in between.

➂if we choose the line segment ranging from $(a,g(a))$ to $(b,g(b))$, then
$(E\lbrack X\rbrack,E\lbrack g(X)\rbrack)$
=$(\frac {4}{7}\cdot a+\frac {3}{7}\cdot b,\frac {4}{7}\cdot g(a)+\frac {3}{7}\cdot g(b))$
=$\frac {4}{7}\cdot (a,g(a))+\frac {3}{7}\cdot (b,g(b))$
This point $(E\lbrack X\rbrack,E\lbrack g(X)\rbrack)$ lies on the chord, hence above the point $(E\lbrack X\rbrack,g(E\lbrack X\rbrack))$ on the curve.
Therefore, we have proved $g(E\lbrack X\rbrack)\le E\lbrack g(X)\rbrack$.
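
Below is a minimal numeric sketch in Python (not part of the original proof), taking $g(x)$=$x^2$ as the convex function and $a$=$1$, $b$=$5$ as assumed values for the 2 outcomes; any convex $g$ would do.

```python
import numpy as np

# Two-point distribution from the walkthrough above: X takes the value a with
# probability 4/7 and b with probability 3/7; g(x) = x**2 is one convex choice.
a, b = 1.0, 5.0
p = np.array([4/7, 3/7])
x = np.array([a, b])

E_X = np.dot(p, x)        # E[X]
E_gX = np.dot(p, x**2)    # E[g(X)] for g(x) = x^2

print(f"g(E[X]) = {E_X**2:.4f}")   # ~7.3673
print(f"E[g(X)] = {E_gX:.4f}")     # ~11.2857
assert E_X**2 <= E_gX              # Jensen: g(E[X]) <= E[g(X)]
```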

Chernoff Bounds

Prologue To The Chernoff Bounds

The Chernoff bound is essentially another variation based on the Markov inequality; it derives exponential deviation bounds.

Chernoff Bounds

Let $Z$ be any random variable, then for any $t>0$,
$\;\;P(Z\ge E\lbrack Z\rbrack+t)\le\underset{\lambda\ge 0}{min}E\lbrack e^{\lambda\cdot (Z-E\lbrack Z\rbrack)}\rbrack\cdot e^{-\lambda\cdot t}$=$\underset{\lambda\ge 0}{min}M_{Z-E\lbrack Z\rbrack}(\lambda)\cdot e^{-\lambda\cdot t}$
and
$\;\;P(Z\le E\lbrack Z\rbrack-t)\le\underset{\lambda\ge 0}{min}E\lbrack e^{\lambda\cdot (E\lbrack Z\rbrack-Z)}\rbrack\cdot e^{-\lambda\cdot t}$=$\underset{\lambda\ge 0}{min}M_{E\lbrack Z\rbrack-Z}(\lambda)\cdot e^{-\lambda\cdot t}$
; where $M_{Z}(\lambda)$=$E\lbrack exp(\lambda\cdot Z)\rbrack$ is the moment generating function of $Z$.

Proof:
➀for any $\lambda>0$, we have:
$Z\ge E\lbrack Z\rbrack+t$
$\Leftrightarrow e^{\lambda\cdot Z}\ge e^{\lambda\cdot (E\lbrack Z\rbrack+t)}$
Or equivalently,
$Z-E\lbrack Z\rbrack\ge t$
$\Leftrightarrow e^{\lambda\cdot (Z-E\lbrack Z\rbrack)}\ge e^{\lambda\cdot t}$
➁by the Markov inequality, the following holds:
$P(Z-E\lbrack Z\rbrack\ge t)$
$=P(e^{\lambda\cdot (Z-E\lbrack Z\rbrack)}\ge e^{\lambda\cdot t})\le E\lbrack e^{\lambda\cdot (Z-E\lbrack Z\rbrack)}\rbrack\cdot e^{-\lambda\cdot t}$
➂the second inequality can be proved in a similar way.

A reasonably tight upper bound can thus be built by choosing an appropriate $\lambda$.
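
As a quick illustration of choosing $\lambda$, here is a small Python sketch assuming $Z\sim N(0,1)$, whose MGF $M_{Z-E\lbrack Z\rbrack}(\lambda)$=$e^{\lambda^{2}/2}$ is a known closed form; it scans a grid of $\lambda$ values for the tightest bound.

```python
import numpy as np

# Assumption: Z ~ N(0, 1), so E[Z] = 0 and M_{Z-E[Z]}(lam) = exp(lam**2 / 2).
# Scan a grid of lambda values for the tightest bound on P(Z >= E[Z] + t).
t = 2.0
lambdas = np.linspace(0.01, 10.0, 1000)
bounds = np.exp(lambdas**2 / 2 - lambdas * t)

best = lambdas[np.argmin(bounds)]
print(f"best lambda ~ {best:.3f}, bound ~ {bounds.min():.5f}")
# The analytic minimizer is lambda = t, giving exp(-t**2 / 2) ~ 0.13534 here.
```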

Chernoff Bounds Extension

We can further extend the Chernoff bounds to a summation of multiple random variables.
➀suppose that all $Z_i$'s are independent and zero mean for all $i$, then we have it that:
$\;\;M_{Z_1+…+Z_n}(\lambda)$=$\prod_{i=1}^{n}M_{Z_i}(\lambda)$
, which is a consequence of the independence of the $Z_{i}$'s in the moment generating function:
$E\lbrack exp(\lambda\cdot\sum_{i=1}^{n}Z_{i})\rbrack$
=$E\lbrack \prod_{i=1}^{n}exp(\lambda\cdot Z_{i})\rbrack$
=$\prod_{i=1}^{n}E\lbrack exp(\lambda\cdot Z_{i})\rbrack$
➁by Chernoff inequality, for any $\lambda>0$,
$P(\sum_{i=1}^{n}Z_{i}\ge t)\le \prod_{i=1}^{n}E\lbrack exp(\lambda\cdot Z_{i})\rbrack\cdot e^{-\lambda\cdot t}$
If the $Z_i$'s are additionally identically distributed, this becomes $P(\sum_{i=1}^{n}Z_{i}\ge t)\le (E\lbrack exp(\lambda\cdot Z_{1})\rbrack)^{n}\cdot e^{-\lambda\cdot t}$
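
A Monte Carlo sanity check of the MGF product property, sketched in Python under the assumption that the $Z_i$'s are independent uniforms on $[-1,1]$ (an arbitrary zero-mean choice):

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam, draws = 5, 0.7, 200_000

# Assumption: Z_i are independent uniforms on [-1, 1], a zero-mean example.
Z = rng.uniform(-1.0, 1.0, size=(draws, n))

lhs = np.exp(lam * Z.sum(axis=1)).mean()    # E[exp(lam * (Z_1 + ... + Z_n))]
rhs = np.exp(lam * Z).mean(axis=0).prod()   # prod_i E[exp(lam * Z_i)]
print(f"MGF of the sum ~ {lhs:.4f}, product of MGFs ~ {rhs:.4f}")
# The two numbers agree up to Monte Carlo error.
```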

Chernoff Bounds For Rademacher Random Variable

Prologue To The Chernoff Bounds For Rademacher Random Variable

The Chernoff bound is a convenient way to build sharper bounds, in exponential form, for the Rademacher random variable. It efficiently relates the sample size to the error bias that fits within the desired probability.

The Rademacher Random Variable

It is also called the random sign variable. Let $S$ be a Rademacher random variable; it takes the values $\{1,-1\}$, each with probability $\frac {1}{2}$.

Then, it has the basic property that $E\lbrack S^{k}\rbrack$=$0$ when $k$ is odd, while $E\lbrack S^{k}\rbrack$=$1$ when $k$ is even.
➀$E\lbrack S^{1}\rbrack$=$\frac {1}{2}\cdot 1$+$\frac {1}{2}\cdot (-1)$=$0$
➁$E\lbrack S^{3}\rbrack$=$\frac {1}{2}\cdot 1^{3}$+$\frac {1}{2}\cdot (-1)^{3}$=$0$
➂$E\lbrack S^{2}\rbrack$=$\frac {1}{2}\cdot 1^{2}$+$\frac {1}{2}\cdot (-1)^{2}$=$1$
➃$E\lbrack S^{4}\rbrack$=$\frac {1}{2}\cdot 1^{4}$+$\frac {1}{2}\cdot (-1)^{4}$=$1$
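
These moments can be verified exactly with a few lines of Python (a trivial sketch, mirroring the computations above):

```python
from fractions import Fraction

# Exact moments of a Rademacher sign: E[S^k] = (1/2)*(1)**k + (1/2)*(-1)**k.
for k in range(1, 7):
    moment = Fraction(1, 2) * (1)**k + Fraction(1, 2) * (-1)**k
    print(f"E[S^{k}] = {moment}")  # 0 for odd k, 1 for even k
```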

Chernoff Bounds For Rademacher Random Variable

Given that $S_{1},…,S_{n}$ are independent Rademacher random variables, then for any $\lambda>0$,
$\;\;P(Z\ge t)\le exp(\frac {n\cdot\lambda^{2}}{2})\cdot e^{-\lambda\cdot t}$
, where $Z$=$\sum_{i=1}^{n}S_{i}$
Proof:
➀begin by Taylor series of exponential function.
$E\lbrack e^{\lambda\cdot S}\rbrack$
=$E\lbrack\sum_{k=0}^{\infty}\frac {(\lambda\cdot S)^{k}}{k!}\rbrack$
=$\sum_{k=0}^{\infty}\frac {\lambda^{k}\cdot E\lbrack S^{k}\rbrack}{k!}$
=$\sum_{k=0,2,4,…}^{\infty}\frac {\lambda^{k}}{k!}$
=$\sum_{k=0}^{\infty}\frac {\lambda^{2\cdot k}}{(2\cdot k)!}$
➁next to find the closest upper bound, specifically, in exponential form.
since $(2\cdot k)!\ge 2^{k}\cdot k!$, we have
$E\lbrack e^{\lambda\cdot S}\rbrack$
=$\sum_{k=0}^{\infty}\frac {\lambda^{2\cdot k}}{(2\cdot k)!}$
$\le \sum_{k=0}^{\infty}\frac {\lambda^{2\cdot k}}{2^{k}\cdot k!}$
=$\sum_{k=0}^{\infty}(\frac {\lambda^{2}}{2})^{k}/k!$
=$e^{(\frac {\lambda^{2}}{2})}$
➂by given $Z$=$\sum_{i=1}^{n}S_{i}$, we have
$P(Z\ge t)$
=$P(e^{\lambda\cdot Z}\ge e^{\lambda\cdot t})$, for any $\lambda>0$
$\le \frac {E\lbrack e^{\lambda\cdot Z}\rbrack}{e^{\lambda\cdot t}}$…Markov inequality, i.e. the Chernoff bound
=$\frac {E\lbrack e^{\lambda\cdot\sum_{i=1}^{n}S_{i}}\rbrack}{e^{\lambda\cdot t}}$
=$\frac {(E\lbrack e^{\lambda\cdot S_{1}}\rbrack)^{n}}{e^{\lambda\cdot t}}$……independent, identically distributed $S_{i}$'s
$\le (e^{(\frac {\lambda^{2}}{2})})^{n}\cdot e^{-\lambda\cdot t}$
$\Rightarrow P(Z\ge t)\le e^{(\frac {n\cdot\lambda^{2}}{2})}\cdot e^{-\lambda\cdot t}$
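
A hedged empirical check of this bound in Python, assuming $n$=$100$, $t$=$20$ and the choice $\lambda$=$\frac {t}{n}$ that the next section derives:

```python
import numpy as np

rng = np.random.default_rng(1)
n, t, draws = 100, 20.0, 100_000

# Z = sum of n random signs; compare the empirical tail with the bound
# exp(n * lam**2 / 2 - lam * t), evaluated at lam = t / n.
S = rng.choice([-1.0, 1.0], size=(draws, n))
Z = S.sum(axis=1)
empirical = (Z >= t).mean()

lam = t / n
bound = np.exp(n * lam**2 / 2 - lam * t)   # = exp(-t**2 / (2 * n)) = exp(-2)
print(f"empirical P(Z >= {t}) ~ {empirical:.5f}, Chernoff bound ~ {bound:.5f}")
```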

Chernoff Bounds Extension

In this section, we'd like to inspect the relation among the error probability, the sample size and the desired bias.
$P(Z\ge t)\le e^{(\frac {n\cdot\lambda^{2}}{2})}\cdot e^{-\lambda\cdot t}$
$\Rightarrow P(Z\ge t)\le e^{(\frac {n\cdot\lambda^{2}}{2})-\lambda\cdot t}$

We can get the minimum upper bound by finding some $\lambda$, which is equivalent to finding $\underset{\lambda\ge 0}{min}\{\frac{n\cdot\lambda^2}2-\lambda\cdot t\}$
➀taking derivatives with respect to $\lambda$, and set it to $0$, we get $\lambda$=$\frac {t}{n}$.
➁replace $\lambda$ with $\frac {t}{n}$ in inequality, we have
$P(Z\ge t)\le e^{-\frac {t^{2}}{2\cdot n}}$
➂suppose you'd like to constrain the probability that the quantity from the random variable $Z$ exceeds the bias, denoted $t$, keeping the error probability as a whole within $\delta$; we can express it below: $P(Z\ge t)\le\delta$, where $\delta$=$e^{-\frac {t^{2}}{2\cdot n}}$
Then, $t$=$\sqrt {2\cdot n\cdot ln\frac {1}{\delta}}$
Finally, we can get the appropriate data size in terms of the bias $t$ and the error probability $\delta$, as the sketch below illustrates.
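
A small Python sketch of this relation, with $\delta$=$0.05$ and $n$=$1000$ as assumed example values:

```python
import math

# Bias t from sample size n and error probability delta:
# t = sqrt(2 * n * ln(1 / delta)).
delta, n = 0.05, 1000
t = math.sqrt(2 * n * math.log(1 / delta))
print(f"n = {n}, delta = {delta} -> t ~ {t:.2f}")             # ~77.41

# Conversely, the largest n that keeps P(Z >= t) <= delta for a fixed bias t:
# exp(-t**2 / (2 * n)) <= delta  <=>  n <= t**2 / (2 * ln(1 / delta)).
t_fixed = 50.0
n_max = t_fixed**2 / (2 * math.log(1 / delta))
print(f"t = {t_fixed}, delta = {delta} -> n <= {n_max:.1f}")  # ~417.3
```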

Caution must be taken that such an extension holds only for events with two outcomes (true and false). Other articles in my dev blog will inspect the form with respect to events with multiple (more than 2) outcomes.

Chernoff Bounds For Normal Distribution

Prologue To The Chernoff Bounds For Normal Distribution

The Chernoff bound is a convenient way to build sharper bounds, in exponential form, for normal distributions.

Chernoff Bounds For Normal Distribution

Given that $Z$ is a random variable, normally distributed as $Z\sim N(0,\sigma^2)$, then
$\;\;P(Z\ge t)\le e^{(\lambda\cdot\sigma)^{2}}\cdot e^{-\lambda\cdot t}$
, where $M_{Z}(\lambda)$=$e^{(\lambda\cdot\sigma)^{2}}$

proof::mjtsai1974

The proof is categorized into 4 major parts: the first part asks for $E\lbrack e^{\lambda\cdot Z}\rbrack$, the second part figures out the r-th moment, the third part makes reference to the ratio test theorem, and the final part builds the upper bound in the Chernoff bound form.
[1]We begin from $E\lbrack e^{\lambda\cdot Z}\rbrack$
First, express the MGF of Z in terms of the Taylor series:
$M_{Z}(\lambda)$
=$E\lbrack e^{\lambda\cdot Z}\rbrack$
=$E\lbrack 1+\frac {\lambda\cdot Z}{1!}+\frac {(\lambda\cdot Z)^{2}}{2!}+\frac {(\lambda\cdot Z)^{3}}{3!}+…\rbrack$
=$1$+$\frac {\lambda}{1!}\cdot E\lbrack Z\rbrack$+$\frac {(\lambda)^{2}}{2!}\cdot E\lbrack Z^{2}\rbrack$+$\frac {(\lambda)^{3}}{3!}\cdot E\lbrack Z^{3}\rbrack$+$\frac {(\lambda)^{4}}{4!}\cdot E\lbrack Z^{4}\rbrack$+…
[2]Next, we ask for the r-th ordinary moment in each term.
➀$E\lbrack Z^{r}\rbrack$=$\int_{0}^{\infty}\frac {z^{r}}{\sqrt {2\cdot\pi}}\cdot e^{-\frac {z^{2}}{2\cdot\sigma^{2}}}\operatorname dz$
; where $z$ ranges over the values (events) of $Z$.
➁take $x$=$\frac {z^{2}}{2\cdot\sigma^{2}}$, then
$z$=$\sqrt{2\cdot\sigma^{2}\cdot x}$, and
$\operatorname dz$=$\frac {\sigma^{2}}{z}\cdot\operatorname dx$
➂just replace $\operatorname dz$ with $\operatorname dx$
$E\lbrack Z^{r}\rbrack$
=$\frac {1}{\sqrt {2\cdot\pi}}\int_{0}^{\infty}z^{r}\cdot e^{-x}\cdot\frac {\sigma^{2}}{z}\cdot\operatorname dx$
=$\frac {\sigma^{2}}{\sqrt {2\cdot\pi}}\int_{0}^{\infty}z^{r-1}\cdot e^{-x}\operatorname dx$
=$\frac {\sigma^{2}}{\sqrt {2\cdot\pi}}\cdot\Gamma(r)$
, where $\Gamma(r)$=$\int_{0}^{\infty}x^{r-1}\cdot e^{-x}\operatorname dx$ is just the Gamma function.
[3]We know $\Gamma(n)$=$(n-1)!$ from Introduction To The Gamma Distribution, then,
$M_{Z}(\lambda)$
=$1$+$\frac {\lambda}{1!}\cdot E\lbrack Z\rbrack$+$\frac {(\lambda)^{2}}{2!}\cdot E\lbrack Z^{2}\rbrack$+$\frac {(\lambda)^{3}}{3!}\cdot E\lbrack Z^{3}\rbrack$+$\frac {(\lambda)^{4}}{4!}\cdot E\lbrack Z^{4}\rbrack$+…
=$1$+$\frac {\lambda}{1!}\cdot\frac {\sigma^{2}}{\sqrt {2\cdot\pi}}\cdot\Gamma(1)$+$\frac {(\lambda)^{2}}{2!}\cdot\frac {\sigma^{2}}{\sqrt {2\cdot\pi}}\cdot\Gamma(2)$+$\frac {(\lambda)^{3}}{3!}\cdot\frac {\sigma^{2}}{\sqrt {2\cdot\pi}}\cdot\Gamma(3)$+$\frac {(\lambda)^{4}}{4!}\cdot\frac {\sigma^{2}}{\sqrt {2\cdot\pi}}\cdot\Gamma(4)$+…
=$1$+$0$+$\frac {(\lambda)^{2}}{2!}\cdot\frac {\sigma^{2}}{\sqrt {2\cdot\pi}}\cdot 1!$+$\frac {(\lambda)^{3}}{3!}\cdot\frac {\sigma^{2}}{\sqrt {2\cdot\pi}}\cdot 2!$+$\frac {(\lambda)^{4}}{4!}\cdot\frac {\sigma^{2}}{\sqrt {2\cdot\pi}}\cdot 3!$+…
…the $\lambda$-term vanishes since $E\lbrack Z\rbrack$=$0$ for the zero-mean normal
=$1$+$\frac {(\lambda)^{2}}{2}\cdot\frac {\sigma^{2}}{\sqrt {2\cdot\pi}}$+$\frac {(\lambda)^{3}}{3}\cdot\frac {\sigma^{2}}{\sqrt {2\cdot\pi}}$+$\frac {(\lambda)^{4}}{4}\cdot\frac {\sigma^{2}}{\sqrt {2\cdot\pi}}$+$\frac {(\lambda)^{5}}{5}\cdot\frac {\sigma^{2}}{\sqrt {2\cdot\pi}}$…
=$1$+$(\lambda\cdot \sigma)^{2}\cdot(\frac {1}{2\cdot\sqrt{2\cdot\pi}}$+$\frac {\lambda}{3\cdot\sqrt{2\cdot\pi}}$+$\frac {\lambda^{2}}{4\cdot\sqrt{2\cdot\pi}}$+$\frac {\lambda^{3}}{5\cdot\sqrt{2\cdot\pi}}$+…$)$

Recall that we have the convergence evaluation by the ratio test theorem in Series Convergence.

If $\frac {a_{n+1}}{a_n}$ approaches a limit $L<1$, the series converges.

➀$\frac {a_2}{a_1}$=$\frac {\lambda^{3}/3}{\lambda^{2}/2}$=$\frac {2}{3}\cdot\lambda$
➁$\frac {a_3}{a_2}$=$\frac {\lambda^{4}/4}{\lambda^{3}/3}$=$\frac {3}{4}\cdot\lambda$
➂$\frac {a_4}{a_3}$=$\frac {\lambda^{5}/5}{\lambda^{4}/4}$=$\frac {4}{5}\cdot\lambda$
Trivially, $\frac {a_2}{a_1}$<$\frac {a_3}{a_2}$<$\frac {a_4}{a_3}$<…, and the ratio $\frac {a_{k+1}}{a_k}$=$\frac {k}{k+1}\cdot\lambda$ is increasing toward $\lambda$; for $\lambda\ge 1$ this series just diverges to infinity, by the ratio test theorem.
Therefore, the whole expression becomes
$M_{Z}(\lambda)$
=$1$+$(\lambda\cdot\sigma)^{2}\cdot(\frac {1}{2\cdot\sqrt{2\cdot\pi}}$+$\frac {\lambda}{3\cdot\sqrt{2\cdot\pi}}$+$\frac {\lambda^{2}}{4\cdot\sqrt{2\cdot\pi}}$+$\frac {\lambda^{3}}{5\cdot\sqrt{2\cdot\pi}}$+…$)$
$\approx 1+(\lambda\cdot\sigma)^{2}\cdot\infty$
$\approx\lim_{h\rightarrow 0}1+\frac {(\lambda\cdot\sigma)^{2}}{h}$
$\approx\lim_{h\rightarrow\infty}(1+\frac {(\lambda\cdot\sigma)^{2}}{h})^{h}$
=$e^{(\lambda\cdot\sigma)^{2}}$
[4]By Chernoff Bounds, for any $\lambda>0$, then
$P(Z\ge t)\le \frac {E\lbrack e^{(\lambda\cdot Z)}\rbrack}{e^{\lambda\cdot t}}$
$\Rightarrow P(Z\ge t)\le e^{(\lambda\cdot\sigma)^{2}}\cdot e^{-\lambda\cdot t}$
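
Below is a Monte Carlo sketch in Python comparing the empirical tail of $Z\sim N(0,\sigma^2)$ against this bound, with $\sigma$=$1$, $t$=$2$ as assumed example values:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, t, draws = 1.0, 2.0, 1_000_000

# Empirical tail of Z ~ N(0, sigma^2) against exp((lam * sigma)**2 - lam * t),
# scanned over a grid of lambda values.
Z = rng.normal(0.0, sigma, size=draws)
empirical = (Z >= t).mean()

lambdas = np.linspace(0.01, 10.0, 1000)
bounds = np.exp((lambdas * sigma)**2 - lambdas * t)
print(f"empirical P(Z >= {t}) ~ {empirical:.5f}, best bound ~ {bounds.min():.5f}")
# Minimizing (lam*sigma)**2 - lam*t gives lam = t / (2 * sigma**2),
# i.e. a bound of exp(-t**2 / (4 * sigma**2)) ~ 0.36788 here.
```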

Chebyshev's Inequality

Prologue To The Chebyshev's Inequality

The Chebyshev's inequality is a variation based on the Markov inequality; it uses the second moment, the variance, instead of the mean of a random variable.

Chebyshev's Inequality

Let $Z$ be any random variable with $Var\lbrack Z\rbrack<\infty$, then
$\;\;\;\;P(Z\ge E\lbrack Z\rbrack+t\;or\;Z\le E\lbrack Z\rbrack-t)\le\frac {Var\lbrack Z\rbrack}{t^2}$
$\;\;\;\;\;\;$, for any $t>0$

Proof:
➀$Z\ge E\lbrack Z\rbrack+t$ or $Z\le E\lbrack Z\rbrack-t$ is exactly the same as $\left|Z-E\lbrack Z\rbrack\right|\ge t$.
➁therefore, we have it that:
$P(Z\ge E\lbrack Z\rbrack+t\;or\;Z\le E\lbrack Z\rbrack-t)$
=$P(\left|Z-E\lbrack Z\rbrack\right|\ge t)$
=$P((Z-E\lbrack Z\rbrack)^2\ge t^2)$
➂by means of the Markov inequality, the inequality below just holds.
$P((Z-E\lbrack Z\rbrack)^2\ge t^2)\le\frac {E\lbrack (Z-E\lbrack Z\rbrack)^2\rbrack}{t^2}$
$\Rightarrow P(Z\ge E\lbrack Z\rbrack+t\;or\;Z\le E\lbrack Z\rbrack-t)\le\frac {Var\lbrack Z\rbrack}{t^2}$
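
A quick Monte Carlo check of the Chebyshev's inequality in Python, assuming $Z\sim$ Exponential($1$) (so $E\lbrack Z\rbrack$=$Var\lbrack Z\rbrack$=$1$) as an arbitrary test distribution:

```python
import numpy as np

rng = np.random.default_rng(3)
draws, t = 1_000_000, 2.0

# Assumption: Z ~ Exponential(1), so E[Z] = 1 and Var[Z] = 1.
Z = rng.exponential(1.0, size=draws)
empirical = (np.abs(Z - 1.0) >= t).mean()
bound = 1.0 / t**2                        # Var[Z] / t^2
print(f"empirical P(|Z - E[Z]| >= {t}) ~ {empirical:.5f}, bound ~ {bound:.5f}")
```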

Recall that we had a glance at the Chebyshev's inequality in Hoeffding Inequality v.s. Chebyshev's Inequality, where it was proved by means of an integral. Now, the distinct topics in this series of probability bounds are chained together, by using the Markov inequality in this proof.