Chernoff Bounds For Arbitrary Random Variable
23 Dec 2019
Prologue To The Chernoff Bounds For Arbitrary Random Variable
The Error Probability After Test
[The given condition] Suppose we are given
➀$X_{1}$,$X_{2}$,…,$X_{n}$ to be independent random variables with values in $\lbrack 0,1\rbrack$.
➁$X$=$X_{1}$+$X_{2}$+…+$X_{n}$
➂$E\lbrack X \rbrack$=$\mu$
These $X_{1}$,…,$X_{n}$ need not be Bernoulli random variables, but they must be independent.
[Question] Then, for every $\varepsilon$>$0$, what are the upper and lower bounds for
➀$P(X\geq (1+\delta)\cdot\mu)$=$P(X\geq \mu+\varepsilon)$
➁$P(X\leq (1-\delta)\cdot\mu)$=$P(X\leq \mu-\varepsilon)$
, where $\varepsilon$=$\delta\cdot\mu$
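To make the question concrete, below is a minimal Monte Carlo sketch (added for illustration; not part of the original statement). It assumes $X_{i}\sim\text{Uniform}\lbrack 0,1\rbrack$ as one particular case of independent random variables with values in $\lbrack 0,1\rbrack$; the names n, eps, and trials are illustrative.

```python
# Monte Carlo sketch of the two tail probabilities in the question
# (assumption: X_i ~ Uniform[0,1]; n, eps, trials are illustrative).
import random

random.seed(0)
n, trials = 100, 100_000
mu = n * 0.5              # E[X] = sum_i E[X_i] = n/2 for Uniform[0,1]
eps = 10.0                # eps = delta * mu, here with delta = 0.2

hi = lo = 0
for _ in range(trials):
    x = sum(random.random() for _ in range(n))
    hi += x >= mu + eps   # upper-tail event
    lo += x <= mu - eps   # lower-tail event
print(f"P(X >= mu+eps) ~ {hi / trials:.5f}")
print(f"P(X <= mu-eps) ~ {lo / trials:.5f}")
```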
The Upper Bound On Error Probability
Still, we are asking for the upper bound on the error probability when the target value $X$ is greater than the expected value $\mu$, and we'd like to bound it from above.
The major difference is that these random variables fall within $\lbrack 0,1\rbrack$, not the Bernoulli values $0$ or $1$. Here is the idea:
➀take $Y_{i}$=$X_{i}$-$E\lbrack X_{i} \rbrack$
➁take $X$=$\sum_{1}^{n}X_{i}$,$E\lbrack X \rbrack$=$\mu$=$\sum_{1}^{n}E\lbrack X_{i} \rbrack$
➂take $Y$=$\sum_{1}^{n}Y_{i}$
➃$P(X\geq\mu+\varepsilon)$
=$P(X-\mu\geq\varepsilon)$
=$P(\sum_{1}^{n}(X_{i}-E\lbrack X_{i} \rbrack)\geq\varepsilon)$
=$P(Y\geq\varepsilon)$
=$P(e^{t\cdot Y}\geq e^{t\cdot\varepsilon})$…for any $t>0$
$\leq\frac {E\lbrack e^{t\cdot Y}\rbrack}{e^{t\cdot\varepsilon}}$…by Markov's inequality, where $E\lbrack Y_{i}\rbrack$=$0$
Further expand
$E\lbrack e^{t\cdot Y}\rbrack$
=$E\lbrack e^{\sum_{1}^{n} t\cdot Y_{i}}\rbrack$
=$E\lbrack \prod_{1}^{n} e^{t\cdot Y_{i}}\rbrack$
=$\prod_{1}^{n}E\lbrack e^{t\cdot Y_{i}}\rbrack$…by independence of the $Y_{i}$
Thus, $P(X\geq \mu+\varepsilon)\leq\frac {\prod_{1}^{n}E\lbrack e^{t\cdot Y_{i}}\rbrack}{e^{t\cdot\varepsilon}}$, and it is left to bound the term $E\lbrack e^{t\cdot Y_{i}}\rbrack$.
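The factorization $E\lbrack e^{t\cdot Y}\rbrack$=$\prod_{1}^{n}E\lbrack e^{t\cdot Y_{i}}\rbrack$ holds only because the $Y_{i}$ are independent. Here is a quick numerical sanity check (a sketch assuming $X_{i}\sim\text{Uniform}\lbrack 0,1\rbrack$; all names illustrative):

```python
# Check E[e^{tY}] = prod_i E[e^{t Y_i}] for independent Y_i = X_i - E[X_i]
# (assumption: X_i ~ Uniform[0,1], so E[X_i] = 0.5; names illustrative).
import math
import random

random.seed(1)
n, t, trials = 5, 0.7, 400_000

# E[e^{t Y_i}] for one centered Uniform[0,1] variable, by simulation
single = sum(math.exp(t * (random.random() - 0.5)) for _ in range(trials)) / trials

# E[e^{t Y}] with Y the sum of n independent centered variables
joint = sum(math.exp(t * sum(random.random() - 0.5 for _ in range(n)))
            for _ in range(trials)) / trials

print(f"prod E[e^(t*Y_i)] = {single ** n:.5f}")
print(f"E[e^(t*Y)]        = {joint:.5f}  # close, by independence")
```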
➀$D_{y}e^{t\cdot Y_{i}}$=$t\cdot e^{t\cdot Y_{i}}>0$
➁$D_{y}t\cdot e^{t\cdot Y_{i}}$=$t^{2}\cdot e^{t\cdot Y_{i}}>0$
$e^{t\cdot Y_{i}}$ is a convex function; it is concave up, since its 2nd derivative is greater than $0$. Note that $X_{i}\in\lbrack 0,1\rbrack$ and $E\lbrack X_{i}\rbrack\in\lbrack 0,1\rbrack$ imply $Y_{i}\in\lbrack -1,1\rbrack$.
Suppose there exists a line $a+b\cdot y$ passing through the curve of $e^{t\cdot Y_{i}}$ at the points $(1,e^{t})$ and $(-1,e^{-t})$, that is
$a$+$b$=$e^{t}$
$a$-$b$=$e^{-t}$
$\Rightarrow$solving the above equations, we obtain $a$=$\frac {e^{t}+e^{-t}}{2}$, $b$=$\frac {e^{t}-e^{-t}}{2}$
Since the chord of a convex function lies above the curve between its endpoints, we can claim that within $\lbrack -1,1\rbrack$
$e^{t\cdot Y_{i}}\leq a+b\cdot Y_{i}$
$\Rightarrow E\lbrack e^{t\cdot Y_{i}}\rbrack\leq E\lbrack a+b\cdot Y_{i}\rbrack$
$\Rightarrow E\lbrack e^{t\cdot Y_{i}}\rbrack\leq a + b\cdot E\lbrack Y_{i}\rbrack$
$\Rightarrow E\lbrack e^{t\cdot Y_{i}}\rbrack\leq a + b\cdot 0$, where $E\lbrack Y_{i}\rbrack$=$0$
$\Rightarrow E\lbrack e^{t\cdot Y_{i}}\rbrack\leq a$
$\Rightarrow E\lbrack e^{t\cdot Y_{i}}\rbrack\leq \frac {e^{t}+e^{-t}}{2}$
By using the Taylor expansion of $e^{t}$, we can simplify this bound:
$e^{x}$=$1$+$\frac {x^{1}}{1!}$+$\frac {x^{2}}{2!}$+$\frac {x^{3}}{3!}$+…, then
$\frac {e^{t}+e^{-t}}{2}$
=$\frac {1}{2}\cdot\lbrack (1+\frac {t^{1}}{1!}+\frac {t^{2}}{2!}+\frac {t^{3}}{3!}+\frac {t^{4}}{4!}+…)+(1+\frac {(-t)^{1}}{1!}+\frac {(-t)^{2}}{2!}+\frac {(-t)^{3}}{3!}+\frac {(-t)^{4}}{4!}+…)\rbrack$
=$1$+$\frac {t^{2}}{2!}$+$\frac {t^{4}}{4!}$+$\frac {t^{6}}{6!}$+…
$\leq 1$+$\frac {t^{2}}{2\cdot 1!}$+$\frac {t^{4}}{2^{2}\cdot 2!}$+$\frac {t^{6}}{2^{3}\cdot 3!}$+…, since $(2\cdot n)!\geq 2^{n}\cdot n!$
=$1$+$\frac {\frac {t^{2}}{2}}{1!}$+$\frac {(\frac {t^{2}}{2})^{2}}{2!}$+$\frac {(\frac {t^{2}}{2})^{3}}{3!}$+…
=$e^{\frac {t^{2}}{2}}$
Therefore, $E\lbrack e^{t\cdot Y_{i}}\rbrack\leq e^{\frac {t^{2}}{2}}$.
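Both facts just used, the chord bound $e^{t\cdot y}\leq a+b\cdot y$ on $\lbrack -1,1\rbrack$ and $\frac {e^{t}+e^{-t}}{2}\leq e^{\frac {t^{2}}{2}}$, are easy to verify numerically. A minimal sketch (the grid and $t$ values are illustrative assumptions):

```python
# Verify numerically, on a grid over [-1, 1]:
#   (1) e^{t*y} <= a + b*y, with a = cosh(t) and b = sinh(t) as derived above
#   (2) (e^t + e^{-t})/2 = cosh(t) <= e^{t^2/2}
import math

for t in (0.1, 0.5, 1.0, 2.0):
    a, b = math.cosh(t), math.sinh(t)
    worst = max(math.exp(t * y) - (a + b * y)
                for y in [i / 100 for i in range(-100, 101)])
    assert worst <= 1e-12                      # chord lies above the convex curve
    assert math.cosh(t) <= math.exp(t * t / 2)
print("both inequalities hold on the tested grid")
```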
Hence
$\prod_{1}^{n}E\lbrack e^{t\cdot Y_{i}}\rbrack$
$\leq\prod_{1}^{n}e^{\frac {t^{2}}{2}}$
=$e^{\frac {n\cdot t^{2}}{2}}$
Back to the inequality:
$P(X\geq \mu+\varepsilon)$
$\leq\frac {\prod_{1}^{n}E\lbrack e^{t\cdot Y_{i}}\rbrack}{e^{t\cdot\varepsilon}}$
$\leq\frac {e^{\frac {n\cdot t^{2}}{2}}}{e^{t\cdot\varepsilon}}$
=$e^{\frac {n\cdot t^{2}}{2}-t\cdot\varepsilon}$
Next would be:
➀ask for the minimum value of $\frac {n\cdot t^{2}}{2}-t\cdot\varepsilon$: take its 1st order differentiation, set it to $0$, and figure out such $t$:
$D_{t}(\frac {n\cdot t^{2}}{2}-t\cdot\varepsilon)$=$0$
$\Rightarrow n\cdot t-\varepsilon$=$0$
$\Rightarrow t$=$\frac {\varepsilon}{n}$
➁take $t$=$\frac {\varepsilon}{n}$ in $e^{\frac {n\cdot t^{2}}{2}-t\cdot\varepsilon}$, then
$e^{\frac {n\cdot \frac {\varepsilon^{2}}{n^{2}}}{2}-\frac {\varepsilon}{n}\cdot\varepsilon}$
=$e^{\frac {\varepsilon^{2}}{2n}-\frac {\varepsilon^{2}}{n}}$
=$e^{-\frac {\varepsilon^{2}}{2n}}$
Finally, we have proved $P(X\geq \mu+\varepsilon)\leq e^{-\frac {\varepsilon^{2}}{2n}}$ as the upper bound on the error probability.
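As a sanity check on this final bound, the sketch below (assuming $X_{i}\sim\text{Uniform}\lbrack 0,1\rbrack$; parameter values illustrative) compares a simulated upper tail against $e^{-\frac {\varepsilon^{2}}{2n}}$:

```python
# Compare the proved bound e^{-eps^2/(2n)} with a simulated upper tail
# (assumption: X_i ~ Uniform[0,1]; parameter values illustrative).
import math
import random

random.seed(2)
n, trials = 50, 100_000
mu = n * 0.5

for eps in (2.0, 4.0, 6.0):
    tail = sum(sum(random.random() for _ in range(n)) >= mu + eps
               for _ in range(trials)) / trials
    bound = math.exp(-eps * eps / (2 * n))
    print(f"eps={eps}: empirical {tail:.5f} <= bound {bound:.5f}")
```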
The Lower Bound On Error Probability
We are now asking for the error probability when the target value $X$ is less than the expected value $\mu$, and we'd like to bound this lower-tail probability.
$P(X\leq \mu-\varepsilon)$
=$P(X-\mu\leq -\varepsilon)$
=$P(-X+\mu\geq \varepsilon)$
=$P(-X\geq -\mu+\varepsilon)$
This is much like the upper bound expression; here is the idea again:
➀take $Y_{i}^{'}$=$E\lbrack X_{i} \rbrack$-$X_{i}$
➁take $X$=$\sum_{1}^{n}X_{i}$,$E\lbrack X \rbrack$=$\mu$=$\sum_{1}^{n}E\lbrack X_{i} \rbrack$
➂take $Y^{'}$=$\sum_{1}^{n}Y_{i}^{'}$
➃$P(X\leq \mu-\varepsilon)$
=$P(-X\geq -\mu+\varepsilon)$
=$P(Y^{'}\geq \varepsilon)$
=$P(e^{t\cdot Y^{'}}\geq e^{t\cdot\varepsilon})$…for any $t>0$
$\leq\frac {E\lbrack e^{t\cdot Y^{'}}\rbrack}{e^{t\cdot\varepsilon}}$
Since $E\lbrack Y_{i}^{'}\rbrack$=$0$ and $Y_{i}^{'}\in\lbrack -1,1\rbrack$ as well, a deduction similar to the upper bound's yields the same expression for the lower bound:
$P(X\leq \mu-\varepsilon)$
=$P(X-\mu\leq -\varepsilon)$
$\leq e^{-\frac {\varepsilon^{2}}{2n}}$
By symmetry, $P(X-\mu\leq -\varepsilon)$ is the opposite-direction tail of $P(X-\mu\geq \varepsilon)$; before optimizing over $t$, this error probability is likewise no more than $e^{\frac {n\cdot t^{2}}{2}-t\cdot\varepsilon}$.
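The lower tail can be checked the same way; for $X_{i}\sim\text{Uniform}\lbrack 0,1\rbrack$ the distribution of $X$ is symmetric about $\mu$, so the two tails nearly coincide (a sketch with illustrative values):

```python
# The lower tail obeys the same bound; for X_i ~ Uniform[0,1] the sum X is
# symmetric about mu, so both tails nearly coincide (values illustrative).
import math
import random

random.seed(3)
n, eps, trials = 50, 4.0, 100_000
mu = n * 0.5

lo = sum(sum(random.random() for _ in range(n)) <= mu - eps
         for _ in range(trials)) / trials
print(f"P(X <= mu-eps) ~ {lo:.5f} <= bound {math.exp(-eps**2 / (2*n)):.5f}")
```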
Illustration: Chernoff Bound For Bernoulli Versus Arbitrary Random Variables::mjtsai1974
[The major point] Why is the proof of the Chernoff bound for Bernoulli random variables inadequate for arbitrary random variables? Can we reuse that proof for arbitrary random variables? If not, why?
The Bernoulli random variable has only two values, $0$ and $1$, while an arbitrary random variable has values falling within $\lbrack 0,1\rbrack$. It seems that the probabilistic error tolerance for arbitrary random variables demands much more care in the approximation, as shown in the proof below.
[1] The fault tolerance $\delta$ of upper and lower bound
From the proof in Chernoff Bounds For Bernoulli Random Variable, we see that
➀$P(X>(1+\delta)\cdot\mu)$<$e^{((e^{t}-1)-t\cdot (1+\delta))\cdot\mu}$
➁$P(X<(1-\delta)\cdot\mu)$<$e^{(-1-t\cdot\delta)\cdot\mu}$
Next would be to inspect the fault tolerance $\delta$ in Chernoff bounds for the Bernoulli random variable for any given $t$>$0$, based on the convex property used in the proof of Chernoff bounds for arbitrary random variables.
[1.1] The fault tolerance $\delta$ of upper bound
➀$D_{t}e^{((e^{t}-1)-t\cdot (1+\delta))\cdot\mu}$
=$(e^{t}-(1+\delta))\cdot\mu\cdot e^{((e^{t}-1)-t\cdot (1+\delta))\cdot\mu}$
➁$D_{t}^{2}e^{((e^{t}-1)-t\cdot (1+\delta))\cdot\mu}$
=$(e^{t}\cdot\mu+(e^{t}-(1+\delta))^{2}\cdot\mu^{2})\cdot e^{((e^{t}-1)-t\cdot (1+\delta))\cdot\mu}$
Since $D_{t}^{2}e^{((e^{t}-1)-t\cdot (1+\delta))\cdot\mu}\geq 0$ always holds, this upper bound $e^{((e^{t}-1)-t\cdot (1+\delta))\cdot\mu}$ in Chernoff bounds for the Bernoulli random variable is a convex function of $t$, which is concave up.
➂suppose $c$+$d\cdot t$ passes through $e^{((e^{t}-1)-t\cdot (1+\delta))\cdot\mu}$ at $(0,1)$ and $(1,e^{(e-2-\delta)\cdot\mu})$…for $t$>$0$
$c$+$d\cdot 0$=$1$…$t$=$0$
$c$+$d$=$e^{(e-2-\delta)\cdot\mu}$…$t$=$1$
$\Rightarrow c$=$1$ and $d$=$e^{(e-2-\delta)\cdot\mu}$-$1$, thus the line is
$1$+$(e^{(e-2-\delta)\cdot\mu}-1)\cdot t$
➃we'd like to figure out the fault tolerance $\delta$ for any $t$>$0$, that is, to find the $\delta$ at which the chord's slope vanishes:
$D_{t}(1+(e^{(e-2-\delta)\cdot\mu}-1)\cdot t)$=$0$
$\Rightarrow e^{(e-2-\delta)\cdot\mu}-1$=$0$
$\Rightarrow e^{(e-2-\delta)\cdot\mu}$=$1$
$\Rightarrow (e-2-\delta)\cdot\mu$=$0$, where $\mu$>$0$
$\Rightarrow \delta$=$e-2$…$e$=$2.71828182…$
$\Rightarrow \delta$=$0.71828182…$
For whatever $0<t\leq 1$ we are using, the fault tolerance expressed by $\delta$ could be probabilistically up to almost $0.718$, which might not be an inappropriate choice.
[1.2] The fault tolerance $\delta$ of lower bound
➀$D_{t}e^{(-1-t\cdot\delta)\cdot\mu}$=$-\delta\cdot\mu\cdot e^{(-1-t\cdot\delta)\cdot\mu}$
➁$D_{t}^{2}e^{(-1-t\cdot\delta)\cdot\mu}$=$(-\delta\cdot\mu)^{2}\cdot e^{(-1-t\cdot\delta)\cdot\mu}$
Since $(-\delta\cdot\mu)^{2}\cdot e^{(-1-t\cdot\delta)\cdot\mu}\geq 0$ always holds, this lower bound $e^{(-1-t\cdot\delta)\cdot\mu}$ in Chernoff bounds for the Bernoulli random variable is a convex function of $t$, which is concave up.
➂suppose $c$+$d\cdot t$ passes through $e^{(-1-t\cdot\delta)\cdot\mu}$ at $(0,e^{-\mu})$ and $(1,e^{(-1-\delta)\cdot\mu})$, then
$c$+$d\cdot 0$=$e^{-\mu}$…$t$=$0$
$c$+$d$=$e^{(-1-\delta)\cdot\mu}$…$t$=$1$
$\Rightarrow c$=$e^{-\mu}$ and $d$=$e^{(-1-\delta)\cdot\mu}$-$e^{-\mu}$, thus the line is
$e^{-\mu}$+$(e^{(-1-\delta)\cdot\mu}-e^{-\mu})\cdot t$
➃we'd like to figure out the fault tolerance $\delta$ for any $t$>$0$, that is, to find the $\delta$ at which the chord's slope vanishes:
$D_{t}(e^{-\mu}+(e^{(-1-\delta)\cdot\mu}-e^{-\mu})\cdot t)$=$0$
$\Rightarrow e^{(-1-\delta)\cdot\mu}-e^{-\mu}$=$0$
$\Rightarrow e^{(-1-\delta)\cdot\mu}$=$e^{-\mu}$
$\Rightarrow (-1-\delta)\cdot\mu$=$-\mu$
$\Rightarrow \delta$=$0$
For whatever $0<t\leq 1$ we are using, the fault tolerance expressed by $\delta$ could only be $0$, which might be an inappropriate choice for error probability evaluation.
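Both fault-tolerance results can be confirmed numerically by solving $d$=$0$ for $\delta$, where $d$ is the chord slope from [1.1] and [1.2] respectively. A bisection sketch (the value of $\mu$ is an illustrative assumption; the roots do not depend on it):

```python
# Solve d(delta) = 0 for the chord slopes in [1.1] and [1.2] by bisection
# (mu is an illustrative value; the roots do not depend on it).
import math

def bisect(f, a, b, iters=60):
    """Simple bisection; assumes f(a) and f(b) have opposite signs."""
    for _ in range(iters):
        m = (a + b) / 2
        if f(a) * f(m) > 0:
            a = m
        else:
            b = m
    return (a + b) / 2

mu = 3.0
d_upper = lambda delta: math.exp((math.e - 2 - delta) * mu) - 1       # [1.1]
d_lower = lambda delta: math.exp((-1 - delta) * mu) - math.exp(-mu)   # [1.2]

print(f"upper: delta = {bisect(d_upper, 0.0, 1.0):.6f}")   # ~ e-2 = 0.718281...
print(f"lower: delta = {bisect(d_lower, -0.5, 0.5):.6f}")  # ~ 0
```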
[2] The existence of $t$ in upper and lower bound
Following the above section, this article would like to ask for a $t$ such that we can have an ideal upper/lower bound, for $\delta$ in $\lbrack -1,1\rbrack$:
[2.1] The validity of $t$ in upper bound
➀$P(X>(1+\delta)\cdot\mu)$<$e^{((e^{t}-1)-t\cdot (1+\delta))\cdot\mu}$
➁$P(X<(1-\delta)\cdot\mu)$<$e^{(-1-t\cdot\delta)\cdot\mu}$
➀$D_{\delta}^{2}e^{((e^{t}-1)-t\cdot (1+\delta))\cdot\mu}\geq 0$
$e^{((e^{t}-1)-t\cdot (1+\delta))\cdot\mu}$ is a convex function of $\delta$.
➁suppose $c$+$d\cdot\delta$ passes through $e^{((e^{t}-1)-t\cdot (1+\delta))\cdot\mu}$ at $(-1,e^{(e^{t}-1)\cdot\mu})$ and $(1,e^{(e^{t}-1-2\cdot t)\cdot\mu})$, then:
$c$=$\frac {e^{(e^{t}-1-2\cdot t)\cdot\mu}+e^{(e^{t}-1)\cdot\mu}}{2}$
$d$=$\frac {e^{(e^{t}-1-2\cdot t)\cdot\mu}-e^{(e^{t}-1)\cdot\mu}}{2}$
➂we'd like to find a $t$ valid for every $-1\leq\delta\leq 1$; setting the chord's slope in $\delta$ to $0$:
$D_{\delta}(c+d\cdot\delta)$=$0$
$\Rightarrow d$=$0$
$\Rightarrow \frac {e^{(e^{t}-1-2\cdot t)\cdot\mu}-e^{(e^{t}-1)\cdot\mu}}{2}$=$0$
$\Rightarrow e^{(e^{t}-1-2\cdot t)\cdot\mu}$=$e^{(e^{t}-1)\cdot\mu}$
$\Rightarrow (e^{t}-1-2\cdot t)\cdot\mu$=$(e^{t}-1)\cdot\mu$
$\Rightarrow e^{t}-1-2\cdot t$=$e^{t}-1$
$\Rightarrow t$=$0$, which is a contradiction, for $t>0$ is a necessary condition in the proof.
[2.2] The validity of $t$ in lower bound
➀$D_{\delta}^{2}e^{(-1-t\cdot\delta)\cdot\mu}\geq 0$
$e^{(-1-t\cdot\delta)\cdot\mu}$ is a convex function of $\delta$.
➁suppose $c$+$d\cdot\delta$ passes through $e^{(-1-t\cdot\delta)\cdot\mu}$ at $(-1,e^{(-1+t)\cdot\mu})$ and $(1,e^{(-1-t)\cdot\mu})$, then:
$c$=$\frac {e^{(-1-t)\cdot\mu}+e^{(-1+t)\cdot\mu}}{2}$
$d$=$\frac {e^{(-1-t)\cdot\mu}-e^{(-1+t)\cdot\mu}}{2}$
➂we'd like to find a $t$ valid for every $-1\leq\delta\leq 1$; setting the chord's slope in $\delta$ to $0$:
$D_{\delta}(c+d\cdot\delta)$=$0$
$\Rightarrow d$=$0$
$\Rightarrow e^{(-1-t)\cdot\mu}$=$e^{(-1+t)\cdot\mu}$
$\Rightarrow -1-t$=$-1+t$
$\Rightarrow t$=$0$, which is a contradiction, for $t>0$ is a necessary condition in the proof.
Using the convex property from the proof of Chernoff bounds for arbitrary random variables, we have derived $t$=$0$ for both the upper and lower bound in Chernoff bounds for the Bernoulli random variable, and found it an inappropriate value, because $t$ must be strictly greater than zero.
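One can also confirm numerically that the chord slopes $d(t)$ from [2.1] and [2.2] never vanish for $t$>$0$: they are strictly negative, so $d(t)$=$0$ has no admissible solution. A minimal sketch ($\mu$ is an illustrative assumption):

```python
# The chord slopes d(t) from [2.1] and [2.2] are strictly negative for every
# t > 0, so d(t) = 0 has no admissible solution (mu is illustrative).
import math

mu = 3.0
d_21 = lambda t: (math.exp((math.exp(t) - 1 - 2 * t) * mu)
                  - math.exp((math.exp(t) - 1) * mu)) / 2
d_22 = lambda t: (math.exp((-1 - t) * mu) - math.exp((-1 + t) * mu)) / 2

for t in (0.01, 0.1, 0.5, 1.0):
    print(f"t={t}: d_[2.1]={d_21(t):+.3e}, d_[2.2]={d_22(t):+.3e}")  # both < 0
```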
[Notes::mjtsai1974]
As a result of the above indication, we can claim that the inequalities in Chernoff bounds for the Bernoulli random variable:
➀$P(X>(1+\delta)\cdot\mu)$<$e^{((e^{t}-1)-t\cdot (1+\delta))\cdot\mu}$
➁$P(X<(1-\delta)\cdot\mu)$<$e^{(-1-t\cdot\delta)\cdot\mu}$
are inappropriate for Chernoff bounds for arbitrary random variables.