
Chernoff Bounds For Bernoulli Random Variable

Prologue To The Chernoff Bounds For Bernoulli Random Variable

The Chernoff bound is a technique for building exponentially decreasing bounds on tail probabilities. This article develops the tail bounds for sums of Bernoulli random variables with outcome 0 or 1.

The Error Probability After Test

Let $X_i$ be a Bernoulli random variable for $i \in \{1,2,\dots,n\}$:
$X_i = 1$ with probability $p_i$
$X_i = 0$ with probability $1 - p_i$
where all the $X_i$'s are mutually independent, and suppose at least one of the $p_i$ is non-zero.

We run the test hoping that our target value will meet the expected level after some number of trials. To express the target value in terms of the expected value, let $X = \sum_{i=1}^{n} X_i$, so that $E[X] = \mu = \sum_{i=1}^{n} p_i$.
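As a quick sanity check (a minimal sketch; the probabilities below are arbitrary values chosen only for illustration, not part of the derivation), we can simulate such a sum of independent Bernoulli trials and compare the empirical mean of $X$ with $\mu = \sum_{i=1}^{n} p_i$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary example probabilities p_i (assumed for illustration only).
p = np.array([0.1, 0.3, 0.5, 0.7, 0.2, 0.9])
mu = p.sum()                                   # mu = E[X] = sum of p_i

# Draw many independent realizations of X = X_1 + ... + X_n.
trials = 100_000
X = (rng.random((trials, p.size)) < p).sum(axis=1)

print("mu =", mu, " empirical mean of X =", X.mean())
```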

[Question]

How confident are we in the final outcome given by the random variable $X$? This asks two kinds of questions:
➀ the range of the confidence interval
➁ the error probability that the target value $X$ lands above or below the expected value $\mu$

The following sections guide you through the process of building the upper and lower bounds of such error probability in the Chernoff exponentially decreasing form.

[Notes] Even if our proof of ➁ holds, the range of the confidence interval must still be guaranteed by the design of your experiment.

The Upper Bound On Error Probability

We ask for the error probability that the target value $X$ exceeds the expected value $\mu$, and we'd like to bound this upper-tail probability.

$P(X > (1+\delta)\mu)$, for $\delta > 0$
$= P(e^{tX} > e^{t(1+\delta)\mu})$, for all $t > 0$
$\leq \frac{E[e^{tX}]}{e^{t(1+\delta)\mu}}$…by Markov's inequality

Since the $X_i$'s are independent, we have
$E[e^{tX}]$
$= E[e^{t\sum_{i=1}^{n} X_i}]$
$= E[e^{t(X_1 + X_2 + \dots + X_n)}]$
$= E[\prod_{i=1}^{n} e^{tX_i}]$
$= \prod_{i=1}^{n} E[e^{tX_i}]$
$= \prod_{i=1}^{n} \left(e^{t\cdot 1}\cdot p_i + e^{t\cdot 0}\cdot(1-p_i)\right)$
$= \prod_{i=1}^{n} \left(e^{t}\cdot p_i + 1 - p_i\right)$
$= \prod_{i=1}^{n} \left(1 + (e^{t}-1)\cdot p_i\right)$
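Here is a small numerical check of this factorization (a sketch only; the $p_i$ values and $t$ below are assumptions made for illustration). The Monte Carlo estimate of $E[e^{tX}]$ should come out close to the closed-form product:

```python
import numpy as np

rng = np.random.default_rng(1)

p = np.array([0.1, 0.3, 0.5, 0.7, 0.2, 0.9])   # assumed example p_i
t = 0.4                                         # any fixed t > 0

# Monte Carlo estimate of E[e^{tX}] with X = X_1 + ... + X_n.
trials = 200_000
X = (rng.random((trials, p.size)) < p).sum(axis=1)
mgf_mc = np.exp(t * X).mean()

# Closed form derived above: product of (1 + (e^t - 1) * p_i).
mgf_exact = np.prod(1.0 + (np.exp(t) - 1.0) * p)

print("Monte Carlo:", mgf_mc, " product formula:", mgf_exact)
```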

Because $1 + x < e^{x}$ for all $x > 0$, we have
$E[e^{tX}]$
$= \prod_{i=1}^{n} \left(1 + (e^{t}-1)\cdot p_i\right)$
$< \prod_{i=1}^{n} e^{(e^{t}-1)\cdot p_i}$
$= e^{(e^{t}-1)\cdot(p_1 + \dots + p_n)}$
$= e^{(e^{t}-1)\cdot\mu}$

Therefore, we have
$P(X > (1+\delta)\mu)$
$\leq \frac{E[e^{tX}]}{e^{t(1+\delta)\mu}}$
$< \frac{e^{(e^{t}-1)\mu}}{e^{t(1+\delta)\mu}}$
$= e^{((e^{t}-1) - t(1+\delta))\mu}$…➀
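A quick sketch to see bound ➀ in action, with assumed example values for the $p_i$'s, $\delta$ and $t$ (any $t > 0$ works here, not yet the optimal one); the empirical tail frequency should stay below the bound:

```python
import numpy as np

rng = np.random.default_rng(2)

p = np.full(50, 0.3)            # assumed example: n = 50 trials, each p_i = 0.3
mu = p.sum()
delta, t = 0.5, 0.3             # assumed delta > 0 and an arbitrary t > 0

trials = 300_000
X = (rng.random((trials, p.size)) < p).sum(axis=1)
tail_mc = (X > (1.0 + delta) * mu).mean()

# Bound ➀: e^{((e^t - 1) - t(1 + delta)) * mu} for this particular t.
bound_1 = np.exp(((np.exp(t) - 1.0) - t * (1.0 + delta)) * mu)
print("empirical tail:", tail_mc, " bound at t = 0.3:", bound_1)
```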

Next, we find the $t$ that minimizes $e^{((e^{t}-1) - t(1+\delta))\mu}$ by taking its first derivative with respect to $t$ and setting it to zero:
$\frac{d}{dt}\, e^{((e^{t}-1) - t(1+\delta))\mu}$
$= [e^{t} - (1+\delta)]\cdot\mu\cdot e^{((e^{t}-1) - (1+\delta)t)\mu}$

The exponential factor on the right is never zero, so
$[e^{t} - (1+\delta)]\cdot\mu = 0$
$e^{t} - (1+\delta) = 0$
$t = \ln(1+\delta)$ is where the minimum is attained.
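To double-check this numerically (with assumed example values of $\delta$ and $\mu$), we can minimize the exponent of ➀ over a grid of $t$ and compare the minimizer with $\ln(1+\delta)$:

```python
import numpy as np

delta, mu = 0.5, 3.0            # assumed example values

# Exponent g(t) = ((e^t - 1) - t(1 + delta)) * mu from bound ➀.
t_grid = np.linspace(1e-6, 2.0, 200_001)
g = ((np.exp(t_grid) - 1.0) - t_grid * (1.0 + delta)) * mu

t_best = t_grid[np.argmin(g)]
print("grid minimizer:", t_best, " ln(1 + delta):", np.log(1.0 + delta))
```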

Substituting $t = \ln(1+\delta)$ into ➀ and noting that $(e^{\ln(1+\delta)})^{(1+\delta)\mu} = (1+\delta)^{(1+\delta)\mu}$, we get:
$P(X > (1+\delta)\mu) < \left[\frac{e^{\delta}}{(1+\delta)^{(1+\delta)}}\right]^{\mu}$…[A]
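A minimal simulation sketch comparing bound [A] with the empirical upper-tail frequency; the $p_i$'s and $\delta$ below are assumptions made only for this illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

p = np.full(50, 0.3)            # assumed example: n = 50, each p_i = 0.3
mu = p.sum()
delta = 0.5                     # assumed example delta

trials = 500_000
X = (rng.random((trials, p.size)) < p).sum(axis=1)
tail_mc = (X > (1.0 + delta) * mu).mean()

# Bound [A]: (e^delta / (1 + delta)^(1 + delta))^mu
bound_A = (np.exp(delta) / (1.0 + delta) ** (1.0 + delta)) ** mu
print("empirical upper tail:", tail_mc, " Chernoff bound [A]:", bound_A)
```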

The Lower Bound On Error Probability

We ask for the error probability that the target value $X$ falls below the expected value $\mu$, and we'd like to bound this lower-tail probability.

$P(X < (1-\delta)\mu)$, for $\delta > 0$
$= P(-X > -(1-\delta)\mu)$
$= P(e^{-tX} > e^{-t(1-\delta)\mu})$, for all $t > 0$
$\leq \frac{E[e^{-tX}]}{e^{-t(1-\delta)\mu}}$…by Markov's inequality

Via a similar approach (optimizing over $t$ as before), we can obtain
$P(X < (1-\delta)\mu) < \left[\frac{e^{-\delta}}{(1-\delta)^{(1-\delta)}}\right]^{\mu}$…[B]
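The same kind of sketch for the lower tail, again with assumed example $p_i$'s and $\delta$; the empirical frequency of $X < (1-\delta)\mu$ should stay below bound [B]:

```python
import numpy as np

rng = np.random.default_rng(4)

p = np.full(50, 0.3)            # assumed example: n = 50, each p_i = 0.3
mu = p.sum()
delta = 0.5                     # assumed example delta

trials = 500_000
X = (rng.random((trials, p.size)) < p).sum(axis=1)
tail_mc = (X < (1.0 - delta) * mu).mean()

# Bound [B]: (e^{-delta} / (1 - delta)^(1 - delta))^mu
bound_B = (np.exp(-delta) / (1.0 - delta) ** (1.0 - delta)) ** mu
print("empirical lower tail:", tail_mc, " Chernoff bound [B]:", bound_B)
```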

Bounding The Bounds

We now have the two bounds [A] and [B]. As they stand, $\left[\frac{e^{\delta}}{(1+\delta)^{(1+\delta)}}\right]^{\mu}$ and $\left[\frac{e^{-\delta}}{(1-\delta)^{(1-\delta)}}\right]^{\mu}$ are difficult to use, so we should turn them into a more readable form.

The numerator in each is an exponential raised to the power of $\delta$ (or $-\delta$). Taking the upper bound $\left[\frac{e^{\delta}}{(1+\delta)^{(1+\delta)}}\right]^{\mu}$ for illustration, we'd like to rewrite $(1+\delta)^{(1+\delta)}$ as an expression with the exponential as its base, which seems the most convenient route.

[1] The upper bound

Begin from the natural logarithm of $1+\delta$: let $\ln(1+\delta) = y$, so that $e^{y} = 1+\delta$ and we can express $(1+\delta)^{(1+\delta)}$ as $(e^{y})^{(1+\delta)}$. The next thing is to approximate $\ln(1+\delta)$.

Since $\frac{d\ln(1+\delta)}{d\delta} = \frac{1}{1+\delta}$, we can recover $\ln(1+\delta)$ by integration:
$\ln(1+\delta)$
$= \int_{0}^{\delta} \frac{1}{1+x}\,dx$
$= \int_{0}^{\delta} (1 - x + x^{2} - x^{3} + x^{4} - \dots)\,dx$
$= \delta - \frac{\delta^{2}}{2} + \frac{\delta^{3}}{3} - \frac{\delta^{4}}{4} + \dots$
$= \sum_{k=1}^{\infty} (-1)^{k-1}\frac{\delta^{k}}{k}$
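As a quick check of this series (with an assumed small $\delta$), the partial sum $\delta - \frac{\delta^{2}}{2} + \frac{\delta^{3}}{3} - \frac{\delta^{4}}{4} + \frac{\delta^{5}}{5}$ should already be very close to $\ln(1+\delta)$:

```python
import math

delta = 0.1                     # assumed small example value

# Partial sum of delta - delta^2/2 + delta^3/3 - delta^4/4 + delta^5/5.
approx = sum((-1) ** (k - 1) * delta ** k / k for k in range(1, 6))
print("series (5 terms):", approx, " ln(1 + delta):", math.log(1.0 + delta))
```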

Since $\delta$ is quite small, we truncate the series after the third-order term and approximate
$(1+\delta)^{(1+\delta)} \approx \left(e^{\delta - \frac{\delta^{2}}{2} + \frac{\delta^{3}}{3}}\right)^{(1+\delta)}$
$\approx e^{\delta + \frac{\delta^{2}}{2} - \frac{\delta^{3}}{6}}$ (dropping the $\delta^{4}$ term in the exponent)
$= e^{\delta + \frac{\delta^{2}}{2} + \varepsilon}$, where $\varepsilon = -\frac{\delta^{3}}{6}$ is a small correction.

Therefore, the upper bound in [A] becomes:
$P(X > (1+\delta)\mu) < \left[\frac{e^{\delta}}{(1+\delta)^{(1+\delta)}}\right]^{\mu}$
$P(X > (1+\delta)\mu) < \left[\frac{e^{\delta}}{e^{\delta + \frac{\delta^{2}}{2} + \varepsilon}}\right]^{\mu}$
$P(X > (1+\delta)\mu) < e^{-\left(\frac{\delta^{2}}{2} + \varepsilon\right)\mu}$

To simplify further, the weaker exponent $\frac{\delta^{2}}{3}$ does not violate the bound as $\varepsilon \rightarrow 0$ (this form holds for $0 < \delta < 1$):
$P(X > (1+\delta)\mu) < e^{-\frac{\delta^{2}}{3}\mu}$
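To convince ourselves that $e^{-\frac{\delta^{2}}{3}\mu}$ really is the weaker (larger) bound on $0 < \delta < 1$, a small numeric sweep (the value of $\mu$ below is an assumed example) can compare it against the exact bound [A]:

```python
import numpy as np

mu = 10.0                                   # assumed example value
deltas = np.linspace(0.01, 0.99, 99)

bound_A = (np.exp(deltas) / (1.0 + deltas) ** (1.0 + deltas)) ** mu
simple = np.exp(-deltas ** 2 * mu / 3.0)

# The simplified bound should be no smaller than [A] for 0 < delta < 1.
print(bool(np.all(simple >= bound_A)))
```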

[2] The lower bound

$\ln(1-\delta)$
$= \ln(1+(-\delta))$
$= \sum_{k=1}^{\infty} (-1)^{k-1}\frac{(-\delta)^{k}}{k}$

Thus, multiplying the series by $1-\delta$ term by term, we have
$\ln(1-\delta) = -\delta - \frac{\delta^{2}}{2} - \frac{\delta^{3}}{3} - \frac{\delta^{4}}{4} - \dots$
$(1-\delta)\ln(1-\delta) = -\delta + \frac{\delta^{2}}{2} + \frac{\delta^{3}}{6} + \frac{\delta^{4}}{12} + \dots$
Every term after $\frac{\delta^{2}}{2}$ is positive, so dropping them gives
$(1-\delta)\ln(1-\delta) > -\delta + \frac{\delta^{2}}{2}$
$(1-\delta)^{(1-\delta)} > e^{-\delta + \frac{\delta^{2}}{2}}$

Therefore, we have the lower-tail bound
$P(X < (1-\delta)\mu) < \left[\frac{e^{-\delta}}{(1-\delta)^{(1-\delta)}}\right]^{\mu}$
$P(X < (1-\delta)\mu) < \left[\frac{e^{-\delta}}{e^{-\delta + \frac{\delta^{2}}{2}}}\right]^{\mu}$
$P(X < (1-\delta)\mu) < e^{-\frac{\delta^{2}}{2}\mu}$
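Similarly, a small sweep (again with an assumed $\mu$) can confirm that $e^{-\frac{\delta^{2}}{2}\mu}$ dominates the exact bound [B] on $0 < \delta < 1$:

```python
import numpy as np

mu = 10.0                                   # assumed example value
deltas = np.linspace(0.01, 0.99, 99)

bound_B = (np.exp(-deltas) / (1.0 - deltas) ** (1.0 - deltas)) ** mu
simple = np.exp(-deltas ** 2 * mu / 2.0)

# The simplified bound should be no smaller than [B] for 0 < delta < 1.
print(bool(np.all(simple >= bound_B)))
```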