Chernoff Bounds For Arbitrary Random Variable
23 Dec 2019
Prologue To The Chernoff Bounds For Arbitrary Random Variable
The Error Probability After Test
[The given condition]
Suppose we are given
➀ $X_1,X_2,\dots,X_n$, independent random variables taking values in $[0,1]$.
➁ $X=X_1+X_2+\dots+X_n$
➂ $E[X]=\mu$
These $X_1,\dots,X_n$ need not be Bernoulli random variables, but they must be independent.
[Question]
Then, for every $\varepsilon>0$, what are the upper and lower bounds for
➀ $P(X\geq(1+\delta)\cdot\mu)=P(X\geq\mu+\varepsilon)$
➁ $P(X\leq(1-\delta)\cdot\mu)=P(X\leq\mu-\varepsilon)$
, where $\varepsilon=\delta\cdot\mu$?
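Before the proof, a quick Monte Carlo sketch may make the question concrete. This is my own illustration, not part of the proof: it takes uniform summands as one arbitrary choice of $[0,1]$-valued variables and compares the empirical upper tail against the bound $e^{-\varepsilon^2/(2n)}$ derived below.

```python
import numpy as np

# Monte Carlo sketch: X = X_1 + ... + X_n with X_i ~ Uniform[0,1],
# one arbitrary choice of independent [0,1]-valued random variables.
rng = np.random.default_rng(0)
n, trials = 100, 50_000
X = rng.random((trials, n)).sum(axis=1)   # samples of X
mu = n * 0.5                              # E[X] = n * E[X_i] = 50

for eps in (5.0, 10.0, 15.0):
    empirical = np.mean(X >= mu + eps)    # estimate of P(X >= mu + eps)
    bound = np.exp(-eps**2 / (2 * n))     # e^{-eps^2/(2n)}, proved below
    print(f"eps={eps:5.1f}  empirical={empirical:.5f}  bound={bound:.5f}")
```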
The Upper Bound On Error Probability
Again, we are asking for the upper bound on the error probability when the target value $X$ is greater than the expected value $\mu$, and we'd like to bound that deviation from above.
The major difference is that these random variables fall within $[0,1]$, rather than taking only the Bernoulli values 0 or 1. Here is the idea:
➀ take $Y_i=X_i-E[X_i]$, so that $E[Y_i]=0$
➁ take $X=\sum_{i=1}^n X_i$, $E[X]=\mu=\sum_{i=1}^n E[X_i]$
➂ take $Y=\sum_{i=1}^n Y_i$
➃ $P(X\geq\mu+\varepsilon)$
$=P(X-\mu\geq\varepsilon)$
$=P(\sum_{i=1}^n(X_i-E[X_i])\geq\varepsilon)$
$=P(Y\geq\varepsilon)$
$=P(e^{t\cdot Y}\geq e^{t\cdot\varepsilon})$…for any $t>0$
$\leq\frac{E[e^{t\cdot Y}]}{e^{t\cdot\varepsilon}}$…by Markov's inequality
Further expand
$E[e^{t\cdot Y}]$
$=E[e^{\sum_{i=1}^n t\cdot Y_i}]$
$=E[\prod_{i=1}^n e^{t\cdot Y_i}]$
$=\prod_{i=1}^n E[e^{t\cdot Y_i}]$…by independence
Thus, $P(X\geq\mu+\varepsilon)\leq\frac{\prod_{i=1}^n E[e^{t\cdot Y_i}]}{e^{t\cdot\varepsilon}}$, and it is left to bound the term $E[e^{t\cdot Y_i}]$; a quick numerical sanity check of the factorization step follows below.
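As a sanity check (my own addition, with uniform $X_i$ again as an arbitrary example), the moment generating function of $Y$ should factor into the product of the individual ones whenever the $X_i$ are independent:

```python
import numpy as np

# Check E[e^{tY}] == prod_i E[e^{tY_i}] for independent X_i ~ Uniform[0,1].
rng = np.random.default_rng(1)
n, trials, t = 5, 1_000_000, 0.7
Xs = rng.random((trials, n))
Ys = Xs - 0.5                       # Y_i = X_i - E[X_i], so E[Y_i] = 0

lhs = np.mean(np.exp(t * Ys.sum(axis=1)))                    # E[e^{tY}]
rhs = np.prod([np.mean(np.exp(t * Ys[:, i])) for i in range(n)])
print(lhs, rhs)   # the two estimates agree up to Monte Carlo error
```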
➀ $D_y\,e^{t\cdot y}=t\cdot e^{t\cdot y}>0$
➁ $D_y\,t\cdot e^{t\cdot y}=t^2\cdot e^{t\cdot y}>0$
$e^{t\cdot y}$ is a convex function; it concaves up, since its 2nd derivative is greater than 0. Note also that $Y_i=X_i-E[X_i]\in[-1,1]$, because both $X_i$ and $E[X_i]$ lie in $[0,1]$.
Consider the line $a+b\cdot y$ passing through the curve of $e^{t\cdot y}$ at the points $(1,e^t)$ and $(-1,e^{-t})$, that is
$a+b=e^t$
$a-b=e^{-t}$
⇒ solving the above equations, we obtain $a=\frac{e^t+e^{-t}}{2}$, $b=\frac{e^t-e^{-t}}{2}$
Since the chord of a convex function lies above the curve between its endpoints, we can claim that within $[-1,1]$
$e^{t\cdot Y_i}\leq a+b\cdot Y_i$
⇒ $E[e^{t\cdot Y_i}]\leq E[a+b\cdot Y_i]$
⇒ $E[e^{t\cdot Y_i}]\leq a+b\cdot E[Y_i]$
⇒ $E[e^{t\cdot Y_i}]\leq a+b\cdot 0$, where $E[Y_i]=0$
⇒ $E[e^{t\cdot Y_i}]\leq a$
⇒ $E[e^{t\cdot Y_i}]\leq\frac{e^t+e^{-t}}{2}$
This chord bound is checked numerically right below.
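A minimal numeric check of the chord bound (my own addition; the value of $t$ is arbitrary):

```python
import numpy as np

# Chord bound check: for fixed t > 0, e^{t*y} <= a + b*y on y in [-1, 1],
# where the line a + b*y meets the curve at y = -1 and y = 1.
t = 1.3                                    # an arbitrary positive t
a = (np.exp(t) + np.exp(-t)) / 2
b = (np.exp(t) - np.exp(-t)) / 2
y = np.linspace(-1.0, 1.0, 10_001)
gap = (a + b * y) - np.exp(t * y)          # chord minus curve
print(gap.min() >= -1e-12)                 # True: chord lies above the curve
```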
By using the Taylor expansion of $e^x$, we can simplify the bound:
$e^x=1+\frac{x}{1!}+\frac{x^2}{2!}+\frac{x^3}{3!}+\dots$, then
$\frac{e^t+e^{-t}}{2}$
$=\frac{1}{2}\cdot[(1+\frac{t}{1!}+\frac{t^2}{2!}+\frac{t^3}{3!}+\frac{t^4}{4!}+\dots)+(1+\frac{-t}{1!}+\frac{(-t)^2}{2!}+\frac{(-t)^3}{3!}+\frac{(-t)^4}{4!}+\dots)]$
$=1+\frac{t^2}{2!}+\frac{t^4}{4!}+\frac{t^6}{6!}+\dots$
$\leq 1+\frac{t^2}{2\cdot 1!}+\frac{t^4}{2^2\cdot 2!}+\frac{t^6}{2^3\cdot 3!}+\dots$…since $(2k)!\geq 2^k\cdot k!$
$=1+\frac{t^2/2}{1!}+\frac{(t^2/2)^2}{2!}+\frac{(t^2/2)^3}{3!}+\dots$
$=e^{\frac{t^2}{2}}$
Therefore, $E[e^{t\cdot Y_i}]\leq e^{\frac{t^2}{2}}$.
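The inequality $\frac{e^t+e^{-t}}{2}\leq e^{t^2/2}$ can also be checked numerically (my own addition):

```python
import numpy as np

# cosh(t) <= exp(t^2 / 2) for all t; check on a wide grid.
t = np.linspace(-10.0, 10.0, 100_001)
print(np.all(np.cosh(t) <= np.exp(t**2 / 2)))   # True
```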
Hence,
$\prod_{i=1}^n E[e^{t\cdot Y_i}]$
$\leq(e^{\frac{t^2}{2}})^n$
$=e^{\frac{n\cdot t^2}{2}}$
Back to the inequality:
$P(X\geq\mu+\varepsilon)$
$\leq\frac{\prod_{i=1}^n E[e^{t\cdot Y_i}]}{e^{t\cdot\varepsilon}}$
$\leq\frac{e^{\frac{n\cdot t^2}{2}}}{e^{t\cdot\varepsilon}}$
$=e^{\frac{n\cdot t^2}{2}-t\cdot\varepsilon}$
Next would be:
➀ ask for the minimum value of $\frac{n\cdot t^2}{2}-t\cdot\varepsilon$, which gives the tightest bound; take its 1st order derivative, set it to 0, and figure out such $t$:
$D_t(\frac{n\cdot t^2}{2}-t\cdot\varepsilon)=0$
⇒ $n\cdot t-\varepsilon=0$
⇒ $t=\frac{\varepsilon}{n}$
➁ take $t=\frac{\varepsilon}{n}$ in $e^{\frac{n\cdot t^2}{2}-t\cdot\varepsilon}$, then
$e^{\frac{n}{2}\cdot\frac{\varepsilon^2}{n^2}-\frac{\varepsilon}{n}\cdot\varepsilon}$
$=e^{\frac{\varepsilon^2}{2n}-\frac{\varepsilon^2}{n}}$
$=e^{-\frac{\varepsilon^2}{2n}}$
Finally, we have proved $P(X\geq\mu+\varepsilon)\leq e^{-\frac{\varepsilon^2}{2n}}$ as the upper bound on the error probability; a small numeric check of the optimization follows below.
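A small sketch (my own addition) that evaluates the exponent $\frac{n\cdot t^2}{2}-t\cdot\varepsilon$ on a grid and confirms the minimizer sits at $t=\varepsilon/n$, yielding the bound $e^{-\varepsilon^2/(2n)}$; the values of $n$ and $\varepsilon$ are arbitrary:

```python
import numpy as np

# Exponent n*t^2/2 - t*eps is minimized at t = eps/n, giving -eps^2/(2n).
n, eps = 100, 20.0
t = np.linspace(1e-6, 1.0, 100_001)
exponent = n * t**2 / 2 - t * eps
i = exponent.argmin()
print(t[i], eps / n)                                   # both ~0.2
print(np.exp(exponent[i]), np.exp(-eps**2 / (2 * n)))  # both ~e^{-2} ~ 0.1353
```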
The Lower Bound On Error Probability
Here we ask for the error probability when the target value $X$ is less than the expected value $\mu$, and we'd like to bound the probability of this lower-side deviation.
$P(X\leq\mu-\varepsilon)$
$=P(X-\mu\leq-\varepsilon)$
$=P(-X+\mu\geq\varepsilon)$
$=P(-X\geq-\mu+\varepsilon)$
It is much the same as the upper bound expression; here is the idea again:
➀ take $Y_i'=E[X_i]-X_i$
➁ take $X=\sum_{i=1}^n X_i$, $E[X]=\mu=\sum_{i=1}^n E[X_i]$
➂ take $Y'=\sum_{i=1}^n Y_i'$
➃ $P(X\leq\mu-\varepsilon)$
$=P(-X\geq-\mu+\varepsilon)$
$=P(Y'\geq\varepsilon)$
$=P(e^{t\cdot Y'}\geq e^{t\cdot\varepsilon})$…for any $t>0$
$\leq\frac{E[e^{t\cdot Y'}]}{e^{t\cdot\varepsilon}}$
With a deduction similar to that of the upper bound, we obtain the same expression for the lower bound:
$P(X\leq\mu-\varepsilon)\leq e^{-\frac{\varepsilon^2}{2n}}$
By symmetry, $P(X-\mu\leq-\varepsilon)$ is the opposite direction of $P(X-\mu\geq\varepsilon)$: the same Markov argument bounds its error probability by $e^{\frac{n\cdot t^2}{2}-t\cdot\varepsilon}$, and optimizing over $t$ again yields
$P(X\leq\mu-\varepsilon)=P(X-\mu\leq-\varepsilon)\leq e^{-\frac{\varepsilon^2}{2n}}$
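Combining both tails with a union bound (a standard step, added here for completeness rather than taken from the original proof) gives the two-sided statement $P(|X-\mu|\geq\varepsilon)\leq 2\cdot e^{-\frac{\varepsilon^2}{2n}}$; a quick Monte Carlo check with uniform summands, again an arbitrary example:

```python
import numpy as np

# Two-sided check: P(|X - mu| >= eps) <= 2 * exp(-eps^2 / (2n)).
rng = np.random.default_rng(2)
n, trials, eps = 100, 50_000, 10.0
X = rng.random((trials, n)).sum(axis=1)
mu = n * 0.5
print(np.mean(np.abs(X - mu) >= eps), 2 * np.exp(-eps**2 / (2 * n)))
```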
Illustration: Chernoff bound For Bernoulli versus arbitrary Random Variables::mjtsai1974
[The major point]
Why is the proof of the Chernoff bound for Bernoulli random variables inadequate for arbitrary random variables? Can we reuse that proof for arbitrary random variables? If not, why?
A Bernoulli random variable takes only the two values 0 and 1, while an arbitrary random variable here takes values anywhere in $[0,1]$. It seems that the probabilistic error tolerance for arbitrary random variables demands more careful approximation, as the following argument shows.
[1] The fault tolerance δ of upper and lower bound
From the proof in Chernoff Bounds For Bernoulli Random Variable, we see that
➀ $P(X>(1+\delta)\cdot\mu)<e^{((e^t-1)-t\cdot(1+\delta))\cdot\mu}$
➁ $P(X<(1-\delta)\cdot\mu)<e^{(-1-t\cdot\delta)\cdot\mu}$
Next we inspect the fault tolerance $\delta$ in the Chernoff bounds for Bernoulli random variables for any given $t>0$, based on the convexity property used in the proof of the Chernoff bounds for arbitrary random variables.
[1.1] The fault tolerance δ of upper bound
➀ $D_t\,e^{((e^t-1)-t\cdot(1+\delta))\cdot\mu}$
$=(e^t-(1+\delta))\cdot\mu\cdot e^{((e^t-1)-t\cdot(1+\delta))\cdot\mu}$
➁ $D_t^2\,e^{((e^t-1)-t\cdot(1+\delta))\cdot\mu}$
$=(e^t\cdot\mu+(e^t-(1+\delta))^2\cdot\mu^2)\cdot e^{((e^t-1)-t\cdot(1+\delta))\cdot\mu}$
Since $D_t^2\,e^{((e^t-1)-t\cdot(1+\delta))\cdot\mu}\geq 0$ always holds, this upper bound $e^{((e^t-1)-t\cdot(1+\delta))\cdot\mu}$ in the Chernoff bounds for Bernoulli random variables is a convex function of $t$; it concaves up.
➂ suppose the line $c+d\cdot t$ passes through $e^{((e^t-1)-t\cdot(1+\delta))\cdot\mu}$ at $(0,1)$ and $(1,e^{(e-2-\delta)\cdot\mu})$…for $t>0$
$c+d\cdot 0=1$…$t=0$
$c+d=e^{(e-2-\delta)\cdot\mu}$…$t=1$
⇒ $c=1$ and $d=e^{(e-2-\delta)\cdot\mu}-1$, thus the line is
$1+(e^{(e-2-\delta)\cdot\mu}-1)\cdot t$
➃ we'd like to figure out the fault tolerance $\delta$ for any $t>0$; setting the chord's slope in $t$ to zero:
$D_t(1+(e^{(e-2-\delta)\cdot\mu}-1)\cdot t)=0$
⇒ $e^{(e-2-\delta)\cdot\mu}-1=0$
⇒ $e^{(e-2-\delta)\cdot\mu}=1$
⇒ $(e-2-\delta)\cdot\mu=0$, and since $\mu>0$
⇒ $\delta=e-2$…$e=2.71828182\dots$
⇒ $\delta=0.71828182\dots$
For whatever $0<t\leq 1$ we are using, the fault tolerance expressed by $\delta$ could reach up to almost 0.718, which might not be an inappropriate choice.
[1.2] The fault tolerance δ of lower bound
➀ $D_t\,e^{(-1-t\cdot\delta)\cdot\mu}=-\delta\cdot\mu\cdot e^{(-1-t\cdot\delta)\cdot\mu}$
➁ $D_t^2\,e^{(-1-t\cdot\delta)\cdot\mu}=(-\delta\cdot\mu)^2\cdot e^{(-1-t\cdot\delta)\cdot\mu}$
Since $(-\delta\cdot\mu)^2\cdot e^{(-1-t\cdot\delta)\cdot\mu}\geq 0$ always holds, this lower bound $e^{(-1-t\cdot\delta)\cdot\mu}$ in the Chernoff bounds for Bernoulli random variables is a convex function of $t$; it concaves up.
➂ suppose the line $c+d\cdot t$ passes through $e^{(-1-t\cdot\delta)\cdot\mu}$ at $(0,e^{-\mu})$ and $(1,e^{(-1-\delta)\cdot\mu})$, then
$c+d\cdot 0=e^{-\mu}$…$t=0$
$c+d=e^{(-1-\delta)\cdot\mu}$…$t=1$
⇒ $c=e^{-\mu}$ and $d=e^{(-1-\delta)\cdot\mu}-e^{-\mu}$, thus the line is
$e^{-\mu}+(e^{(-1-\delta)\cdot\mu}-e^{-\mu})\cdot t$
➃ we'd like to figure out the fault tolerance $\delta$ for any $t>0$; setting the chord's slope in $t$ to zero:
$D_t(e^{-\mu}+(e^{(-1-\delta)\cdot\mu}-e^{-\mu})\cdot t)=0$
⇒ $e^{(-1-\delta)\cdot\mu}-e^{-\mu}=0$
⇒ $e^{(-1-\delta)\cdot\mu}=e^{-\mu}$
⇒ $(-1-\delta)\cdot\mu=-\mu$
⇒ $\delta=0$
For whatever $0<t\leq 1$ we are using, the fault tolerance expressed by $\delta$ could only be 0, which might be an inappropriate choice for error probability evaluation.
[2] The existence of t in upper and lower bound
Following the above section, this article now asks for a $t$ such that we can have an ideal upper/lower bound for $\delta$ in $[-1,1]$:
➀ $P(X>(1+\delta)\cdot\mu)<e^{((e^t-1)-t\cdot(1+\delta))\cdot\mu}$
➁ $P(X<(1-\delta)\cdot\mu)<e^{(-1-t\cdot\delta)\cdot\mu}$
[2.1] The validity of t in upper bound
➀ $D_\delta^2\,e^{((e^t-1)-t\cdot(1+\delta))\cdot\mu}\geq 0$
$e^{((e^t-1)-t\cdot(1+\delta))\cdot\mu}$ is a convex function of $\delta$.
➁ suppose the line $c+d\cdot\delta$ passes through $e^{((e^t-1)-t\cdot(1+\delta))\cdot\mu}$ at $(-1,e^{(e^t-1)\cdot\mu})$ and $(1,e^{(e^t-1-2\cdot t)\cdot\mu})$, then:
$c=\frac{e^{(e^t-1-2\cdot t)\cdot\mu}+e^{(e^t-1)\cdot\mu}}{2}$
$d=\frac{e^{(e^t-1-2\cdot t)\cdot\mu}-e^{(e^t-1)\cdot\mu}}{2}$
➂ we'd like to deduce the valid $t$ for any $-1\leq\delta\leq 1$; setting the chord's slope in $\delta$ to zero:
$D_\delta(c+d\cdot\delta)=0$
⇒ $d=0$
⇒ $\frac{e^{(e^t-1-2\cdot t)\cdot\mu}-e^{(e^t-1)\cdot\mu}}{2}=0$
⇒ $e^{(e^t-1-2\cdot t)\cdot\mu}=e^{(e^t-1)\cdot\mu}$
⇒ $(e^t-1-2\cdot t)\cdot\mu=(e^t-1)\cdot\mu$
⇒ $e^t-1-2\cdot t=e^t-1$
⇒ $t=0$, which is a contradiction, since $t>0$ is a necessary condition in the proof.
[2.2] The validity of t in lower bound
➀ $D_\delta^2\,e^{(-1-t\cdot\delta)\cdot\mu}\geq 0$
$e^{(-1-t\cdot\delta)\cdot\mu}$ is a convex function of $\delta$.
➁ suppose the line $c+d\cdot\delta$ passes through $e^{(-1-t\cdot\delta)\cdot\mu}$ at $(-1,e^{(-1+t)\cdot\mu})$ and $(1,e^{(-1-t)\cdot\mu})$, then:
$c=\frac{e^{(-1-t)\cdot\mu}+e^{(-1+t)\cdot\mu}}{2}$
$d=\frac{e^{(-1-t)\cdot\mu}-e^{(-1+t)\cdot\mu}}{2}$
➂ we'd like to deduce the valid $t$ for any $-1\leq\delta\leq 1$; setting the chord's slope in $\delta$ to zero:
$D_\delta(c+d\cdot\delta)=0$
⇒ $d=0$
⇒ $e^{(-1-t)\cdot\mu}=e^{(-1+t)\cdot\mu}$
⇒ $-1-t=-1+t$
⇒ $t=0$, which is a contradiction, since $t>0$ is a necessary condition in the proof.
We obtain the degenerate $t=0$ for both the upper and lower bound in the Chernoff bounds for Bernoulli random variables when we apply the convexity property from the proof of the Chernoff bounds for arbitrary random variables, and this is an inappropriate value, because $t$ must be strictly greater than zero. A numeric illustration of both slope conditions follows below.
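The same conclusion can be seen numerically (my own addition; $\mu$ is again an arbitrary positive choice): both chord slopes are strictly negative for every $t>0$ and vanish only at $t=0$.

```python
import numpy as np

# Chord slopes from [2.1] and [2.2] as functions of t (mu fixed):
# both vanish only at t = 0, contradicting the requirement t > 0.
mu = 3.0                                   # an arbitrary positive mean

def d_upper(t):
    return (np.exp((np.exp(t) - 1 - 2*t) * mu)
            - np.exp((np.exp(t) - 1) * mu)) / 2

def d_lower(t):
    return (np.exp((-1 - t) * mu) - np.exp((-1 + t) * mu)) / 2

t = np.linspace(0.01, 2.0, 200)
print(np.all(d_upper(t) < 0), np.all(d_lower(t) < 0))  # True True for t > 0
print(d_upper(0.0), d_lower(0.0))                      # both exactly 0 at t = 0
```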
[Notes::mjtsai1974]
As a result of the above indication, we can claim that the inequalities in the Chernoff bounds for Bernoulli random variables:
➀ $P(X>(1+\delta)\cdot\mu)<e^{((e^t-1)-t\cdot(1+\delta))\cdot\mu}$
➁ $P(X<(1-\delta)\cdot\mu)<e^{(-1-t\cdot\delta)\cdot\mu}$
are inappropriate for the Chernoff bounds for arbitrary random variables.