
Hoeffding Inequality versus Chebyshev's Inequality

Hoeffding Inequality

Suppose you are given an in−sample data set with a certain property ν to train your learning algorithm, with the hope of predicting an out−sample data set with an unknown, but believed-to-exist, property μ. Does the in−sample ν say anything about the out−sample μ? Here the in−sample and out−sample might consist of small balls in red, green, and blue, and both samples come from the same random generator. We treat the distribution of colours over the balls as the property.

[1]No!! Possibly not: the in−sample might contain only red balls, while the out−sample might contain only green balls.
[2]Yes!! Probably yes: the in−sample and out−sample might contain a similarly high proportion of blue balls, and thus the in−sample ν might be close to the out−sample μ.

The Hoeffding inequality guarantees that the in−sample ν is likely to be close to the out−sample μ, with their difference within a quite small ε, provided that the sample size is large:

P(|ν − μ| > ε) ≤ 2 × exp(−2 × ε² × N), where N is the sample size. The Hoeffding inequality claims that ν = μ is probably approximately correct (PAC).

[1]Valid for all N and ε.
[2]Doesn't depend on μ; no need to know μ.
[3]With a larger sample size N or a looser gap ε, we have a higher probability that ν ≈ μ.

The paragraph above is my summary of NTU H.T.Lin's Machine Learning Foundations course.
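
To see the guarantee in action, here is a minimal Monte Carlo sketch in Python (the values of μ, N, ε, and the trial count are illustrative assumptions, not from the course): it treats drawing a blue ball as a Bernoulli(μ) event and compares the empirical tail P(|ν − μ| > ε) against the Hoeffding bound.

```python
import numpy as np

# A minimal Monte Carlo sketch (mu, N, eps, and the trial count are assumed,
# illustrative values): drawing a blue ball is a Bernoulli(mu) event, so the
# number of blue balls in an in-sample of N draws is Binomial(N, mu), and
# nu is the in-sample fraction of blue balls.
rng = np.random.default_rng(0)

mu, N, eps, trials = 0.6, 1000, 0.05, 100_000

nu = rng.binomial(N, mu, size=trials) / N          # in-sample fraction per trial
empirical_tail = np.mean(np.abs(nu - mu) > eps)    # estimate of P(|nu - mu| > eps)
hoeffding_bound = 2 * np.exp(-2 * eps**2 * N)

print(f"empirical P(|nu - mu| > eps) = {empirical_tail:.5f}")
print(f"Hoeffding bound              = {hoeffding_bound:.5f}")
```

On a typical run the empirical tail lands well below the bound, exactly as the inequality promises.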

Chebyshev's Inequality

Chebyshev's inequality claims that P(|Y − E(Y)| > a) ≤ (1 ∕ a²) × Var(Y) for any random variable Y and any a > 0.

Proof:
Var(Y) = ∫₋∞^∞ (y − μ)² × f_Y(y) dy
           ≥ ∫_{|y − μ| ≥ a} (y − μ)² × f_Y(y) dy
           ≥ ∫_{|y − μ| ≥ a} a² × f_Y(y) dy
           = a² × P(|Y − μ| ≥ a) ≥ a² × P(|Y − E(Y)| > a), where μ = E(Y) and f_Y is the density of Y.
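
Because the proof holds for any distribution with finite variance, the inequality is easy to check numerically; the sketch below uses an Exponential(1) variable as an assumed, illustrative choice, for which E(Y) = Var(Y) = 1.

```python
import numpy as np

# A quick numerical check of Chebyshev's inequality (the distribution is an
# assumed, illustrative choice): Y ~ Exponential(1), so E(Y) = Var(Y) = 1.
rng = np.random.default_rng(0)

y = rng.exponential(scale=1.0, size=1_000_000)
mean, var = y.mean(), y.var()

for a in (1.0, 2.0, 3.0):
    tail = np.mean(np.abs(y - mean) > a)    # empirical P(|Y - E(Y)| > a)
    bound = var / a**2                      # Chebyshev's upper bound
    print(f"a = {a}: P(|Y - E(Y)| > a) = {tail:.4f} <= Var(Y)/a^2 = {bound:.4f}")
```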

Suppose we have
(1) independent random variables X₁, X₂, …, Xₙ, each with expectation μ and variance δ²,
(2) avgXₙ = (Σᵢ₌₁ⁿ Xᵢ) ∕ n, so that Var(avgXₙ) = δ² ∕ n,
then, for any ε > 0, by Chebyshev's inequality, we have
P(|avgXₙ − μ| > ε) = P(|avgXₙ − E(avgXₙ)| > ε)
            ≤ δ² ∕ (n × ε²), which tends to 0 as n → ∞.
∴ limₙ→∞ P(|avgXₙ − μ| > ε) = 0 … the (weak) law of large numbers.
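
The convergence is easy to see with a small simulation under assumed parameters: the Xᵢ below are i.i.d. Uniform(0, 1), so μ = 0.5 and δ² = 1 ∕ 12, and both the empirical tail and the Chebyshev bound δ² ∕ (n × ε²) shrink as n grows.

```python
import numpy as np

# A sketch of the law of large numbers under assumed parameters: the X_i are
# i.i.d. Uniform(0, 1), so mu = 0.5 and delta2 = 1/12.  Both the empirical
# tail and the Chebyshev bound delta2 / (n * eps^2) shrink as n grows.
rng = np.random.default_rng(0)

mu, delta2, eps, trials = 0.5, 1 / 12, 0.05, 5_000

for n in (10, 100, 1_000):
    avg = rng.random((trials, n)).mean(axis=1)    # avgX_n, one per trial
    tail = np.mean(np.abs(avg - mu) > eps)        # estimate of the LHS
    bound = delta2 / (n * eps**2)                 # Chebyshev's RHS
    print(f"n = {n:5d}: P(|avgX_n - mu| > eps) ~ {tail:.4f} <= {bound:.4f}")
```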

Hoeffding Inequality ≈ Chebyshev's Inequality

Now, we compare the Hoeffding inequality with Chebyshev's inequality:

[1]P(|ν − μ| > ε) ≤ 2 × exp(−2 × ε² × N)
           vs.
[2]P(|avgXₙ − μ| > ε) ≤ δ² ∕ (n × ε²)

This is equivalent to comparing 2 × exp(−2 × ε² × N) with δ² ∕ (n × ε²).
For [1], as N → ∞, we have 2 × exp(−2 × ε² × N) → 0.
For [2], as n → ∞, we have δ² ∕ (n × ε²) → 0.
∴ the Hoeffding inequality ≈ Chebyshev's inequality in this asymptotic sense; the constant 2 in [1] and the δ² in [2] merely scale each bound and do not change its limit. Note, though, that Hoeffding's bound decays exponentially in N while Chebyshev's decays only like 1 ∕ n, so the Hoeffding bound shrinks much faster.
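
To make the decay rates concrete, the sketch below tabulates the two upper bounds side by side for a few sample sizes (the values of ε and δ² are illustrative assumptions):

```python
import numpy as np

# Tabulate the two upper bounds side by side (eps and delta2 are assumed,
# illustrative values): Hoeffding decays exponentially in N, Chebyshev only
# like 1/n, yet both tend to 0, which is the sense of the "approximately
# equal" claim above.
eps, delta2 = 0.05, 0.25

for n in (100, 1_000, 10_000, 100_000):
    hoeffding = 2 * np.exp(-2 * eps**2 * n)    # bound [1] with N = n
    chebyshev = delta2 / (n * eps**2)          # bound [2]
    print(f"N = n = {n:7d}: Hoeffding {hoeffding:.3e}  vs  Chebyshev {chebyshev:.3e}")
```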