
Hoeffding Inequality versus Chebyshev's Inequality

Hoeffding Inequality

Suppose you are given an in−sample data set with a certain property ν to train your learning algorithm, with the hope of predicting an out−sample data set with an unknown, but believed-to-exist, property μ. Does the in−sample ν say anything about the out−sample μ? Here the in−sample and out−sample might consist of small balls in red, green, and blue, and both samples come from the same random generator. We treat the distribution of colours over the balls as the property.

[1]No!! Possibly not: the in−sample might contain only red balls, while the out−sample might contain only green balls.
[2]Yes!! Probably yes: the in−sample and out−sample might contain a similarly high proportion of blue balls, and thus the in−sample ν might be close to the out−sample μ.

The Hoeffding inequality guarantees that the in−sample ν is likely to be close to the out−sample μ, with their difference within a quite small ε, provided that the sample size is large:

P(|ν − μ| > ε) ≤ 2 × exp(−2 × ε² × N), where N is the sample size. The Hoeffding inequality claims that ν = μ is probably approximately correct (PAC).

[1]Valid for all N and ε.
[2]Doesn't depend on μ; no need to know μ.
[3]With a larger sample size N or a looser gap ε, we have a higher probability that ν ≈ μ.

The paragraph above is my summary of NTU H.T.Lin's Machine Learning Foundations course.
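
To see the guarantee in action, here is a minimal Monte Carlo sketch in Python (the values of μ, N, ε, and the trial count are illustrative assumptions, not from the course): it treats drawing a blue ball as a Bernoulli(μ) event and compares the empirical tail P(|ν − μ| > ε) against the Hoeffding bound.

```python
import numpy as np

# A minimal Monte Carlo sketch (mu, N, eps, and the trial count are assumed,
# illustrative values): drawing a blue ball is a Bernoulli(mu) event, so the
# number of blue balls in an in-sample of N draws is Binomial(N, mu), and
# nu is the in-sample fraction of blue balls.
rng = np.random.default_rng(0)

mu, N, eps, trials = 0.6, 1000, 0.05, 100_000

nu = rng.binomial(N, mu, size=trials) / N          # in-sample fraction per trial
empirical_tail = np.mean(np.abs(nu - mu) > eps)    # estimate of P(|nu - mu| > eps)
hoeffding_bound = 2 * np.exp(-2 * eps**2 * N)

print(f"empirical P(|nu - mu| > eps) = {empirical_tail:.5f}")
print(f"Hoeffding bound              = {hoeffding_bound:.5f}")
```

On a typical run the empirical tail lands well below the bound, exactly as the inequality promises.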

Chebyshev's Inequality

Chebyshev's inequality claims that P(|Y − E(Y)| > a) ≤ (1 ∕ a²) × Var(Y) for any random variable Y and any a > 0.

Proof:
Var(Y) = ∫₋∞^∞ (y − μ)² × f_Y(y) dy
           ≥ ∫_{|y − μ| ≥ a} (y − μ)² × f_Y(y) dy
           ≥ ∫_{|y − μ| ≥ a} a² × f_Y(y) dy
           = a² × P(|Y − μ| ≥ a) ≥ a² × P(|Y − E(Y)| > a), where μ = E(Y) and f_Y is the density of Y.
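
Because the proof holds for any distribution with finite variance, the inequality is easy to check numerically; the sketch below uses an Exponential(1) variable as an assumed, illustrative choice, for which E(Y) = Var(Y) = 1.

```python
import numpy as np

# A quick numerical check of Chebyshev's inequality (the distribution is an
# assumed, illustrative choice): Y ~ Exponential(1), so E(Y) = Var(Y) = 1.
rng = np.random.default_rng(0)

y = rng.exponential(scale=1.0, size=1_000_000)
mean, var = y.mean(), y.var()

for a in (1.0, 2.0, 3.0):
    tail = np.mean(np.abs(y - mean) > a)    # empirical P(|Y - E(Y)| > a)
    bound = var / a**2                      # Chebyshev's upper bound
    print(f"a = {a}: P(|Y - E(Y)| > a) = {tail:.4f} <= Var(Y)/a^2 = {bound:.4f}")
```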

Suppose we have
(1) independent random variables X₁, X₂, …, Xₙ, each with expectation μ and variance δ²,
(2) avgXₙ = (Σᵢ₌₁ⁿ Xᵢ) ∕ n, so that Var(avgXₙ) = δ² ∕ n,
then, for any ε > 0, by Chebyshev's inequality, we have
P(|avgXₙ − μ| > ε) = P(|avgXₙ − E(avgXₙ)| > ε)
            ≤ δ² ∕ (n × ε²), which tends to 0 as n → ∞.
∴ limₙ→∞ P(|avgXₙ − μ| > ε) = 0 … the (weak) law of large numbers.
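
The convergence is easy to see with a small simulation under assumed parameters: the Xᵢ below are i.i.d. Uniform(0, 1), so μ = 0.5 and δ² = 1 ∕ 12, and both the empirical tail and the Chebyshev bound δ² ∕ (n × ε²) shrink as n grows.

```python
import numpy as np

# A sketch of the law of large numbers under assumed parameters: the X_i are
# i.i.d. Uniform(0, 1), so mu = 0.5 and delta2 = 1/12.  Both the empirical
# tail and the Chebyshev bound delta2 / (n * eps^2) shrink as n grows.
rng = np.random.default_rng(0)

mu, delta2, eps, trials = 0.5, 1 / 12, 0.05, 5_000

for n in (10, 100, 1_000):
    avg = rng.random((trials, n)).mean(axis=1)    # avgX_n, one per trial
    tail = np.mean(np.abs(avg - mu) > eps)        # estimate of the LHS
    bound = delta2 / (n * eps**2)                 # Chebyshev's RHS
    print(f"n = {n:5d}: P(|avgX_n - mu| > eps) ~ {tail:.4f} <= {bound:.4f}")
```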

Hoeffding Inequality ≈ Chebyshev's Inequality

Now, we compare the Hoeffding inequality with Chebyshev's inequality:

[1]P(|ν − μ| > ε) ≤ 2 × exp(−2 × ε² × N)
           vs.
[2]P(|avgXₙ − μ| > ε) ≤ δ² ∕ (n × ε²)

This is equivalent to comparing 2 × exp(−2 × ε² × N) with δ² ∕ (n × ε²).
For [1], as N → ∞, we have 2 × exp(−2 × ε² × N) → 0.
For [2], as n → ∞, we have δ² ∕ (n × ε²) → 0.
∴ the Hoeffding inequality ≈ Chebyshev's inequality in this asymptotic sense; the constant 2 in [1] and the δ² in [2] merely scale each bound and do not change its limit. Note, though, that Hoeffding's bound decays exponentially in N while Chebyshev's decays only like 1 ∕ n, so the Hoeffding bound shrinks much faster.
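
To make the decay rates concrete, the sketch below tabulates the two upper bounds side by side for a few sample sizes (the values of ε and δ² are illustrative assumptions):

```python
import numpy as np

# Tabulate the two upper bounds side by side (eps and delta2 are assumed,
# illustrative values): Hoeffding decays exponentially in N, Chebyshev only
# like 1/n, yet both tend to 0, which is the sense of the "approximately
# equal" claim above.
eps, delta2 = 0.05, 0.25

for n in (100, 1_000, 10_000, 100_000):
    hoeffding = 2 * np.exp(-2 * eps**2 * n)    # bound [1] with N = n
    chebyshev = delta2 / (n * eps**2)          # bound [2]
    print(f"N = n = {n:7d}: Hoeffding {hoeffding:.3e}  vs  Chebyshev {chebyshev:.3e}")
```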