Hoeffding Inequality versus Chebyshev's Inequality
24 Oct 2017

Hoeffding Inequality
Suppose you are given an in-sample data set with a measurable property ν, used to train your learning algorithm, in the hope of predicting an out-of-sample data set with an unknown but assumed-to-exist property μ. Does the in-sample ν say anything about the out-of-sample μ? Here both samples might consist of small balls coloured red, green, and blue, drawn from the same random generator; we treat the colour distribution of the balls as the property.
[1]No!! Possibly not: the in-sample might contain only red balls, while the out-of-sample might contain only green balls.
[2]Yes!! Probably yes: the in-sample and out-of-sample might both contain a similarly high proportion of blue balls, in which case the in-sample ν would be close to the out-of-sample μ.
The Hoeffding inequality guarantees that the in-sample ν is probably close to the out-of-sample μ, with their difference lying within a small gap ε, provided the sample size N is large:

P(|ν − μ| > ε) ≤ 2 × exp(−2 × ε² × N)

This bound has three notable features:
[1]It is valid for all N and all ε > 0.
[2]It does not depend on μ; there is no need to know μ.
[3]A larger sample size N or a looser gap ε gives a higher probability that ν ≈ μ (i.e., that |ν − μ| ≤ ε).
The paragraphs above are my summary of NTU H.T. Lin's Machine Learning Foundations course.
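As a quick sanity check of the bound, here is a small simulation (a sketch, assuming a bin whose true fraction of blue balls is μ = 0.6; the function name and parameters are mine, not from the course). It draws many in-samples of size N and compares the empirical frequency of |ν − μ| > ε against 2 × exp(−2 × ε² × N):

```python
import math
import random

def hoeffding_check(mu=0.6, N=100, eps=0.1, trials=10000, seed=0):
    """Estimate P(|nu - mu| > eps) over many in-samples of size N
    and compare it with the Hoeffding bound 2*exp(-2*eps^2*N)."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        # nu = in-sample fraction of "blue" balls from a bin with true fraction mu
        nu = sum(rng.random() < mu for _ in range(N)) / N
        if abs(nu - mu) > eps:
            bad += 1
    empirical = bad / trials
    bound = 2 * math.exp(-2 * eps**2 * N)
    return empirical, bound

emp, bnd = hoeffding_check()
print(emp, bnd)  # the empirical probability stays below the bound
```

With these numbers the bound is 2·e⁻² ≈ 0.27, while the empirical probability is much smaller; Hoeffding is a worst-case guarantee, not a tight estimate.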
Chebyshev's Inequality
Chebyshev's inequality states that, for a random variable Y with mean μ = E(Y), variance Var(Y), and any a > 0,

P(|Y − E(Y)| ≥ a) ≤ Var(Y) ∕ a²

Proof:
Var(Y) = ∫_{−∞}^{∞} (y − μ)² f_Y(y) dy
≥ ∫_{|y−μ| ≥ a} (y − μ)² f_Y(y) dy
≥ ∫_{|y−μ| ≥ a} a² f_Y(y) dy
= a² × P(|Y − E(Y)| ≥ a), where μ = E(Y).
Dividing both sides by a² gives the inequality.
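A quick numerical check of the inequality (a sketch, taking Y uniform on [0, 1], so E(Y) = 0.5 and Var(Y) = 1/12; the helper below is mine, not from the post):

```python
import random

def chebyshev_check(a=0.25, trials=100000, seed=2):
    """For Y ~ Uniform(0,1), estimate P(|Y - E(Y)| >= a) and
    compare it with Chebyshev's bound Var(Y)/a^2."""
    rng = random.Random(seed)
    mu, var = 0.5, 1 / 12
    hits = sum(abs(rng.random() - mu) >= a for _ in range(trials))
    empirical = hits / trials
    bound = var / a**2
    return empirical, bound

emp, bnd = chebyshev_check()
print(emp, bnd)  # empirical probability is below the bound
```

For a = 0.25 the exact probability is 0.5 while the bound is (1/12)/0.25² ≈ 1.33; the bound can even exceed 1, which is still a valid (if useless) statement about a probability.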
Suppose we have
(1)independent random variables X1, X2, …, Xn, each with expectation μ and variance δ²,
(2)avgXn = (Σᵢ₌₁ⁿ Xᵢ) ∕ n, so that E(avgXn) = μ and Var(avgXn) = δ² ∕ n,
then, for any ε > 0, by using Chebyshev's inequality, we have
P(|avgXn − μ| > ε) = P(|avgXn − E(avgXn)| > ε)
≤ δ² ∕ (n × ε²), which → 0 as n → ∞.
∴ lim_{n→∞} P(|avgXn − μ| > ε) = 0 … the (weak) law of large numbers.
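The shrinking bound is easy to see in a simulation (a sketch, taking Xᵢ uniform on [0, 1], so μ = 0.5 and δ² = 1/12; function and parameter names are mine):

```python
import random

def chebyshev_mean_check(n=100, eps=0.1, trials=5000, seed=1):
    """For X_i ~ Uniform(0,1) (mu = 0.5, variance = 1/12), estimate
    P(|avgXn - mu| > eps) and compare with Chebyshev's bound var/(n*eps^2)."""
    rng = random.Random(seed)
    mu, var = 0.5, 1 / 12
    bad = 0
    for _ in range(trials):
        avg = sum(rng.random() for _ in range(n)) / n
        if abs(avg - mu) > eps:
            bad += 1
    empirical = bad / trials
    bound = var / (n * eps**2)
    return empirical, bound

for n in (10, 100, 1000):
    emp, bnd = chebyshev_mean_check(n=n)
    print(n, emp, bnd)  # the bound shrinks like 1/n; the empirical value stays below it
```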
Hoeffding Inequality ≈ Chebyshev's Inequality
Now we compare the Hoeffding inequality with Chebyshev's inequality:
[1]P(|ν − μ| > ε) ≤ 2 × exp(−2 × ε² × N)
v.s.
[2]P(|avgXn − μ| > ε) ≤ δ² ∕ (n × ε²)
This amounts to comparing 2 × exp(−2 × ε² × N) v.s. δ² ∕ (n × ε²).
For [1], as N → ∞, we will have 2 × exp(−2 × ε² × N) → 0
For [2], as n → ∞, we will have δ² ∕ (n × ε²) → 0
∴ Hoeffding Inequality ≈ Chebyshev's Inequality: both bounds vanish as the sample size grows, and the constant 2 in [1] and the δ² in [2] merely scale the bounds without changing that behaviour. The difference lies in the rate: Hoeffding's bound decays exponentially in N, while Chebyshev's decays only like 1 ∕ n, so Hoeffding gives a much tighter guarantee for large samples.
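To see both bounds side by side, we can tabulate them as the sample size grows (a sketch, assuming δ² = 1/4, the largest possible variance of a {0, 1}-valued variable, and ε = 0.1; the function is mine):

```python
import math

def bounds(N, eps=0.1, var=0.25):
    """Return (Hoeffding bound, Chebyshev bound) for sample size N."""
    hoeffding = 2 * math.exp(-2 * eps**2 * N)
    chebyshev = var / (N * eps**2)
    return hoeffding, chebyshev

for N in (10, 100, 1000):
    h, c = bounds(N)
    print(N, h, c)  # both tend to 0, but Hoeffding decays exponentially
```

For small N the two bounds are comparable (Chebyshev can even be smaller), but by N = 1000 the Hoeffding bound is many orders of magnitude below Chebyshev's.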