mjtsai1974's Dev Blog Welcome to mjt's AI world

The Central Limit Theorem

Prologue To The Central Limit Theorem

The central limit theorem is a refinement of the law of large numbers. For a large number of independent random variables $X_1$,$X_2$,...,$X_n$ drawn from the same distribution, with finite expected value and finite variance, the standardized average $\overline {X_n}$ is approximately normally distributed, regardless of the original distribution these $X_i$ come from.

Standardizing The Average

Given a large number of random variables $X_i$ belonging to the same sample, each with the same expected value $\mu$ and variance $\sigma^{2}$, the law of large numbers guarantees that the average $\overline {X_n}$ converges to $\mu$.

The question is then: what is the distribution of $\overline {X_n}$? Since every random variable $X_i$ has the same $\mu$ and $\sigma^{2}$, a good starting point is to stabilize the expected value and variance of $\overline {X_n}$, so that neither of them depends on $n$.

We already know $E\lbrack \overline{X_n}\rbrack$=$\mu$ and $Var\lbrack \overline{X_n}\rbrack$=$\frac {\sigma^{2}}{n}$, so the variance shrinks to $0$ as $n$ grows. How can we rescale $\overline {X_n}$ so that both the expected value and the variance stay fixed?
➀since $E\lbrack \overline{X_n}-\mu\rbrack$=$0$, subtracting $\mu$ centers the expected value at zero.
➁next, to stabilize the variance, we look for a constant $c>0$ such that $Var\lbrack c\cdot\overline{X_n}\rbrack$ no longer depends on $n$. For an unknown distribution this looks quite difficult, but the variance of a scaled random variable depends only on the scale factor: $Var\lbrack c\cdot\overline{X_n}\rbrack$=$c^{2}\cdot Var\lbrack \overline{X_n}\rbrack$=$\frac {c^{2}\cdot\sigma^{2}}{n}$.
➂taking $c=\frac {\sqrt n}{\sigma}$ therefore makes the variance exactly $1$, since $Var\lbrack \frac {\sqrt n}{\sigma}\cdot\overline{X_n}\rbrack$=$\frac {n}{\sigma^{2}}\cdot Var\lbrack \overline{X_n}\rbrack$=$1$
➃because subtracting a constant does not change the variance, $Var\lbrack \frac {\sqrt n}{\sigma}\cdot(\overline{X_n}-\mu)\rbrack$
=$Var\lbrack \frac {\sqrt n}{\sigma}\cdot\overline{X_n}\rbrack$=$1$, so the centered and scaled average has expected value $0$ and variance $1$ for every $n$.

The above procedure is called standardization, or the standardizing process.
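The standardization steps above can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes the $X_i$ are drawn from an Exponential(1) distribution (so $\mu$=$1$ and $\sigma$=$1$), and the function names `standardized_average` and `simulate_standardized_averages` are made up for illustration. It repeats the experiment many times and checks that the standardized average has mean close to $0$ and variance close to $1$.

```python
import math
import random
import statistics


def standardized_average(sample, mu, sigma):
    """Return sqrt(n) * (mean(sample) - mu) / sigma for one sample."""
    n = len(sample)
    return math.sqrt(n) * (statistics.fmean(sample) - mu) / sigma


def simulate_standardized_averages(n, trials, seed=0):
    """Draw `trials` samples of size n and standardize each average.

    Assumption for illustration: X_i ~ Exponential(1), which has
    expected value 1 and standard deviation 1.
    """
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0
    results = []
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        results.append(standardized_average(sample, mu, sigma))
    return results


zs = simulate_standardized_averages(n=50, trials=4000)
z_mean = statistics.fmean(zs)
z_var = statistics.pvariance(zs)
print(f"mean of standardized averages:     {z_mean:.3f}")
print(f"variance of standardized averages: {z_var:.3f}")
```

The printed mean should land near $0$ and the variance near $1$, whatever sample size $n$ is chosen, which is exactly what steps ➀ to ➃ claim.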

The Central Limit Theorem

Suppose $X_1$,$X_2$,…,$X_n$ are independent and identically distributed random variables, each with the same expected value $\mu$ and variance $\sigma^{2}$, both finite.
For any $n\ge 1$, let $Z_n$ be the random variable defined by

$\;\;\;\;Z_n$=$\frac {\overline {X_n}-\mu}{\sigma/\sqrt {n}}$;

then, $E\lbrack Z_n\rbrack$=$0$ and $Var\lbrack Z_n\rbrack$=$1$.

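As $n\rightarrow\infty$, the distribution of $Z_n$ converges to the standard normal distribution, $N(0,1)$: for any $a$, we have $\lim_{n\rightarrow\infty}F_{Z_n}(a)$=$ɸ(a)$, where $ɸ$ is the distribution function of $N(0,1)$. For large $n$, this gives the approximation $F_{Z_n}(a)\approx ɸ(a)$.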

We treat $Z_n$ as the standardized $\overline {X_n}$.
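To see the theorem at work, one can compare the empirical distribution function of $Z_n$ with $ɸ$ at a few points. The sketch below rests on assumptions that are not in the text above: the $X_i$ are taken to be Uniform(0,1) (so $\mu$=$\frac {1}{2}$ and $\sigma^{2}$=$\frac {1}{12}$), $n$=$30$, and the helper names are hypothetical. $ɸ$ is evaluated through the error function, $ɸ(a)$=$\frac {1}{2}(1+erf(\frac {a}{\sqrt 2}))$.

```python
import math
import random
import statistics


def standard_normal_cdf(a):
    """Distribution function of N(0,1), written with the error function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


def empirical_cdf_of_z(n, trials, points, seed=1):
    """Estimate F_{Z_n}(a) at each a in `points` by simulation.

    Assumption for illustration: X_i ~ Uniform(0,1), so mu = 1/2 and
    sigma^2 = 1/12.
    """
    rng = random.Random(seed)
    mu = 0.5
    sigma = math.sqrt(1.0 / 12.0)
    zs = []
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        mean = statistics.fmean(sample)
        zs.append((mean - mu) / (sigma / math.sqrt(n)))
    # Fraction of simulated Z_n values that are <= a
    return {a: sum(z <= a for z in zs) / trials for a in points}


points = [-2.0, -1.0, 0.0, 1.0, 2.0]
ecdf = empirical_cdf_of_z(n=30, trials=5000, points=points)
max_gap = max(abs(ecdf[a] - standard_normal_cdf(a)) for a in points)
for a in points:
    print(f"a={a:+.1f}  empirical={ecdf[a]:.4f}  normal={standard_normal_cdf(a):.4f}")
print(f"largest gap: {max_gap:.4f}")
```

Even though a uniform distribution looks nothing like a bell curve, the two columns should agree to within simulation noise.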

Example: Illustration Of The Central Limit Theorem

Suppose you are given a random sample of size $500$ consisting of random variables $X_1$,$X_2$,…,$X_{500}$, all of them drawn from the same unknown distribution with expected value $2$ and variance also $2$.

After completing all $500$ runs of the test, we get the experimental average $\overline {X_n}$=$2.06$. Do you think it is a plausible result?

To answer this question, we compute the probability that $\overline {X_n}$ is greater than or equal to $2.06$.
$P(\overline {X_n}\ge 2.06)$
=$P(\overline {X_n}-\mu\ge 2.06-\mu)$
=$P(\frac {\overline {X_n}-\mu}{\sigma/\sqrt {n}}\ge \frac {2.06-\mu}{\sigma/\sqrt {n}})$
=$P(\frac {\overline {X_{500}}-\mu}{\sigma/\sqrt {500}}\ge \frac {2.06-2}{\sqrt {2}/\sqrt {500}})$…$\mu$=$2$,$\sigma$=$\sqrt {2}$
=$P(Z_{500}\ge 0.95)$
=1-$P(Z_{500}<0.95)$
$\approx$1-$ɸ(0.95)$…by the central limit theorem
$\approx 0.1711$, which indicates a probability of about $0.1711$ that the average exceeds $2$, the true expected value, by $0.06$ or more.
Since $0.1711$ is quite a large probability, there is little ground to say that $2.06$ is an abnormal experimental average. $2.06$ is thus plausible.
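The arithmetic of this example can be reproduced in a few lines. This is only a sketch: the function names are chosen here for illustration, and $ɸ$ is computed from the error function in Python's standard library rather than read from a table. Because the code does not round the standardized value to $0.95$ first, its result differs slightly from the table lookup in the third decimal place.

```python
import math


def standard_normal_cdf(a):
    """Distribution function of N(0,1), written with the error function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


def upper_tail_probability(sample_mean, mu, variance, n):
    """Approximate P(average >= sample_mean) using the central limit theorem.

    Returns the standardized value z and the normal tail 1 - Phi(z).
    """
    z = (sample_mean - mu) / math.sqrt(variance / n)
    return z, 1.0 - standard_normal_cdf(z)


# Values from the example: mu = 2, sigma^2 = 2, n = 500, observed average 2.06
z, p = upper_tail_probability(sample_mean=2.06, mu=2.0, variance=2.0, n=500)
print(f"z = {z:.4f}")
print(f"P(average >= 2.06) is approximately {p:.4f}")
```

The probability comes out at roughly $0.17$, in line with the hand computation above.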