Unbiased Estimator And Random Sample
14 May 2018

Prologue To Unbiased Estimator And Random Sample
Random Sample And Sample Statistic
➀a random sample is a collection of random variables, say $X_1$,$X_2$,…,$X_n$, that have the same probability distribution and are assumed to be mutually independent. Such random variables constitute a random sample of the population.
➁sampling is the act of taking samples from a population; a sample must be representative of the population from which it is obtained.
➂datasets are usually modelled as realizations of random samples $X_1$,…,$X_n$.
➃a sample statistic is an object that depends on the random sample $X_1$,…,$X_n$. More formally, the sample average $\overline {X_n}$ is one of the most commonly referenced sample statistics. You can also take one single random variable $X$ as a random sample, although we often expect multiple random variables in a random sample; see the sketch below.
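Below is a minimal sketch in Python (assuming NumPy is available; the normal population and its parameters are arbitrary choices for illustration) of drawing a realization of a random sample and computing a sample statistic from it.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Draw a realization of the random sample X_1, ..., X_n: i.i.d. draws
# from one common distribution (here an arbitrary normal population).
n = 100
sample = rng.normal(loc=5.0, scale=2.0, size=n)

# The sample average is one commonly referenced sample statistic.
x_bar = sample.mean()
print(f"sample average (a sample statistic): {x_bar:.4f}")
```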
Estimate versus Estimator
➀the estimator is an artificially designed random variable, built from the random sample according to the parameters of a model distribution.
➁the estimate is the pure quantity obtained by means of the estimator.
The value of $\overline {X_n}$ is the estimate; $\frac {X_1+X_2+...+X_n}{n}$ is the estimator, as the sketch below illustrates.
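A minimal sketch of the distinction (the function name and population are illustrative assumptions, not part of the definitions): the estimator is a rule applied to the random sample, while the estimate is the value that rule yields on one realized dataset.

```python
import numpy as np

def mean_estimator(sample):
    """The estimator: the rule (X_1 + ... + X_n) / n."""
    return sum(sample) / len(sample)

rng = np.random.default_rng(seed=0)
realized_data = rng.exponential(scale=3.0, size=50)  # one realized dataset

# The estimate: the pure quantity produced by the estimator.
estimate = mean_estimator(realized_data)
print(f"estimate obtained by means of the estimator: {estimate:.4f}")
```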
Unbiased Estimator And Sampling Distribution
➀assume the random variable $T$ is an estimator, based on the random sample $X_1$,$X_2$,…,$X_n$, for the feature of interest; the distribution of the estimator $T$ is called the sampling distribution of $T$.
➁the random variable $T$ is an unbiased estimator of the feature, denoted $\theta$, if and only if $E\lbrack T\rbrack$=$\theta$, for any value of $\theta$. A sketch follows.
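A minimal sketch (assuming NumPy and an arbitrary normal population with $\theta$=$\mu$=$10$) that approximates the sampling distribution of $T$=$\overline {X_n}$ by repeated sampling; the average of the simulated $T$ values should sit near $\theta$, consistent with $E\lbrack T\rbrack$=$\theta$.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
mu, sigma, n, trials = 10.0, 3.0, 25, 10_000

# One value of T = X_bar per simulated sample; the spread of these
# values approximates the sampling distribution of T.
t_values = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)

print(f"mean of T over {trials} samples: {t_values.mean():.4f} (theta = {mu})")
print(f"spread of the sampling distribution: {t_values.std():.4f}")
```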
Unbiased Estimator For Sample Expectation
This section focuses on the quantity of interest being the expected value of the random sample.
Regardless of the original probability distribution of the random sample, $\overline {X_n}$=$\frac {X_1+X_2+...+X_n}{n}$ is an unbiased estimator for the expectation, given that the sample consists of $X_1$,…,$X_n$ with finite expectation $\mu$ and variance $\sigma^2$.
proof:
$E\lbrack \overline {X_n}\rbrack$
=$E\lbrack \frac {X_1+X_2+…+X_n}{n}\rbrack$
=$\sum_{i=1}^{n}\frac {E\lbrack X_i\rbrack}{n}$
=$\sum_{i=1}^{n}\frac {\mu}{n}$
=$\mu$
This proof is rather trivial; it follows directly from the linearity of expectation and $E\lbrack X_i\rbrack$=$\mu$ for each $i$.
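A minimal Monte Carlo check of this result (assuming NumPy; the exponential population is an arbitrary non-normal choice, stressing that the claim holds regardless of the distribution):

```python
import numpy as np

rng = np.random.default_rng(seed=2)
mu, n, trials = 2.0, 30, 100_000  # exponential with E[X_i] = mu = 2

# Average X_bar over many independent samples; it should approach mu.
means = rng.exponential(scale=mu, size=(trials, n)).mean(axis=1)
print(f"average of X_bar over {trials} trials: {means.mean():.4f} (mu = {mu})")
```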
Unbiased Estimator For Sample Variance
This section focuses on the quantity of interest being the variance of the random sample.
Regardless of the original probability distribution of the random sample, $S_n^{2}$=$\sum_{i=1}^{n}\frac {(X_i-\overline {X_n})^{2}}{n-1}$ is an unbiased estimator for the variance, given that the sample consists of $X_1$,…,$X_n$ with finite expectation $\mu$ and variance $\sigma^2$.
proof:
➀we begin with the most basic definition of variance, adding and subtracting $\mu$ inside the square.
$E\lbrack \sum_{i=1}^{n}(X_i-\overline {X_n})^{2}\rbrack$
=$E\lbrack \sum_{i=1}^{n}(X_i-\mu+\mu-\overline {X_n})^{2}\rbrack$
=$E\lbrack \sum_{i=1}^{n}((X_i-\mu)-(\overline {X_n}-\mu))^{2}\rbrack$
➁expand the summation term.
$\sum_{i=1}^{n}((X_i-\mu)-(\overline {X_n}-\mu))^{2}$
=$\sum_{i=1}^{n}(X_i-\mu)^{2}$-$2\cdot\sum_{i=1}^{n}(X_i-\mu)\cdot(\overline {X_n}-\mu)$+$\sum_{i=1}^{n}(\overline {X_n}-\mu)^{2}$
; where $\sum_{i=1}^{n}(\overline {X_n}-\mu)^{2}$=$n\cdot(\overline {X_n}-\mu)^{2}$, and
$\sum_{i=1}^{n}(X_i-\mu)\cdot(\overline {X_n}-\mu)$
=$(\overline {X_n}-\mu)\cdot\sum_{i=1}^{n}(X_i-\mu)$
=$(\overline {X_n}-\mu)\cdot(\sum_{i=1}^{n}X_i-n\cdot\mu)$
=$(\overline {X_n}-\mu)\cdot(n\cdot\overline {X_n}-n\cdot\mu)$
=$n\cdot (\overline {X_n}-\mu)^{2}$
➂therefore, the original expression becomes:
$E\lbrack \sum_{i=1}^{n}(X_i-\overline {X_n})^{2}\rbrack$
=$E\lbrack \sum_{i=1}^{n}(X_i-\mu)^{2}$-$2\cdot\sum_{i=1}^{n}(X_i-\mu)\cdot(\overline {X_n}-\mu)$+$\sum_{i=1}^{n}(\overline {X_n}-\mu)^{2}\rbrack$
=$E\lbrack \sum_{i=1}^{n}(X_i-\mu)^{2}$-$2\cdot n\cdot (\overline {X_n}-\mu)^{2}$+$n\cdot(\overline {X_n}-\mu)^{2}\rbrack$
=$E\lbrack \sum_{i=1}^{n}(X_i-\mu)^{2}$-$n\cdot (\overline {X_n}-\mu)^{2}\rbrack$
=$E\lbrack \sum_{i=1}^{n}(X_i-\mu)^{2}\rbrack$-$n\cdot E\lbrack (\overline {X_n}-\mu)^{2}\rbrack$
=$E\lbrack \sum_{i=1}^{n}(X_i-\mu)^{2}\rbrack$-$n\cdot E\lbrack (\overline {X_n}-E\lbrack \overline {X_n}\rbrack)^{2}\rbrack$
=$E\lbrack \sum_{i=1}^{n}(X_i-\mu)^{2}\rbrack$-$n\cdot Var\lbrack \overline {X_n}\rbrack$
=$E\lbrack \sum_{i=1}^{n}(X_i-\mu)^{2}\rbrack$-$n\cdot \frac {\sigma^{2}}{n}$
; where $Var\lbrack \overline {X_n}\rbrack$=$\frac {1}{n^{2}}\cdot\sum_{i=1}^{n}Var\lbrack X_i\rbrack$=$\frac {\sigma^{2}}{n}$, by mutual independence
=$E\lbrack \sum_{i=1}^{n}(X_i-\mu)^{2}\rbrack$-$\sigma^{2}$
=$\sum_{i=1}^{n}E\lbrack (X_i-\mu)^{2}\rbrack$-$\sigma^{2}$
=$\sum_{i=1}^{n}E\lbrack (X_i-E\lbrack X_i\rbrack)^{2}\rbrack$-$\sigma^{2}$
; where $E\lbrack X_i\rbrack$=$\mu$, $E\lbrack (X_i-E\lbrack X_i\rbrack)^{2}\rbrack$=$\sigma^2$
=$\sum_{i=1}^{n}\sigma^{2}$-$\sigma^{2}$
=$n\cdot\sigma^{2}$-$\sigma^{2}$
=$(n-1)\cdot\sigma^{2}$
➃from $E\lbrack \sum_{i=1}^{n}(X_i-\overline {X_n})^{2}\rbrack$=$(n-1)\cdot\sigma^{2}$,
we have $E\lbrack S_n^{2}\rbrack$=$E\lbrack \sum_{i=1}^{n}\frac {(X_i-\overline {X_n})^{2}}{n-1}\rbrack$=$\sigma^{2}$, which completes the proof.
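A minimal sketch (assuming NumPy; the uniform population is an arbitrary choice with known $\sigma^{2}$=$\frac {1}{12}$) contrasting the unbiased divide-by-$(n-1)$ estimator with the biased divide-by-$n$ version:

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n, trials = 10, 100_000
sigma2 = 1.0 / 12.0  # variance of Uniform(0, 1)

samples = rng.uniform(0.0, 1.0, size=(trials, n))
unbiased = samples.var(axis=1, ddof=1).mean()  # S_n^2: divide by n - 1
biased = samples.var(axis=1, ddof=0).mean()    # divide by n instead

print(f"true sigma^2:          {sigma2:.5f}")
print(f"divide by n-1, E[.] =~ {unbiased:.5f}")  # close to sigma^2
print(f"divide by n,   E[.] =~ {biased:.5f}")    # close to (n-1)/n * sigma^2
```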
Example: Unbiased Estimator Doesn't Always Hold
This article has shown unbiased estimators for the sample expectation and variance. Caution must be taken, for unbiasedness doesn't always hold once the estimator is transformed.
Both cases below can be revealed explicitly by means of Jensen's inequality, which claims that $g(E\lbrack X\rbrack)$<$E\lbrack g(X)\rbrack$, where $g(X)$ is a strictly convex function and $X$ is non-degenerate.
[1] Suppose that we have $g(X)$=$X^{2}$, which is a strictly convex function.
➀take $X$=$S_n$, then
$g(E\lbrack S_n\rbrack)$=$E^{2}\lbrack S_n\rbrack$<$E\lbrack g(S_n)\rbrack$=$E\lbrack S_n^{2}\rbrack$=$\sigma^{2}$
➁it implies that $E\lbrack S_n\rbrack$<$\sigma$; even though we are given that $S_n^2$ is an unbiased estimator of $\sigma^{2}$, $S_n$ is not an unbiased estimator of $\sigma$. Unbiasedness doesn't survive the square root.
[2] Suppose that we have $g(X)$=$e^{-X}$ for the zero arrival probability, $p_0$=$e^{-\mu}$, in the Pois($\mu$) distribution, where $\mu$=$E\lbrack\overline {X_n}\rbrack$. Can we take $e^{-\overline {X_n}}$ as an unbiased estimator of the zero arrival probability?
➀begin from Jensen's inequality; $g(X)$=$e^{-X}$ is strictly convex, hence
$E\lbrack e^{-\overline {X_n}}\rbrack$=$E\lbrack g(\overline {X_n})\rbrack$>$g(E\lbrack \overline {X_n}\rbrack)$=$e^{-E\lbrack\overline {X_n}\rbrack}$=$e^{-\mu}$
➁hence, $e^{-\overline {X_n}}$ is not an unbiased estimator for the zero arrival probability; on average it overestimates $p_0$. The suggestion is to use $\overline {X_n}$ as the unbiased estimator of $\mu$; as $n\rightarrow\infty$, the law of large numbers guarantees that $\overline {X_n}$ converges to $\mu$, so $e^{-\overline {X_n}}$ approximates the zero arrival probability $e^{-\mu}$, even though it remains biased for finite $n$. A sketch of both cases follows.
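A minimal sketch of both cases (assuming NumPy and Pois($\mu$) with $\mu$=$1.5$, so $\sigma$=$\sqrt{\mu}$): $E\lbrack S_n\rbrack$ falls below $\sigma$, and $E\lbrack e^{-\overline {X_n}}\rbrack$ exceeds $e^{-\mu}$, matching Jensen's inequality.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
mu, n, trials = 1.5, 10, 100_000
sigma = np.sqrt(mu)  # for Pois(mu), the variance equals mu

samples = rng.poisson(lam=mu, size=(trials, n))
s_n = np.sqrt(samples.var(axis=1, ddof=1))  # S_n for each sample
p0_hat = np.exp(-samples.mean(axis=1))      # exp(-X_bar) for each sample

print(f"E[S_n] =~ {s_n.mean():.4f} < sigma = {sigma:.4f}")
print(f"E[exp(-X_bar)] =~ {p0_hat.mean():.4f} > exp(-mu) = {np.exp(-mu):.4f}")
```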