
Introduction To The Bayes Theorem

Prologue To Introduction To The Bayes Theorem

The Bayes theorem has been developed and refined over time to improve the accuracy of conditional probability predictions, especially when you have gathered only a little data and would still like a convincing, plausible result. It can be found pervasively in machine learning and reinforcement learning; the POMDP transition probability is one such model.

Law Of Total Probability

Consider a random experiment, given the conditions below:
➀$\{B_{1},B_{2},…,B_{n}\}$ is a partition of $\Omega$, where $B_{i}\cap B_{j}$=$\emptyset$ for $i\neq j$.
➁within the sample space, there exists another event $A$, partitioned by the $B_{i}$.
We have $P(A)$=$P(A\vert B_{1})\cdot P(B_{1})$+…+$P(A\vert B_{n})\cdot P(B_{n})$

proof:
➀by intersecting $A$ with $\Omega$, we get $A$ back:
$A$
=$A\cap \Omega$
=$A\cap (B_{1}\cup B_{2}\cup…\cup B_{n})$
=$(A\cap B_{1})\cup (A\cap B_{2})\cup…\cup (A\cap B_{n})$.
➁the total probability of event $A$:
$P(A)$
=$P((A\cap B_{1})\cup (A\cap B_{2})\cup…\cup (A\cap B_{n}))$
=$P(A\cap B_{1})$+$P(A\cap B_{2})$+…+$P(A\cap B_{n})$, since the $B_{i}$ are pairwise disjoint, so are the events $A\cap B_{i}$
=$P(A\vert B_{1})\cdot P(B_{1})$+…+$P(A\vert B_{n})\cdot P(B_{n})$
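
As a quick numerical sanity check, here is a minimal Python sketch of the law of total probability; the 3-event partition and the conditional probabilities are made-up numbers for illustration, not taken from the text:

```python
from fractions import Fraction

# A hypothetical partition B1..B3 of the sample space and
# made-up conditional probabilities P(A|Bi), purely for illustration.
p_B = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]          # P(B1), P(B2), P(B3); sums to 1
p_A_given_B = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]  # P(A|B1), P(A|B2), P(A|B3)

# Law of total probability: P(A) = sum_i P(A|Bi) * P(Bi)
p_A = sum(a * b for a, b in zip(p_A_given_B, p_B))
print(p_A)  # 1/8 + 1/6 + 1/8 = 5/12
```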

The Bayes Theorem

Given two events $A$ and $B$, the term $P(A\cap B)$ interconnects the two expressions below:
➀$P(A\cap B)$=$P(A\vert B)\cdot P(B)$
➁$P(B\cap A)$=$P(B\vert A)\cdot P(A)$

Since the order of events in an intersection changes nothing, $P(A\cap B)$=$P(B\cap A)$, and thus:

$P(B\vert A)\cdot P(A)$=$P(A\vert B)\cdot P(B)$, then,
$\;\;\;\;P(B\vert A)$=$\frac {P(A\vert B)\cdot P(B)}{P(A)}$…Bayes Theorem

The General Form of Bayes Theorem

By using the law of total probability, the general form of the Bayes theorem, describing the probability of event $B_{i}$ given event $A$, could be expressed as below:
$\;\;\;\;P(B_{i}\vert A)$=$\frac {P(A\vert B_{i})\cdot P(B_{i})}{P(A\vert B_{1})\cdot P(B_{1})+…+P(A\vert B_{n})\cdot P(B_{n})}$
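
Below is a minimal Python sketch of this general form, reusing the hypothetical partition numbers from the previous sketch; the function name `bayes_general` is mine, purely for illustration:

```python
from fractions import Fraction

def bayes_general(p_A_given_B, p_B, i):
    """P(Bi|A) = P(A|Bi)*P(Bi) / sum_j P(A|Bj)*P(Bj)."""
    denominator = sum(a * b for a, b in zip(p_A_given_B, p_B))  # law of total probability
    return p_A_given_B[i] * p_B[i] / denominator

# Hypothetical partition from the previous sketch.
p_B = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
p_A_given_B = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
print(bayes_general(p_A_given_B, p_B, 0))  # P(B1|A) = (1/8)/(5/12) = 3/10
```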

Example: Illustration By Rainy And Sunny Days In One Week

[Question]

Suppose we have a full record of the weather over the past week, consisting of rainy and sunny periods. Say $P(Rainy)$=$\frac {3}{7}$, $P(Sunny)$=$\frac {4}{7}$.

In the rainy periods, assume the probability that we look into the sky and see the sun shining (quite strange earth weather) is $P(Sunny\vert Rainy)$=$\frac {1}{5}$; that is, $\frac {1}{5}$ of the time you can see the sun shining on rainy days.

Then we'd like to know how often we could have rain drops while we are under the shining sun.

[Answer]

This is to ask for $P(Rainy\vert Sunny)$.
$P(Rainy\vert Sunny)$=$\frac {P(Sunny\vert Rainy)\cdot P(Rainy)}{P(Sunny)}$=$\frac {\frac {1}{5}\cdot\frac {3}{7}}{\frac {4}{7}}$=$\frac {3}{20}$
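
The same computation can be checked with Python's `fractions` module for exact arithmetic; this minimal sketch just replays the numbers given above:

```python
from fractions import Fraction

p_rainy = Fraction(3, 7)              # P(Rainy)
p_sunny = Fraction(4, 7)              # P(Sunny)
p_sunny_given_rainy = Fraction(1, 5)  # P(Sunny|Rainy)

# Bayes theorem: P(Rainy|Sunny) = P(Sunny|Rainy) * P(Rainy) / P(Sunny)
print(p_sunny_given_rainy * p_rainy / p_sunny)  # 3/20
```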

Example: Illustration By Dogs And Cats

Suppose there exist 60 dogs and 40 cats in the animal hospital, among them 20 female dogs and 10 female cats.

[1] When we pick one female animal at random, what's the probability that it is a dog?

This is asking for $P(Dog\vert Female)$. The total probability of female animals should be calculated in advance.
➀probability of female animals:
$P(Female)$
=$P(Female \cap(Dog \cup Cat))$
=$P(Female \cap Dog)$+$P(Female \cap Cat)$
=$\frac {20}{100}$+$\frac {10}{100}$
=$\frac {3}{10}$
➁$P(Dog\vert Female)$
=$\frac {P(Female \cap Dog)}{P(Female)}$
=$\frac {P(Female \vert Dog)\cdot P(Dog)}{P(Female)}$
=$\frac {\frac {20}{60}\cdot\frac {60}{100}}{\frac {3}{10}}$
=$\frac {2}{3}$
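
A minimal sketch reproducing this result from the raw counts given in the text:

```python
from fractions import Fraction

total, dogs = 100, 60
female_dogs, female_cats = 20, 10

p_female = Fraction(female_dogs + female_cats, total)  # P(Female) = 3/10
p_female_given_dog = Fraction(female_dogs, dogs)       # P(Female|Dog) = 1/3
p_dog = Fraction(dogs, total)                          # P(Dog) = 3/5

# Bayes theorem: P(Dog|Female) = P(Female|Dog) * P(Dog) / P(Female)
print(p_female_given_dog * p_dog / p_female)  # 2/3
```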

[2] Back to $P(Dog\vert Female)$ again: for the term $\frac {P(Female \cap Dog)}{P(Female)}$, can we instead express it as $\frac {P(Dog \cap Female)}{P(Female)}$?

Yes! Let's expand it to verify.
➀$P(Dog\vert Female)$
=$\frac {P(Dog \cap Female)}{P(Female)}$
=$\frac {P(Dog \vert Female)\cdot P(Female)}{P(Female)}$
➁trivially, expanding $\frac {P(Dog \cap Female)}{P(Female)}$ this way incorporates nothing already given to help figure out something we don't know!! Although, we could also count it directly from the given sample space of 30 females, 20 of them dogs, to get the same answer of $\frac {2}{3}$.

[Conclusion]

The major purpose of the Bayes theorem is to use known probabilities to figure out an unknown probability at its maximum likelihood, instead of manually counting in the sample space.
A tiny trick is to place the given condition as the first event in the intersection term, and the event whose probability is asked for as the second.