
The Bayes Theorem Significance

Prologue To The Bayes Theorem Significance

The most powerful effect of the Bayes theorem is to use already known probabilities to figure out the unknown probability of maximum likelihood, instead of manually counting in the sample space. The Bayes theorem is a weapon for quantitative critical thinking, overcoming the drawback of the qualitative thinking of human nature.

The Bayes Theorem Significance

$\;\;\;\;P(B_{i}\vert A)$=$\frac {P(A\vert B_{i})\cdot P(B_{i})}{P(A\vert B_{1})\cdot P(B_{1})+…+P(A\vert B_{n})\cdot P(B_{n})}$

[1] The general form of Bayes theorem embeds a total probability expression in its denominator part.

➀such total probability is the linear combination over the partitions $B_{i}$ of the sample space, with the probability $P(A\vert B_{i})$, the occurrence of the target condition $A$ under the given partition $B_{i}$, as its weighting.
➁usually, $A$ is the qualitative feature of interest.
➂$\Omega$=$\{B_{1},B_{2},…,B_{n}\}$, where the $B_{i}$ form a partition of the sample space.

[2] The general form of Bayes theorem has a joint probability expression in its numerator part.

➀the joint probability is expressed in terms of $P(A\vert B_{i})\cdot P(B_{i})$.
➁$P(A\vert B_{i})\cdot P(B_{i})$=$P(A\cap B_{i})$, which is the probability of the coexistence of $A$ and $B_{i}$.

[3] Take the Bayes theorem into parts

There exist 4 factors in the Bayes theorem expression (a small code sketch follows this list).
➀$P(B_{i})$ is the prior probability, the already known probability.
➁$P(A\vert B_{i})$ is the likelihood function, by intuition the qualitative term; its major effect is the estimation of the likely probability of the occurrence of the target event of interest $A$, under the given condition/partition $B_{i}$ of the sample space.
➂the total probability, also called the marginal probability, is the summation over the distribution of each distinct $B_{i}$ with regard to the specific $A$; the same expansion applies to whatever target event $A$ of interest, of which there could be infinitely many.
➃$P(B_{i}\vert A)$ is the posterior probability, by intuition the quantitative target; now the conditioning is on the target event of interest $A$.
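Below is a minimal Python sketch of these 4 factors; the function name `bayes_posterior` and the two-partition numbers are my own illustrative assumptions, not part of the original example.

```python
# A minimal sketch: compute posteriors P(B_i|A) from priors P(B_i) and
# likelihoods P(A|B_i), following the general form of Bayes theorem.

def bayes_posterior(priors, likelihoods):
    # priors[i]      : P(B_i), the already known probability
    # likelihoods[i] : P(A|B_i), the likelihood of A under partition B_i
    # marginal       : P(A), the total probability in the denominator
    marginal = sum(p * l for p, l in zip(priors, likelihoods))
    # posterior P(B_i|A) = P(A|B_i) * P(B_i) / P(A)
    return [p * l / marginal for p, l in zip(priors, likelihoods)]

# hypothetical two-partition example: P(B_1)=0.6, P(B_2)=0.4,
# P(A|B_1)=0.5, P(A|B_2)=0.25
print(bayes_posterior([0.6, 0.4], [0.5, 0.25]))  # [0.75, 0.25]
```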

Example: 3 Red And 2 White Balls

[Question]

Given that there are 3 red and 2 white balls in a bowl. Suppose you pick up 2 balls sequentially. What’s the probability that the 2nd ball picked is white and the 1st is red?

[Answer]

This is asking for $P(W_{2}\cap R_{1})$. Denote $R_{i}$, $W_{i}$ as the events that the i-th ball picked is red, white respectively.
➀when we pick up the very first ball, $\Omega$=$\{r_{1},r_{2},r_{3},w_{1},w_{2}\}$.
$P(R_{1})$=$\frac {3}{5}$, the probability that the 1st ball is a red ball.
➁when we pick up the second ball, $\Omega$=$\{r_{2},r_{3},w_{1},w_{2}\}$.
$P(W_{2}\vert R_{1})$=$\frac {2}{4}$, the probability that the 2nd ball is a white ball, given that the first one is the red ball.
➂$P(W_{2}\cap R_{1})$=$P(W_{2}\vert R_{1})\cdot P(R_{1})$=$\frac {6}{20}$=$\frac {3}{10}$.

[More details]

➀$P(R_{1})$ is the prior probability.
➁$P(W_{2}\vert R_{1})$ is the likelihood function to estimate the probability that the 2nd ball is a white ball, given that the first one is the red ball.
➂$P(W_{2}\vert R_{1})\cdot P(R_{1})$ is the joint probability of the coexistence of $W_{2}$ and $R_{1}$, since we are asking for the probability of picking up the 2nd white ball and the 1st one is the red ball.
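A tiny Python sketch of this computation, using the fractions module to keep the result exact:

```python
from fractions import Fraction

# 3 red and 2 white balls; a small sketch of the computation above
p_r1 = Fraction(3, 5)           # P(R1): the 1st ball is red
p_w2_given_r1 = Fraction(2, 4)  # P(W2|R1): the 2nd ball is white, given the 1st was red
p_joint = p_w2_given_r1 * p_r1  # P(W2 and R1) = P(W2|R1) * P(R1)
print(p_joint)                  # 3/10, i.e. 6/20
```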

Example: quantitative versus qualitative::mjtsai1974

[Question]

Suppose the statistical population distribution in the Hsin-Chu science park area reports that $8\%$ of the population are managers, $32\%$ are marketing sales, $38\%$ are manufacturing engineers, and $22\%$ are IC design engineers.
Given that Albert is a man, when you see him, his behavior is a little shy, he is talkative with strangers when asking for directions, and is not talkative on political topics.

According to recent statistical research, given a manager, there is a $0.1$ probability of being shy; given a marketing sales, a $0.24$ probability of being shy; given a manufacturing engineer, a $0.47$ probability of being shy; given an IC design engineer, a $0.29$ probability of being shy.

Under such conditions, given that Albert is a shy man, which identity has the maximum probability?

[Answer]

➀take $P(Mgr)$=$0.08$, $P(Sales)$=$0.32$, $P(M_{eng})$=$0.38$, $P(I_{eng})$=$0.22$ as the probabilities of being a manager, a marketing sales, a manufacturing engineer, an IC design engineer, which are all prior probabilities.
➁take $P(Shy\vert Mgr)$=$0.1$, $P(Shy\vert Sales)$=$0.24$, $P(Shy\vert M_{eng})$=$0.47$, $P(Shy\vert I_{eng})$=$0.29$ as the probabilities of being shy given a manager, a marketing sales, a manufacturing engineer, an IC design engineer respectively, which are all likelihood functions, the qualitative terms.
➂the question asks for the maximum among $P(Mgr\vert Shy)$, $P(Sales\vert Shy)$, $P(M_{eng}\vert Shy)$, $P(I_{eng}\vert Shy)$. Trivially, we need the total probability of being shy.
$P(Shy)$
=$P(Shy\vert Mgr)\cdot P(Mgr)$+
$\;\;\;\;P(Shy\vert Sales)\cdot P(Sales)$+
$\;\;\;\;P(Shy\vert M_{eng})\cdot P(M_{eng})$+
$\;\;\;\;P(Shy\vert I_{eng})\cdot P(I_{eng})$
=$0.1\cdot 0.08$+$0.24\cdot 0.32$+$0.47\cdot 0.38$+$0.29\cdot 0.22$
=$0.3272$
➃finally, the posterior probability:
$P(Mgr\vert Shy)$=$\frac {P(Shy\vert Mgr)\cdot P(Mgr)}{P(Shy)}$=$\frac {0.1\cdot 0.08}{0.3272}$=$0.0244$
$P(Sales\vert Shy)$=$\frac {P(Shy\vert Sales)\cdot P(Sales)}{P(Shy)}$=$\frac {0.24\cdot 0.32}{0.3272}$=$0.2347$
$P(M_{eng}\vert Shy)$=$\frac {P(Shy\vert M_{eng})\cdot P(M_{eng})}{P(Shy)}$=$\frac {0.47\cdot 0.38}{0.3272}$=$0.5458$
$P(I_{eng}\vert Shy)$=$\frac {P(Shy\vert I_{eng})\cdot P(I_{eng})}{P(Shy)}$=$\frac {0.29\cdot 0.22}{0.3272}$=$0.1949$

By means of the quantitative posterior, we find that $P(M_{eng}\vert Shy)$ has the largest posterior probability, which implies that, given Albert is a shy man, he is most likely a manufacturing engineer.
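A short Python sketch reproducing the numbers above; the dictionary keys are just shorthand labels for the four identities:

```python
# A minimal sketch of the posterior computation in this example.
priors = {"Mgr": 0.08, "Sales": 0.32, "M_eng": 0.38, "I_eng": 0.22}      # P(identity)
likelihoods = {"Mgr": 0.1, "Sales": 0.24, "M_eng": 0.47, "I_eng": 0.29}  # P(Shy|identity)

# total (marginal) probability of being shy
p_shy = sum(priors[k] * likelihoods[k] for k in priors)   # 0.3272

# posterior P(identity|Shy) for each identity
posteriors = {k: priors[k] * likelihoods[k] / p_shy for k in priors}
print(round(p_shy, 4))
print(max(posteriors, key=posteriors.get))   # M_eng, the largest posterior
```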

Event Independence versus Conditional Probability

Prologue To Event Independence versus Conditional Probability

In the discipline of probability, event independence is quite an important concept. The inference from conditional probability comes out with the result that the occurrence of one event is said to be unrelated to another event, if the two events are independent.

Definition: Event Independence

An event $A$ is said to be independent of event $B$, if
$\;\;\;\;P(A\vert B)$=$P(A)$.

Event Independence Equivalence

By the definition of event independence, we can have an equivalence of expression from conditional probability:
$P(A\vert B)$=$\frac {P(A\cap B)}{P(B)}$=$P(A)$
$\Leftrightarrow P(A\cap B)=P(A)\cdot P(B)$

Below lists the basic properties:
➀$P(A\cap B)$=$P(A)\cdot P(B)$ indicates event $A$ is independent of event $B$.
➁by its symmetry, $P(A)\cdot P(B)$=$P(B)\cdot P(A)$=$P(B\cap A)$, event $B$ is independent of event $A$.
➂$P(A\vert B)$=$P(A)$ and $P(B\vert A)$=$P(B)$.

Event Independence Extension

[1] multiple events independence

$P(N_{1}\cap N_{2}\cap…\cap N_{m})$
=$P(N_{1})\cdot P(N_{2})\cdot …\cdot P(N_{m-1})\cdot P(N_{m})$, given that the events $N_{i}$ are all mutually independent.

proof::mjtsai1974

➀$P(N_{m-1}\vert N_{m})$=$\frac {P(N_{m-1}\cap N_{m})}{P(N_{m})}$=$P(N_{m-1})$
Then $P(N_{m-1}\cap N_{m})$
=$P(N_{m-1})\cdot P(N_{m})$.
➁$P(N_{m-2}\vert (N_{m-1}\cap N_{m}))$=$\frac {P(N_{m-2}\cap (N_{m-1}\cap N_{m}))}{P(N_{m-1}\cap N_{m})}$=$P(N_{m-2})$
Then $P(N_{m-2}\cap (N_{m-1}\cap N_{m}))$
=$P(N_{m-2})\cdot P(N_{m-1})\cdot P(N_{m})$.
➂by mathematical induction, we can finally have the equivalence of the expression.
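As a concrete check of this product form, below is a small Python sketch, assuming 3 independent tosses of a fair coin, that enumerates the product sample space and verifies $P(N_{1}\cap N_{2}\cap N_{3})$=$P(N_{1})\cdot P(N_{2})\cdot P(N_{3})$ for the events "the i-th toss is a head":

```python
from itertools import product
from fractions import Fraction

# sample space of 3 independent tosses of a fair coin
omega = list(product("HT", repeat=3))

def prob(event):
    # each of the 8 outcomes is equally likely
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

# N_i: the i-th toss is a head
n1 = lambda w: w[0] == "H"
n2 = lambda w: w[1] == "H"
n3 = lambda w: w[2] == "H"

joint = prob(lambda w: n1(w) and n2(w) and n3(w))
print(joint == prob(n1) * prob(n2) * prob(n3))   # True: 1/8 == (1/2)^3
```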

[2] independence equivalence relation

Event $A$ is independent of event $B$
$\Leftrightarrow$ event $A^{c}$ is independent of event $B$

proof::mjtsai1974

$P(A^{c}\vert B)$
=$1$-$P(A\vert B)$
=$1$-$P(A)$
=$P(A^{c})$
From the end to the beginning we can prove the inverse direction.

[3] independence of all events

Given that event $A$ is independent of event $B$, we can infer all the possible independences in between $A$, $A^{c}$, $B$, $B^{c}$.

proof::mjtsai1974

$P(A\vert B)$=$P(A)$
$\Leftrightarrow P(A^{c}\vert B)$=$P(A^{c})$
$\Leftrightarrow P(B\vert A^{c})$=$P(B)$
$\Leftrightarrow P(B^{c}\vert A^{c})$=$P(B^{c})$
$\Leftrightarrow P(B\vert A)$=$P(B)$
$\Leftrightarrow P(B^{c}\vert A)$=$P(B^{c})$
$\Leftrightarrow P(A\vert B^{c})$=$P(A)$
$\Leftrightarrow P(A^{c}\vert B^{c})$=$P(A^{c})$

We conclude if $A$ is independent of $B$, then $A^{c}$ is independent of $B$, $A$ is independent of $B^{c}$, $A^{c}$ is independent of $B^{c}$.

[4] event and its complement

The probability of the intersection of any given event and its complement is $0$, that is $P(A\cap A^{c})$=$0$.

Example: 2nd Head following 1st Head

Suppose you are tossing a fair coin, the probabilities of head and tail are both $\frac {1}{2}$, and each toss is an independent case.

We’d like to ask for the probability that the 2nd toss comes out a head, right after the 1st toss comes out a head, then,
➀take the event that the 1st toss is a head as $A_{1}$, and the event that the 2nd toss is a head as $A_{2}$; each single toss has outcomes $\{H,T\}$.
➁the sample space of these 2 tosses would be $\Omega$=$\{HH,HT,TH,TT\}$. The probability of 2 contiguous heads is $\frac {1}{4}$.
➂the probability that the 2nd toss comes out a head right after the 1st toss comes out a head is asking for $P(A_{2}\cap A_{1})$, then
$\frac {1}{4}$
=$P(A_{2}\cap A_{1})$
=$P(A_{2}\vert A_{1})\cdot P(A_{1})$
=$\frac {1}{2}\cdot \frac {1}{2}$
=$P(A_{2})\cdot P(A_{1})$
Thus, we have $P(A_{2}\vert A_{1})$=$P(A_{2})$, which is fully consistent with the given fact that each toss is an independent case.

Example: Illustration By Tossing A Fair Die

Suppose you are tossing a fair die, the sample space $\Omega$=$\{1,2,3,4,5,6\}$. We denote the event of numbers smaller than $4$ as $A$=$\{1,2,3\}$, and denote the event of even numbers as $B$=$\{2,4,6\}$.

To evaluate if event $A$ is independent of event $B$:
➀$P(A\cap B)$=$P(\{2\})$=$\frac {1}{6}$
➁$P(A)\cdot P(B)$=$\frac {1}{2}\cdot \frac {1}{2}$=$\frac {1}{4}$
Hence, the event of numbers smaller than $4$ is not independent of the event of even numbers.
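The same evaluation as a tiny Python sketch:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}   # fair die
A = {1, 2, 3}                # numbers smaller than 4
B = {2, 4, 6}                # even numbers

def prob(event):
    return Fraction(len(event), len(omega))

print(prob(A & B))                       # 1/6
print(prob(A) * prob(B))                 # 1/4
print(prob(A & B) == prob(A) * prob(B))  # False: A is not independent of B
```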

Introduction To The Probability

Prologue To Introduction To The Probability

The probability describes how likely it is that a test comes out with the expected result. It is fundamental to modern science.

Begin From The Fundamental

[1] sample space

➀the sample space is just the set of elements describing the outcomes of a test or experiment; formally, the result after the execution of a certain action.
In statistics reference textbooks, the letter $\Omega$ is most often used to represent the sample space.
➁by flipping a coin one time, you will have two outcomes of head and tail, that is to say we associate the sample space with the set $\Omega$=$\{H,T\}$.
➂to guess the birthday within one week, the sample space $\Omega$=$\{Sun,Mon,Tue,Wed,Thu,Fri,Sat\}$.

[2] event

➀a subset of a sample space is treated as an event.
➁in the birthday-within-one-week example, suppose we’d like to ask for the days with uppercase “S” as the prefix, then, we can denote $S$=$\{Sun,Sat\}$.
➂suppose we’d like to ask for the days with uppercase “T” as the prefix, then, we can denote $T$=$\{Tue,Thu\}$.

[3] intersection, union, complement

Suppose $A$ and $B$ are two events in the sample space $\Omega$.
➀intersection, an event operator, denoted by $\cap$.
➁union, also an event operator, denoted by $\cup$.
➂complement, an event operator, usually denoted by a superscript lowercase $c$.

[4] disjoint events

Suppose $A$ and $B$ are two events in the sample space $\Omega$. They are said to be two disjoint events if they have no intersection. $A\cap B$=$0$.
Such events might be regarded as mutually exclusive.

The Probability

[1] why do we need the probability?

In order to express how likely it is that an event will occur during the experiment, assigning a probability to each distinct event is common practice. To distribute the probability accurately is not an easy task.

[2] the probability function

Since each event would be associated with a probability, then we are in need of the probability function.
➀the uppercase “P” is the probability function on a sample space $\Omega$ to assign the event $A$ in $\Omega$ a number $P(A)$ in $[0,1]$. The number $P(A)$ is the probability of the occurrence of event $A$.
➁wherein $P(\Omega)$=$1$.
➂$P(A\cup B)$=$P(A)$+$P(B)$-$P(A\cap B)$, where $P(A\cap B)$=$0$ if $A$ and $B$ are disjoint. If $A$,$B$,$C$ are disjoint events, then $P(A\cup B\cup C)$=$P(A)$+$P(B)$+$P(C)$.

[3] the probability is defined on events, not on outcomes

➀tossing a coin one time would give us $\Omega$=$\{H,T\}$, then $P(\{H\})$=$\frac {1}{2}$, $P(\{T\})$=$\frac {1}{2}$, under the assumption that head and tail have equal chances.
➁given cards of red, blue, green colours. The permutation of all the possible orders of the cards would be $\Omega$=$\{RGB$,$RBG$,$GRB$,$GBR$,$BRG$,$BGR\}$.
$P(\{RGB\})$=$P(\{RBG\})$=$P(\{GRB\})$=$P(\{GBR\})$=$P(\{BRG\})$=$P(\{BGR\})$=$\frac {1}{6}$…the same probability for each distinct event.
➂the same example as above, the probability of the event that green card is in the middle would be $P(\{RGB,BGR\})$=$\frac {1}{3}$.
The $\{RGB,BGR\}$ is such event we desire, wherein the $\{RGB\}$ and $\{BGR\}$ are the outcomes described by $\Omega$.

[4] additivity of probability

➀using the same card example, the probability of the event that green card is in the middle could be $P(\{XGX\})$=$P(\{RGB\})$+$P(\{BGR\})$=$\frac {1}{3}$.
This implies that the probability of an event could be obtained by summing over the probabilities of the outcomes belonging to the same event.
➁given $A$ is an event, then $P(A)$+$P(A^{c})$=$P(\Omega)$=$1$.
➂if $A$, $B$ are not disjoint, then $A$=$(A\cap B)\cup(A\cap B^{c})$, this is a disjoint union.
Therefore, $P(A)$=$P(A\cap B)$+$P(A\cap B^{c})$.

Product Of Sample Space

[1] run the same test over multiple times

To justify the experiment result, one single test would be executed multiple times.
➀suppose we flip the same coin 2 times, the sample space $\Omega$=$\{H,T\}\times\{H,T\}$.
It is now $\Omega$=$\{HH,HT,TH,TT\}$. Total 4 outcomes in it, we can take one outcome as one event, then $P(\{HH\})$=$P(\{HT\})$=$P(\{TH\})$=$P(\{TT\})$=$\frac {1}{4}$, under the assumption that $P(\{H\})$=$P(\{T\})$ in each single tossing of coin.

[2] combine the sample space from different tests

➀given 2 sample spaces with respect to the outcomes of 2 different tests, they are $\Omega_{1}$,$\Omega_{2}$, where sizeof($\Omega_{1}$)=$r$, sizeof($\Omega_{2}$)=$s$.
➁then $\Omega$=$\Omega_{1}\times\Omega_{2}$, sizeof($\Omega$)=$r\cdot s$. If we treat each distinct combination in the sample space as one single event, the probability of such a distinct event is $\frac {1}{r\cdot s}$. The $\frac {1}{r}$,$\frac {1}{s}$ are the probabilities for the occurrences of outcomes in $\Omega_{1}$ and $\Omega_{2}$ with respect to test 1 and test 2.

[3] general form of the same test over multiple times

➀suppose we’d like to make the experiment for n runs. We take $\Omega_{i}$ to be the sample space of the i-th test result, $\omega_{i}$ to be one of the outcomes in $\Omega_{i}$.
➁if the occurrence of each outcome $\omega_{i}$ has probability $p_{i}$, then $P(\{\omega_{1},\omega_{2}…\omega_{n}\})$=$p_{1}\cdot p_{2}…p_{n}$, which is the probability for the event $\{\omega_{1},\omega_{2}…\omega_{n}\}$ to take place.
➂assume we flip a coin with probability $p$ of head, that implies $1-p$ of tail. Then the probability of 1 single head after 4 times of tossing would be $4\cdot (1-p)^3\cdot p$.
The sample space would be
$\Omega$=$\{(HTTT),(THTT),(TTHT),(TTTH)\}$. There are 4 combinations, with each has probability $(1-p)^{3}\cdot p$.
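A short Python sketch of this count; the head probability $p=0.3$ is a hypothetical value chosen only for illustration:

```python
from itertools import product

def prob_single_head(p, n=4):
    # enumerate every n-toss sequence and sum the probability
    # of the sequences containing exactly one head
    total = 0.0
    for seq in product("HT", repeat=n):
        if seq.count("H") == 1:
            total += p * (1 - p) ** (n - 1)
    return total

p = 0.3   # hypothetical head probability, just for illustration
print(prob_single_head(p))      # enumeration over the sample space
print(4 * (1 - p) ** 3 * p)     # the closed form 4*(1-p)^3*p, same value
```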

An Infinite Sample Space

[1] run the same test until it succeeds

➀suppose we’d like to toss a coin until it comes up with a head. If the tail is always the result, the sample space would extend indefinitely, $\Omega$=$\{T_{1},T_{2},T_{3},…,T_{n},…\}$, $n\rightarrow\infty$.
Next is to ask for the probability function of this sample space. Assume the probability of head is $p$, and of tail is $1-p$.

[2] the probability function of this infinite sample space

➀for simplicity, we’d like to change the notation to $\Omega$=$\{1,2,..,n,…\}$, where each number is the count of tosses until the coin comes out with a head.
➁$P(1)$=$P(\{H\})$=$p$
➂$P(2)$=$P(\{TH\})$=$(1-p)\cdot p$
➃$P(n)$=$P(\{T_{1}T_{2}…T_{n-1}H_{n}\})$=$(1-p)^{n-1}\cdot p$
➄when $n$ is incredibly large, the total probability becomes
$\lim_{n\rightarrow\infty}P(1)+P(2)+…+P(n)$
=$\lim_{n\rightarrow\infty}p+(1-p)\cdot p+…+(1-p)^{n-1}\cdot p$
=$\lim_{n\rightarrow\infty}p\cdot\frac {1}{1-(1-p)}$
=$p\cdot\frac {1}{p}$
=$1$…the total probability

In an infinite sample space, if all the events $A_{1}$,$A_{2}$,…,$A_{n}$ are disjoint and together make up $\Omega$, then,
$P(\Omega)$
=$P(A_{1}\cup A_{2}\cup…\cup A_{n})$
=$P(A_{1})$+$P(A_{2})$+…$P(A_{n})$
=$1$
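A quick numerical sketch of this geometric-series argument, again with a hypothetical head probability $p=0.3$:

```python
# Numerically check that P(n) = (1-p)^(n-1) * p, the probability that the
# first head appears on the n-th toss, sums to 1 over the infinite sample space.
p = 0.3   # hypothetical head probability, just for illustration
total = sum((1 - p) ** (n - 1) * p for n in range(1, 10_000))
print(total)   # approaches 1, the total probability
```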

Introduction To The Conditional Probability

Prologue To Introduction To The Conditional Probability

Based on the probability of some already occurred event, we can infer or reassess the probability of another event, which is the major effect of the conditional probability.

Definition: Conditional Probability

Given the probability $P(C)$ of the already occurred event $C$, we can compute another event $A$’s probability $P(A\vert C)$. The conditional probability is defined:
$\;\;\;\;P(A\vert C)$=$\frac {P(A\cap C)}{P(C)}$, provided that $P(C)>0$.

This implies that the conditional probability helps to find out the fraction of the probability of event $C$ that is also in event $A$, which is $P(A\cap C)$.

Conditional Probability Properties

Given that $P(C)>0$, we can deduce out below properties:
➀$P(A\vert C)$+$P(A^{c}\vert C)$=$1$, holds for all conditions.
➁if $A\cap C$=$0$, then $P(A\vert C)$=$0$.
➂if $C\subset A$, then $A\cap C$=$C$, $P(A\vert C)$=$1$.
➃if $A\subset C$, then $A\cap C$=$A$, $P(A\vert C)$=$\frac {P(A)}{P(C)}\ge P(A)$, since $P(C)\le 1$.

Example: Illustration By Tossing A Fair Die

Suppose you are tossing a fair die, the sample space $\Omega$=$\{1,2,3,4,5,6\}$. We denote the event of numbers smaller than $3$ as $A$=$\{1,2\}$, and denote the event of even numbers as $B$=$\{2,4,6\}$.
➀if we know the current rolled out number is even, what’s the probability the number is smaller than $3$?
$P(A\vert B)$=$\frac {P(A\cap B)}{P(B)}$=$\frac {\frac {1}{6}}{\frac {1}{2}}$=$\frac {1}{3}$,
where $P(A\cap B)$=$P(\{2\})$=$\frac {1}{6}$.
➁if we know the rolled out number is smaller than $3$, what’s the probability the number is even?
$P(B\vert A)$=$\frac {P(B\cap A)}{P(A)}$=$\frac {\frac {1}{6}}{\frac {1}{3}}$=$\frac {1}{2}$.
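The two conditional probabilities as a small Python sketch:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}   # fair die
A = {1, 2}                   # numbers smaller than 3
B = {2, 4, 6}                # even numbers

def prob(event):
    return Fraction(len(event), len(omega))

print(prob(A & B) / prob(B))   # P(A|B) = (1/6)/(1/2) = 1/3
print(prob(A & B) / prob(A))   # P(B|A) = (1/6)/(1/3) = 1/2
```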

Example: Illustration By Fuel Residence Time

Given an engine full of chemical fuel in its combustion chamber that has just started, we denote the event that a particle remains in a non-completed chemical reaction state after $t$ seconds as $R_{t}$.

Suppose the probability of such chemical reaction is in exponential distribution, the probability of $R_{t}$, $P(R_{t})$=$e^{-t}$.

Then, the probability that a particle which stays over 4 seconds will stay over 5 seconds is to ask for $P(R_{5}\vert R_{4})$.
➀$P(R_{4})$=$e^{-4}$
➁$P(R_{5})$=$e^{-5}$
➂since $R_{5}\subset R_{4}$, we have $R_{5}\cap R_{4}$=$R_{5}$.
Therefore, $P(R_{5}\vert R_{4})$=$\frac {P(R_{5}\cap R_{4})}{P(R_{4})}$=$\frac {P(R_{5})}{P(R_{4})}$=$e^{-1}$.
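A minimal Python sketch of this example, using the given exponential form $P(R_{t})$=$e^{-t}$:

```python
import math

def p_r(t):
    # P(R_t): a particle is still in a non-completed reaction state after t seconds
    return math.exp(-t)

# since R_5 is a subset of R_4, P(R_5 and R_4) = P(R_5)
print(p_r(5) / p_r(4))   # P(R_5|R_4) = e^(-1), approximately 0.3679
print(math.exp(-1))
```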

The Probability Chaining Rule

The probability chaining rule has it that:
➀$P(A\cap C)=P(A\vert C)\cdot P(C)$
➁$P((A\cap B)\vert C)$=$P(A\vert (B\cap C))\cdot P(B\vert C)$
➂$P(A\cap B\cap C)$=$P(A\vert (B\cap C))\cdot P(B\vert C)\cdot P(C)$

proof::mjtsai1974

➀begin from the conditional probability:
$P(A\vert C)$=$\frac {P(A\cap C)}{P(C)}$
$\Leftrightarrow P(A\cap C)=P(A\vert C)\cdot P(C)$
➁$P((A\cap B)\vert C)$
=$\frac {P((A\cap B)\cap C)}{P(C)}$
=$\frac {P(A\cap (B\cap C))}{P(C)}$
=$\frac {P(A\vert (B\cap C))\cdot P(B\cap C)}{P(C)}$
=$P(A\vert (B\cap C))\cdot P(B\vert C)$
➂from above,
$\frac {P((A\cap B)\cap C)}{P(C)}$=$P(A\vert (B\cap C))\cdot P(B\vert C)$
$\Leftrightarrow P(A\cap B\cap C)$=$P(A\vert (B\cap C))\cdot P(B\vert C)\cdot P(C)$

Also known as the multiplication rule.
The below expression illustrates the probability chaining rule extension:
$P(N_{1}\cap N_{2}\cap N_{3}\cap …\cap N_{m})$
=$P(N_{1}\vert (N_{2}\cap N_{3}\cap …\cap N_{m}))$
$\;\;\;\;\cdot P(N_{2}\vert (N_{3}\cap …\cap N_{m}))$
$\;\;\;\;...$
$\;\;\;\;\cdot P(N_{m-1}\vert N_{m})$
$\;\;\;\;\cdot P(N_{m})$

Example: Illustration By Fuel Residence Time For Extension

If we are given the same condition with the engine containing a combustion chamber, we’d like to estimate the probability that a particle which stays over 1 second will stay over 10 seconds.

Suppose the chemical particle still left at the 10-th second is the final molecule.

And the probability of such chemical reaction is in exponential distribution, the probability of $R_{t}$, $P(R_{t})$=$e^{-t}$.

proof::mjtsai1974

This is to ask for $P(R_{10}\vert R_{1})$=$\frac {P(R_{10}\cap R_{1})}{P(R_{1})}$.
➀by the given assumption, the final particle is in $R_{10}$, it just passed through $R_{1}$,$R_{2}$,...,$R_{10}$.
Hence, $(R_{10}\cap R_{1})$=$(R_{10}\cap R_{9}\cap … \cap R_{1})$
➁by the probability chaining rule,
$P(R_{10}\cap R_{9}\cap … \cap R_{1})$
=$P(R_{10}\vert (R_{9}\cap … \cap R_{1}))$
$\;\;\;\;\cdot P(R_{9}\vert (R_{8}\cap … \cap R_{1}))$
$\;\;\;\;\cdot P(R_{8}\vert (R_{7}\cap … \cap R_{1}))$
$\;\;\;\;…$
$\;\;\;\;\cdot P(R_{3}\vert (R_{2}\cap R_{1}))$
$\;\;\;\;\cdot P(R_{2}\vert R_{1})$
$\;\;\;\;\cdot P(R_{1})$
➂with each multiplication term equal to $e^{-1}$,
$P(R_{10}\cap R_{9}\cap … \cap R_{1})$=$e^{-10}$
then, $P(R_{10}\vert R_{1})$=$\frac {P(R_{10}\cap R_{1})}{P(R_{1})}$=$e^{-9}$.

Introduction To The Bayes Theorem

Prologue To Introduction To The Bayes Theorem

The Bayes theorem has been developed and evolved over time to enhance the accuracy of the conditional probability prediction result, especially when you have only a few data gathered and would like to have a convincing, plausible result. It can be pervasively found in machine learning and reinforcement learning, wherein the POMDP transition probability is one such model.

Law Of Total Probability

Consider in a random experiment, given below conditions:
➀$\Omega$=$\{B_{1},B_{2},…,B_{n}\}$, where $B_{i}\cap B_{j}$=$0$ for $i\neq j$.
➁within the sample space, there exists another event $A$, partitioned by the $B_{i}$.
We have $P(A)$=$P(A\vert B_{1})\cdot P(B_{1})$+…+$P(A\vert B_{n})\cdot P(B_{n})$

proof:
➀by the intersection of $A$ and $\Omega$ we can get $A$:
$A$
=$A\cap \Omega$
=$A\cap (B_{1}\cup B_{2}\cup…\cup B_{n})$
=$(A\cap B_{1})\cup (A\cap B_{2})…\cup (A\cap B_{n})$.
➁the total probability of event $A$:
$P(A)$
=$P((A\cap B_{1})\cup (A\cap B_{2})…\cup (A\cap B_{n}))$
=$P(A\cap B_{1})$+$P(A\cap B_{2})$+…+$P(A\cap B_{n})$
=$P(A\vert B_{1})\cdot P(B_{1})$+…+$P(A\vert B_{n})\cdot P(B_{n})$

The Bayes Theorem

Given 2 distinct events $A$ and $B$, the term $P(A\cap B)$ can interconnect the below 2 expressions:
➀$P(A\cap B)$=$P(A\vert B)\cdot P(B)$
➁$P(B\cap A)$=$P(B\vert A)\cdot P(A)$

The sequence order in intersection changes nothing:

$P(B\vert A)\cdot P(A)$=$P(A\vert B)\cdot P(B)$, then,
$\;\;\;\;P(B\vert A)$=$\frac {P(A\vert B)\cdot P(B)}{P(A)}$…Bayes Theorem

The General Form of Bayes Theorem

By using the law of total probability, the general form of Bayes theorem, describing the probability of event $B_{i}$ given event $A$, could be expressed in the below terms:
$\;\;\;\;P(B_{i}\vert A)$=$\frac {P(A\vert B_{i})\cdot P(B_{i})}{P(A\vert B_{1})\cdot P(B_{1})+…+P(A\vert B_{n})\cdot P(B_{n})}$

Example: Illustration By Rainy And Sunny Days In One Week

[Question]

Suppose we have a full record of the weather in the past one week, consisting of rainy and sunny periods. Say $P(Rainy)$=$\frac {3}{7}$, $P(Sunny)$=$\frac {4}{7}$.

In the rainy period of time, assume there is such a probability that when we look into the sky, we can see the sun shining (quite strange earth weather), $P(Sunny\vert Rainy)$=$\frac {1}{5}$, that is, $\frac {1}{5}$ of the time you can see the sun shining in the rainy days.

Then we’d like to know how often we could have rain drops when we are under the shining sun.

[Answer]

This is to ask for $P(Rainy\vert Sunny)$.
$P(Rainy\vert Sunny)$=$\frac {P(Sunny\vert Rainy)\cdot P(Rainy)}{P(Sunny)}$=$\frac {\frac {1}{5}\cdot\frac {3}{7}}{\frac {4}{7}}$=$\frac {3}{20}$
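The same computation as a tiny Python sketch with exact fractions:

```python
from fractions import Fraction

p_rainy = Fraction(3, 7)
p_sunny = Fraction(4, 7)
p_sunny_given_rainy = Fraction(1, 5)

# Bayes theorem: P(Rainy|Sunny) = P(Sunny|Rainy) * P(Rainy) / P(Sunny)
p_rainy_given_sunny = p_sunny_given_rainy * p_rainy / p_sunny
print(p_rainy_given_sunny)   # 3/20
```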

Example: Illustration By Dogs And Cats

Suppose there exist 60 dogs and 40 cats in the animal hospital, 20 female dogs and 10 female cats.

[1] When we pick up one female animal, what's the probability it is a dog?

This is asking for $P(Dog\vert Female)$. Total probability of female animals should be calculated out in advance.
➀probability of female animals:
$P(Female)$
=$P(Female \cap(Dog \cup Cat))$
=$P(Female \cap Dog)$+$P(Female \cap Cat)$
=$\frac {20}{100}$+$\frac {10}{100}$
=$\frac {3}{10}$
➁$P(Dog\vert Female)$
=$\frac {P(Female \cap Dog)}{P(Female)}$
=$\frac {P(Female \vert Dog)\cdot P(Dog)}{P(Female)}$
=$\frac {\frac {20}{60}\cdot\frac {60}{100}}{\frac {3}{10}}$
=$\frac {2}{3}$
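A small Python sketch of steps ➀ and ➁ with exact fractions:

```python
from fractions import Fraction

p_dog = Fraction(60, 100)
p_cat = Fraction(40, 100)
p_female_given_dog = Fraction(20, 60)
p_female_given_cat = Fraction(10, 40)

# total probability of picking a female animal
p_female = p_female_given_dog * p_dog + p_female_given_cat * p_cat   # 3/10

# Bayes theorem: P(Dog|Female) = P(Female|Dog) * P(Dog) / P(Female)
p_dog_given_female = p_female_given_dog * p_dog / p_female
print(p_female, p_dog_given_female)   # 3/10 2/3
```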

[2] Back to $P(Dog\vert Female)$ again, for the term $\frac {P(Female \cap Dog)}{P(Female)}$, can we instead express it as $\frac {P(Dog \cap Female)}{P(Female)}$?

Yes! Let’s expand from it to verify.
➀$P(Dog\vert Female)$
=$\frac {P(Dog \cap Female)}{P(Female)}$
=$\frac {P(Dog \vert Female)\cdot P(Female)}{P(Female)}$
➁trivially, going back to $\frac {P(Dog \cap Female)}{P(Female)}$ does not incorporate anything already given to help figure out something we don’t know!! Although, we could also count it directly from the given sample space of 30 females, with 20 dogs among them, to get the answer of $\frac {2}{3}$.

[Conclusion]

The major purpose of the Bayes theorem is to use known probability to figure out the unknown probability with its maximum likelihood, instead of manually counting in the sample space.
The tiny skill would be to place the given condition first in the intersection term, and the to-be-asked probability second, so that the expansion uses the already known likelihood.