Propagation In The Bayesian Network
26 Aug 2018

Prologue To Propagation In The Bayesian Network
Posterior Probability (Belief) Update Upon the Arrival of New Evidence
[1] Posterior probability (belief)
The posterior probability (belief) of a random variable X=x, given evidence E=e, is expressed as $Bel(x)\triangleq P(x|e)$. The evidence can be further classified into 2 distinct subsets:
➀$e^{-}_{X}$ denotes the evidence introduced through the children of X.
➁$e^{+}_{X}$ stands for the evidence coming from the parents of X; for X a root node, $e^{+}_{X}$ is the background context.

[2] Deduction of Bel(x)
Assume $e^{-}_{X}$ and $e^{+}_{X}$ are independent; then the belief of X=x is:
$Bel(x)$
$=P(x|e^{-}_{X},e^{+}_{X})$
$=\frac{P(e^{-}_{X},x|e^{+}_{X})}{P(e^{-}_{X}|e^{+}_{X})}$
$=\frac{P(e^{-}_{X}|x,e^{+}_{X})\cdot P(x|e^{+}_{X})}{P(e^{-}_{X}|e^{+}_{X})}$
$=\frac{P(e^{-}_{X}|x)\cdot P(x|e^{+}_{X})}{P(e^{-}_{X})}$
Why do we have such a deduction?
➀the evidence $e^{-}_{X}$ is given by the hypothesis X=x and the background context $e^{+}_{X}$, which explains $P(e^{-}_{X}|x,e^{+}_{X})$.
➁$P(x|e^{+}_{X})$ says that the hypothesis X=x is propagated from the background context $e^{+}_{X}$.
➂the normalizing factor $P(e^{-}_{X}|e^{+}_{X})$ encompasses everything from the background context to the finally observed evidence; since $e^{-}_{X}$ and $e^{+}_{X}$ are independent, the denominator reduces to $P(e^{-}_{X})$.

[3] Generalization of Bel(x)
$P(e^{-}_{X})$
$=P(e^{-}_{X}|e^{+}_{X})$
$=\sum_{x}P(e^{-}_{X}|x,e^{+}_{X})\cdot P(x|e^{+}_{X})$
$=\sum_{x}P(e^{-}_{X}|x)\cdot P(x|e^{+}_{X})$
$=\sum_{x}\lambda(x)\cdot\pi(x)$
Most textbooks share the common definitions below:
➀$\pi(x)=P(x|e^{+}_{X})$ is the prospective (predictive) direction of propagation.
➁$\lambda(x)=P(e^{-}_{X}|x)$ is the retrospective (diagnostic) direction of propagation.
We thus express the belief of x as $Bel(x)=\alpha\cdot\pi(x)\cdot\lambda(x)$,
where $\alpha$ is the normalizing constant; in this illustration,
$\alpha$
$=[P(e^{-}_{X})]^{-1}$
$=[\sum_{x}\pi(x)\cdot\lambda(x)]^{-1}$
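To make the normalization concrete, below is a minimal numerical sketch of $Bel(x)=\alpha\cdot\pi(x)\cdot\lambda(x)$; the π and λ vectors are made-up values for a hypothetical 3-state variable, not taken from any figure in this post.

```python
import numpy as np

# Hypothetical pi and lambda vectors over a 3-state variable X.
pi_x  = np.array([0.5, 0.3, 0.2])   # pi(x)     = P(x | e+_X)
lam_x = np.array([0.8, 0.1, 0.4])   # lambda(x) = P(e-_X | x)

alpha = 1.0 / np.sum(pi_x * lam_x)  # alpha = [sum_x pi(x) * lambda(x)]^-1
bel_x = alpha * pi_x * lam_x        # Bel(x) = alpha * pi(x) * lambda(x)
print(bel_x, bel_x.sum())           # the beliefs sum to 1
```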
[4] Message passing is propagation
New evidence enters a network when a variable is instantiated, i.e., when it receives a new value from the outside world. When this happens, the posterior probability of each node in the whole network must be re-calculated. This is achieved by message passing, known as propagation.
Propagation Illustration In Causal Tree
Given the causal tree of a serial-chain Bayesian network $W\rightarrow X\rightarrow Y$, where $\lambda(Y)=P(e^{-}_{Y}|Y)$.
[1] Forward propagation
➀W is now the root node; the background context is the only forward propagated evidence.
Node X receives the evidence forward propagated from W, $P(w|e^{+}_{W})$, with each X=x weighted by $P(x|w)$ for all $w\in W$.
$\pi(w)=P(w|e^{+}_{W})=P(w)$, for $w\in W$.
➁node X receives the forward propagated evidence passing through it, thus:
$\pi(x)$
$=P(x|e^{+}_{X})$
$=P(x|e^{+}_{WX})$
$=\sum_{w}P(x|w)\cdot P(w|e^{+}_{W})$, for $w\in W$.
$\Rightarrow\pi(X)=P(X|W)\cdot\pi(W)$
➂similarly, $\pi(Y)=P(Y|X)\cdot\pi(X)$
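Below is a minimal sketch of this forward pass on the chain W→X→Y. The CPT values and the matrix convention `P_X_given_W[x, w] = P(X=x | W=w)` are assumptions chosen only for illustration.

```python
import numpy as np

# Chain W -> X -> Y, all variables binary; the numbers are made up.
P_W = np.array([0.6, 0.4])                     # prior of the root W
P_X_given_W = np.array([[0.7, 0.2],            # [x, w] = P(X=x | W=w)
                        [0.3, 0.8]])
P_Y_given_X = np.array([[0.9, 0.1],            # [y, x] = P(Y=y | X=x)
                        [0.1, 0.9]])

pi_W = P_W                                     # pi(w) = P(w | e+_W) = P(w)
pi_X = P_X_given_W @ pi_W                      # pi(x) = sum_w P(x|w) * pi(w)
pi_Y = P_Y_given_X @ pi_X                      # pi(y) = sum_x P(y|x) * pi(x)
print(pi_X, pi_Y)
```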
[2] Backward propagation
➀Y is the leaf node; the observed evidence is the only backward propagated message.
For each $x\in X$, $\lambda(x)$ combines $\lambda(y)$ weighted by $P(y|x)$, for all $y\in Y$.
$\lambda(y)=P(e^{-}_{Y}|y)$
➁as X receives the backward propagated evidence from Y, then:
$\lambda(x)$
$=P(e^{-}_{X}|x)$
$=P(e^{-}_{XY}|x)$
$=\sum_{y}P(y|x)\cdot\lambda(y)$, for $y\in Y$.
$\Rightarrow\lambda(X)=P(Y|X)\cdot\lambda(Y)$
➂similarly, for node W,
$\lambda(w)$
$=P(e^{-}_{W}|w)$
$=P(e^{-}_{WX}|w)$
$=\sum_{x}P(x|w)\cdot\lambda(x)$
$\Rightarrow\lambda(W)=P(X|W)\cdot\lambda(X)$
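A matching sketch of the backward pass on the same chain, again with assumed CPTs and evidence; note that in code the summation $\lambda(x)=\sum_{y}P(y|x)\cdot\lambda(y)$ becomes a multiplication by the transposed CPT matrix.

```python
import numpy as np

# Same chain W -> X -> Y as in the forward sketch; numbers are made up.
P_W = np.array([0.6, 0.4])
P_X_given_W = np.array([[0.7, 0.2], [0.3, 0.8]])   # [x, w] = P(x|w)
P_Y_given_X = np.array([[0.9, 0.1], [0.1, 0.9]])   # [y, x] = P(y|x)

lam_Y = np.array([1.0, 0.0])            # Y observed in its first state
lam_X = P_Y_given_X.T @ lam_Y           # lambda(x) = sum_y P(y|x) * lambda(y)
lam_W = P_X_given_W.T @ lam_X           # lambda(w) = sum_x P(x|w) * lambda(x)

pi_X = P_X_given_W @ P_W                # forward message, as before
bel_X = pi_X * lam_X                    # Bel(x) = alpha * pi(x) * lambda(x)
bel_X /= bel_X.sum()                    # alpha handled by normalization
print(lam_X, lam_W, bel_X)
```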
Propagation Illustration In Multiple Child Nodes
Now take a look at the illustration for the case of multiple child nodes.
[1] Backward propagation
This time we begin by inspecting the behavior of λ. The evidence on the subtree rooted at node X is partitioned into 2 disjoint sets $e^{-}_{XY_1}$ and $e^{-}_{XY_2}$.
➀as X receives the backward propagated evidence from Y1 and Y2, the backward propagation takes the joint (product) form:
$\lambda(x)$, $x\in X$
$=P(e^{-}_{X}|x)$
$=P(e^{-}_{XY_1}|x)\cdot P(e^{-}_{XY_2}|x)$
$=\prod_{Y_i\in Y}P(e^{-}_{XY_i}|x)$ ... multiple $Y_i$ children
$\Rightarrow\lambda(x)=\lambda_{Y_1}(x)\cdot\lambda_{Y_2}(x)$
➁as X backward propagates the evidence to W:
$\lambda(w)$, $w\in W$
$=\sum_{x}P(x|w)\cdot\lambda(x)$
$\Rightarrow\lambda_{X}(W)=P(X|W)\cdot\lambda(X)$
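A minimal sketch of this multi-child case: the λ messages from Y1 and Y2 are combined by an element-wise product at X and then passed on to W. All numbers and the matrix convention are assumptions for illustration.

```python
import numpy as np

# X has parent W and children Y1, Y2; all variables binary, numbers made up.
P_X_given_W = np.array([[0.7, 0.2], [0.3, 0.8]])   # [x, w] = P(x|w)

lam_Y1_x = np.array([0.9, 0.2])     # lambda_Y1(x): message from child Y1
lam_Y2_x = np.array([0.4, 0.6])     # lambda_Y2(x): message from child Y2

lam_x = lam_Y1_x * lam_Y2_x         # lambda(x) = prod_i lambda_Yi(x)
lam_X_to_W = P_X_given_W.T @ lam_x  # lambda_X(w) = sum_x P(x|w) * lambda(x)
print(lam_x, lam_X_to_W)
```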
[2] Forward propagation
➀as to the forward propagated evidence in node X, that is π(X), we evaluate it from its parent node W:
$\pi(x)$, $x\in X$
$=P(x|e^{+}_{X})$
$=P(x|e^{+}_{WX})$
$=\sum_{w}P(x|w)\cdot\pi(w)$
$\Rightarrow\pi(X)=P(X|W)\cdot\pi(W)$

[3] Forward propagation to multiple descendants
The forward propagated evidence in node X is divided into 2 distinct parts. Consider the evidence forward propagated from X to Y1; it consists of 2 major factors:
➀the forward propagated evidence from W.
➁the backward propagated evidence from Y2.
The point is that the total forward propagated evidence from X to Y1 is the joint probability combining these 2 factors.
$\pi_{Y_1}(x)$
$=P(x|e^{+}_{X},e^{-}_{Y_2})$
$=\frac{P(e^{-}_{Y_2}|x,e^{+}_{X})\cdot P(x|e^{+}_{X})}{P(e^{-}_{Y_2}|e^{+}_{X})}$
$=\frac{P(e^{-}_{Y_2}|x)\cdot P(x|e^{+}_{X})}{P(e^{-}_{Y_2}|e^{+}_{X})}$
, where $x\in X$ and $e^{-}_{Y_2}\subseteq e^{-}_{X}$; more precisely, we can take $e^{-}_{Y_2}=e^{-}_{XY_2}$, and next expand it.

[4] Forward propagation to multiple descendants: extension proof::mjtsai1974
Continue from $\pi_{Y_1}(x)$:
➀multiply both the numerator and the denominator by $P(e^{-}_{Y_1}|x)$:
$\pi_{Y_1}(x)$
$=\frac{P(e^{-}_{Y_2}|x)\cdot P(x|e^{+}_{X})\cdot P(e^{-}_{Y_1}|x)}{P(e^{-}_{Y_2}|e^{+}_{X})\cdot P(e^{-}_{Y_1}|x)}$
➁$P(e^{-}_{Y_2}|e^{+}_{X})$
$=\sum_{x}P(e^{-}_{Y_2}|x,e^{+}_{X})\cdot P(x|e^{+}_{X})$
$=\sum_{x}P(e^{-}_{Y_2}|x)\cdot P(x|e^{+}_{X})$
$=\alpha_{XY_2}^{-1}$
$=P(e^{-}_{XY_2})$ ... since $e^{-}_{Y_2}$ and $e^{+}_{X}$ are independent
➂we also have the further deduction below for $P(e^{-}_{X})$:
$P(e^{-}_{Y_2}|e^{+}_{X})\cdot P(e^{-}_{Y_1}|x)$
$=\sum_{x}P(e^{-}_{Y_2}|x)\cdot P(x|e^{+}_{X})\cdot P(e^{-}_{Y_1}|x)$
$=P(e^{-}_{X})$
$=\alpha_{X}^{-1}$
➃therefore, rewrite ➀:
$\pi_{Y_1}(x)$
$=P(x|e^{+}_{X},e^{-}_{Y_2})$
$=\frac{P(e^{-}_{Y_2}|x)\cdot P(x|e^{+}_{X})\cdot P(e^{-}_{Y_1}|x)}{P(e^{-}_{XY_2})\cdot P(e^{-}_{Y_1}|x)}$
$=\frac{P(e^{-}_{Y_2}|x)\cdot P(x|e^{+}_{X})\cdot P(e^{-}_{Y_1}|x)}{\alpha_{XY_2}^{-1}\cdot P(e^{-}_{Y_1}|x)}$
$=\frac{\alpha_{XY_2}\cdot P(e^{-}_{Y_2}|x)\cdot P(x|e^{+}_{X})\cdot P(e^{-}_{Y_1}|x)}{P(e^{-}_{Y_1}|x)}$
$=\frac{Bel_{Y_1}(x)}{\lambda_{Y_1}(x)}$
Here, we take $Bel_{Y_1}(x)=\alpha_{XY_2}\cdot P(e^{-}_{Y_2}|x)\cdot P(x|e^{+}_{X})\cdot P(e^{-}_{Y_1}|x)$.

[5] Forward propagation to multiple descendants: generalization
Given that node X has n child nodes {Y1,…,Yn}, the forward propagated evidence from X to Yk can be generalized as below:
$\pi_{Y_k}(x)=\frac{Bel_{Y_k}(x)}{\lambda_{Y_k}(x)}$
, where $Bel_{Y_k}(x)=\alpha_{XY_k}\cdot\prod_{i\neq k}P(e^{-}_{Y_i}|x)\cdot P(x|e^{+}_{X})$
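This generalization amounts to saying that the π message sent from X to child Yk is the normalized product of X's own π with the λ messages from all the other children. A minimal sketch under assumed values (the child names, vectors and the helper `pi_message_to_child` are hypothetical, for illustration only):

```python
import numpy as np

# X has children Y1..Y3; pi(x) and the child lambda messages are made up.
pi_x = np.array([0.5, 0.5])
lam_children = {                        # lambda_Yi(x) for each child Yi
    "Y1": np.array([0.9, 0.2]),
    "Y2": np.array([0.4, 0.6]),
    "Y3": np.array([0.7, 0.7]),
}

def pi_message_to_child(k):
    """pi_Yk(x) ~ P(x|e+_X) * prod_{i != k} lambda_Yi(x), then normalized."""
    msg = pi_x.copy()
    for name, lam in lam_children.items():
        if name != k:
            msg = msg * lam             # multiply in every other child's lambda
    return msg / msg.sum()              # alpha_XYk absorbed by normalization

print(pi_message_to_child("Y1"))
```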
Propagation Illustration In Multiple Parent Nodes
The causal polytree is the structure used in this section; each node might have multiple parents.
[1] Forward propagation
➀the predictive support at X comes from its 2 parents W1 and W2.
$\pi(x)$, $x\in X$
$=P(x|e^{+}_{X})$
$=P(x|e^{+}_{W_1X},e^{+}_{W_2X})$
$=\sum_{w_1}\sum_{w_2}P(x|w_1,w_2)\cdot P(w_1|e^{+}_{W_1X})\cdot P(w_2|e^{+}_{W_2X})$
$=\sum_{w_1}\sum_{w_2}P(x|w_1,w_2)\cdot\pi_{X}(w_1)\cdot\pi_{X}(w_2)$
➁the generalization of forward propagation at X for n parents:
$\pi(x)=\sum_{w_1,\dots,w_n}P(x|w_1,\dots,w_n)\cdot\pi_{X}(w_1)\cdots\pi_{X}(w_n)$
➂for the forward propagation from X to one of its children, say Yk, the value obtained above must be combined with the backward propagated evidence from {Y1,…,Yk−1,Yk+1,…,Ym}, supposing X has m child nodes; see the sketch right after this list.
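Below is a minimal sketch of the multi-parent forward step for X with two parents W1 and W2. The CPT layout `P_X_given_W1W2[x, w1, w2]` and all values are assumptions chosen only for illustration.

```python
import numpy as np

# X has two parents W1, W2; everything binary, numbers made up.
pi_X_w1 = np.array([0.6, 0.4])     # pi message arriving from W1
pi_X_w2 = np.array([0.3, 0.7])     # pi message arriving from W2

# P_X_given_W1W2[x, w1, w2] = P(X=x | W1=w1, W2=w2)
P_X_given_W1W2 = np.array([[[0.9, 0.6],
                            [0.5, 0.1]],
                           [[0.1, 0.4],
                            [0.5, 0.9]]])

# pi(x) = sum_{w1,w2} P(x|w1,w2) * pi_X(w1) * pi_X(w2)
pi_x = np.einsum("xab,a,b->x", P_X_given_W1W2, pi_X_w1, pi_X_w2)
print(pi_x)
```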
[2] Backward propagation
➀the diagnostic support at X comes from its 2 children Y1 and Y2, where $\lambda(x)=\lambda_{Y_1}(x)\cdot\lambda_{Y_2}(x)$; it must be split into 2 parts to be transferred to W1 and W2, namely $\lambda_{X}(w_1)$ and $\lambda_{X}(w_2)$.
➁to diagnose in W1 the symptom/evidence backward propagated from X:
$\lambda_{X}(w_1)$
$=P(e^{-}_{W_1X}|w_1)$
$=P(e^{-}_{X},e^{+}_{W_2X}|w_1)$
$=\sum_{x}\sum_{w_2}P(e^{-}_{X}|x)\cdot P(x|w_1,w_2)\cdot P(w_2,e^{+}_{W_2X}|w_1)$
$=\sum_{x}\sum_{w_2}P(e^{-}_{X}|x)\cdot P(x|w_1,w_2)\cdot P(w_2|e^{+}_{W_2X})\cdot P(e^{+}_{W_2X})$
$=\beta\sum_{x}\lambda(x)\cdot\sum_{w_2}P(x|w_1,w_2)\cdot\pi_{X}(w_2)$
, where $\beta=P(e^{+}_{W_2X})$ is a constant with respect to $w_1$ that holds the equality, since $e^{+}_{W_2X}$ is independent of $W_1$; we express $P(w_2|e^{+}_{W_2X})$ as $\pi_{X}(w_2)$ to keep the deduction simple.
➂the generalization to n multiple parents (see the sketch at the end of this section):
$\lambda_{X}(w_i)=\beta\sum_{x}\lambda(x)\cdot\sum_{w_k:k\neq i}P(x|w_1,\dots,w_n)\cdot\prod_{k=1,k\neq i}^{n}\pi_{X}(w_k)$

[Cautions]
Belief updating by propagation becomes computationally expensive when a node has multiple parents. Propagation was introduced to avoid the potentially NP-hard cost of variable elimination, yet here it seems to struggle as well. Next we turn to the clique tree series, which many references hold to be workable.
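To close the section, here is a minimal sketch of the parent-directed message $\lambda_{X}(w_1)$ generalized in ➂ above, for the two-parent case; all values are assumed, and β is handled by normalizing at the end.

```python
import numpy as np

# X has parents W1, W2 and combined child evidence lambda(x); numbers made up.
lam_x   = np.array([0.8, 0.3])     # lambda(x) = prod_i lambda_Yi(x)
pi_X_w2 = np.array([0.3, 0.7])     # pi message arriving from the other parent W2

P_X_given_W1W2 = np.array([[[0.9, 0.6],        # [x, w1, w2] = P(x | w1, w2)
                            [0.5, 0.1]],
                           [[0.1, 0.4],
                            [0.5, 0.9]]])

# lambda_X(w1) = beta * sum_x lambda(x) * sum_{w2} P(x|w1,w2) * pi_X(w2)
lam_X_w1 = np.einsum("x,xab,b->a", lam_x, P_X_given_W1W2, pi_X_w2)
lam_X_w1 /= lam_X_w1.sum()          # beta absorbed by normalization
print(lam_X_w1)
```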
Addendum
➀A Tutorial on Bayesian Belief Networks, Mark L. Krieg, Surveillance Systems Division, Electronics and Surveillance Research Laboratory, DSTO-TN-0403
➁Lecture 4: Probability Propagation in Singly Connected Bayesian Networks, Duncan Fyfe Gillies, Department of Computing Imperial College London