
Propagation In The Bayesian Network

Prologue To Propagation In The Bayesian Network

The predictive and retrospective inferences obtained by means of variable elimination are also computationally expensive: variable elimination works on one variable at a time, and the same operations must be repeated whenever inference on another variable is requested. An alternative that overcomes these limitations is propagation in the Bayesian network.

Posterior Probability (Belief) Update Upon New Evidence Arrival

[1] Posterior probability(belief)

The posterior probability (belief) of a random variable $X=x$, given evidence $E=e$, is expressed as $Bel(x)=P(x\vert e)$. The evidence could be further classified into 2 distinct subsets:
$e_{X}^{-}$ denotes the evidence introduced through its children nodes.
$e_{X}^{+}$ stands for the evidence coming from its parent nodes; for $X$ a root node, $e_{X}^{+}$ is the background.

[2] Deduction of Bel(x)

Assume $e_{X}^{-}$ and $e_{X}^{+}$ are independent, then the belief of $X=x$:
$Bel(x)$
$=P(x\vert e_{X}^{-},e_{X}^{+})$
$=\frac{P(e_{X}^{-},x\vert e_{X}^{+})}{P(e_{X}^{-}\vert e_{X}^{+})}$
$=\frac{P(e_{X}^{-}\vert x,e_{X}^{+})\cdot P(x\vert e_{X}^{+})}{P(e_{X}^{-}\vert e_{X}^{+})}$
$=\frac{P(e_{X}^{-}\vert x)\cdot P(x\vert e_{X}^{+})}{P(e_{X}^{-})}$

Why do we have such a deduction?
➀the evidence $e_{X}^{-}$ is given by the hypothesis $X=x$ and the background context $e_{X}^{+}$, which explains $P(e_{X}^{-}\vert x,e_{X}^{+})$.
➁$P(x\vert e_{X}^{+})$ says that the hypothesis $X=x$ is propagated from the background context $e_{X}^{+}$.
➂the normalizing factor $P(e_{X}^{-}\vert e_{X}^{+})$ encompasses everything ranging from the background context to the final observed evidence; since $e_{X}^{-}$ and $e_{X}^{+}$ are independent, the denominator part becomes $P(e_{X}^{-})$:
$P(e_{X}^{-})$
$=P(e_{X}^{-}\vert e_{X}^{+})$
$=\sum_{x}P(e_{X}^{-}\vert x,e_{X}^{+})\cdot P(x\vert e_{X}^{+})$
$=\sum_{x}P(e_{X}^{-}\vert x)\cdot P(x\vert e_{X}^{+})$
$=\lambda(X)\cdot\pi(X)$

[3] Generalization of Bel(x)

Most of the textbooks share the common definitions below:
$\pi(x)=P(x\vert e_{X}^{+})$ is the prospective (predictive) direction of propagation.
$\lambda(x)=P(e_{X}^{-}\vert x)$ is the retrospective (diagnostic) direction of propagation.

We thus express the belief of $x$ as $Bel(x)=\alpha\cdot\pi(x)\cdot\lambda(x)$,
where $\alpha$ is the normalizing constant; in this illustration,
$\alpha$
$=[P(e_{X}^{-})]^{-1}$
$=[\sum_{x}\pi(x)\cdot\lambda(x)]^{-1}$
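
To make the normalization concrete, below is a minimal numpy sketch of $Bel(x)=\alpha\cdot\pi(x)\cdot\lambda(x)$ for a binary variable; the values of $\pi$ and $\lambda$ are purely hypothetical, not taken from any of the cited texts.

```python
import numpy as np

# Hypothetical binary variable X with states {x0, x1}
pi_x  = np.array([0.7, 0.3])   # pi(x)     = P(x | e+_X), predictive support
lam_x = np.array([0.2, 0.6])   # lambda(x) = P(e-_X | x), diagnostic support

# Bel(x) = alpha * pi(x) * lambda(x), with alpha = [sum_x pi(x)*lambda(x)]^-1
unnormalized = pi_x * lam_x
alpha = 1.0 / unnormalized.sum()
bel_x = alpha * unnormalized
print(bel_x)                   # [0.4375 0.5625], sums to 1
```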

[4] Message passing is propagation

New evidence enters a network when a variable is instantiated, i.e., when it receives a new value from the outside world. When this happens, the posterior probability of each node in the whole network must be re-calculated.

This is achieved by message passing, known as propagation.
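
As a side note on how an instantiation enters the message passing scheme, one common convention is to give the observed node a $\lambda$ vector that is 1 at the observed state and 0 elsewhere; the helper below is my own illustration, not part of the cited texts.

```python
import numpy as np

def observed_lambda(n_states, observed_index):
    # lambda vector for a node instantiated to state `observed_index`:
    # 1 at the observed state, 0 elsewhere
    lam = np.zeros(n_states)
    lam[observed_index] = 1.0
    return lam

print(observed_lambda(3, 1))   # [0. 1. 0.]
```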

Propagation Illustration In Causal Tree

Given a causal tree for the serial chain Bayesian network $W\rightarrow X\rightarrow Y$, where $\lambda(Y)=P(e_{Y}^{-}\vert Y)$.

[1] Forward propagation

➀$W$ is now the root node; the background context is the only forward propagated evidence.
$\pi(w)=P(w\vert e_{W}^{+})=P(w)$, for $w\in W$.
➁for node $X$, it receives the forward propagated evidence passed through it, thus:
$\pi(x)$
$=P(x\vert e_{X}^{+})$
$=P(x\vert e_{WX}^{+})$
$=\sum_{w}P(x\vert w)\cdot P(w\vert e_{W}^{+})$, for $w\in W$.
$\pi(X)=P(X\vert W)\cdot\pi(W)$

Node $X$ receives $W$'s forward propagated evidence $P(w\vert e_{W}^{+})$, with each $X=x$ weighted by $P(x\vert w)$ for all $w\in W$.

➂similarly, $\pi(Y)=P(Y\vert X)\cdot\pi(X)$
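
A minimal numpy sketch of this forward pass over the chain $W\rightarrow X\rightarrow Y$; the CPT values are hypothetical, stored so that rows index the child state and columns index the parent state, which makes $\pi(X)=P(X\vert W)\cdot\pi(W)$ a plain matrix-vector product.

```python
import numpy as np

pi_w = np.array([0.6, 0.4])               # pi(w) = P(w), W is the root

P_x_given_w = np.array([[0.9, 0.3],       # P(x|w): row = x, column = w (hypothetical)
                        [0.1, 0.7]])
P_y_given_x = np.array([[0.8, 0.25],      # P(y|x): row = y, column = x (hypothetical)
                        [0.2, 0.75]])

pi_x = P_x_given_w @ pi_w                 # pi(X) = P(X|W) * pi(W)
pi_y = P_y_given_x @ pi_x                 # pi(Y) = P(Y|X) * pi(X)
print(pi_x, pi_y)                         # [0.66 0.34] [0.613 0.387]
```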

[2] Backward propagation

➀$Y$ is the leaf node; the evidence of observation is the only backward propagated message.
$\lambda(y)=P(e_{Y}^{-}\vert y)$
➁as $X$ receives the backward propagated evidence from $Y$, then:
$\lambda(x)$
$=P(e_{X}^{-}\vert x)$
$=P(e_{XY}^{-}\vert x)$
$=\sum_{y}P(y\vert x)\cdot\lambda(y)$, for $y\in Y$.
$\lambda(X)=P(Y\vert X)\cdot\lambda(Y)$

For each $x\in X$, $\lambda(x)$ is weighted by $P(y\vert x)$ with $\lambda(y)$, for all $y\in Y$.

➂similarly, for node $W$,
$\lambda(w)$
$=P(e_{W}^{-}\vert w)$
$=P(e_{WX}^{-}\vert w)$
$=\sum_{x}P(x\vert w)\cdot\lambda(x)$
$\lambda(W)=P(X\vert W)\cdot\lambda(X)$
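
The backward pass mirrors the forward one, except each step sums over the child's states, so in matrix form it uses the transposed CPT; a sketch continuing the hypothetical chain above, with $Y$ observed in its first state.

```python
import numpy as np

P_x_given_w = np.array([[0.9, 0.3],       # same hypothetical CPTs as above
                        [0.1, 0.7]])
P_y_given_x = np.array([[0.8, 0.25],
                        [0.2, 0.75]])

lam_y = np.array([1.0, 0.0])              # Y instantiated to its first state

lam_x = P_y_given_x.T @ lam_y             # lambda(x) = sum_y P(y|x) * lambda(y)
lam_w = P_x_given_w.T @ lam_x             # lambda(w) = sum_x P(x|w) * lambda(x)
print(lam_x, lam_w)                       # [0.8 0.25] [0.745 0.415]
```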

Propagation Illustration In Multiple Child Nodes

Now take a look at the illustration for the case of multiple child nodes.

[1] Backward propagation

This time, we begin by inspecting the behavior in $\lambda$. We have evidence on the subtree at node $X$ partitioned into 2 disjoint sets $e_{XY_{1}}^{-}$, $e_{XY_{2}}^{-}$.
➀as $X$ receives backward propagated evidence from $Y_{1}$, $Y_{2}$, the backward propagation is in joint distribution form:
$\lambda(x)$, $\forall x\in X$
$=P(e_{X}^{-}\vert x)$
$=P(e_{XY_{1}}^{-}\vert x)\cdot P(e_{XY_{2}}^{-}\vert x)$
$=\prod_{Y_{i}\in Y}P(e_{XY_{i}}^{-}\vert x)$…for multiple $Y_{i}$ children
$\lambda(x)=\lambda_{Y_{1}}(x)\cdot\lambda_{Y_{2}}(x)$
➁as $X$ backward propagates the evidence to $W$:
$\lambda(w)$, $\forall w\in W$
$=\sum_{x}P(x\vert w)\cdot\lambda(x)$
$\lambda_{X}(W)=P(X\vert W)\cdot\lambda(X)$
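
In code, combining the diagnostic messages from several children is an element-wise product before the usual sum over $x$; a small sketch with two hypothetical children $Y_{1}$, $Y_{2}$ of $X$.

```python
import numpy as np

lam_y1_x = np.array([0.9, 0.2])           # lambda_Y1(x), hypothetical
lam_y2_x = np.array([0.5, 0.4])           # lambda_Y2(x), hypothetical

lam_x = lam_y1_x * lam_y2_x               # lambda(x) = lambda_Y1(x) * lambda_Y2(x)

P_x_given_w = np.array([[0.7, 0.1],       # P(x|w): row = x, column = w (hypothetical)
                        [0.3, 0.9]])
lam_x_to_w = P_x_given_w.T @ lam_x        # lambda_X(w) = sum_x P(x|w) * lambda(x)
print(lam_x, lam_x_to_w)                  # [0.45 0.08] [0.339 0.117]
```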

[2] Forward propagation

➀as to the forward propagated evidence in node $X$, that is $\pi(X)$, we evaluate it from its parent node $W$.
$\pi(x)$, $\forall x\in X$
$=P(x\vert e_{X}^{+})$
$=P(x\vert e_{WX}^{+})$
$=\sum_{w}P(x\vert w)\cdot\pi(w)$
$\pi(X)=P(X\vert W)\cdot\pi(W)$

[3] Forward propagation to multiple descendants

The forward propagated evidence in node $X$ is divided into 2 distinct parts. Consider the evidence forward propagated from $X$ to $Y_{1}$; it consists of 2 major factors:
➀the forward propagated evidence from $W$.
➁the backward propagated evidence from $Y_{2}$.
The point is that the total forward propagated evidence from $X$ to $Y_{1}$ is the joint probability function in combinatory format of these 2 factors.
$\pi_{Y_{1}}(x)$
$=P(x\vert e_{X}^{+},e_{Y_{2}}^{-})$
$=\frac{P(e_{Y_{2}}^{-}\vert x,e_{X}^{+})\cdot P(x\vert e_{X}^{+})}{P(e_{Y_{2}}^{-}\vert e_{X}^{+})}$
$=\frac{P(e_{Y_{2}}^{-}\vert x)\cdot P(x\vert e_{X}^{+})}{P(e_{Y_{2}}^{-}\vert e_{X}^{+})}$
, where $x\in X$ and $e_{Y_{2}}^{-}\subseteq e_{X}^{-}$; more precisely, we can take $e_{Y_{2}}^{-}=e_{XY_{2}}^{-}$, to be expanded next.

[4] Forward propagation to multiple descendants: extension proof::mjtsai1974

Continue from $\pi_{Y_{1}}(x)$:
➀multiply the numerator and denominator by $P(e_{Y_{1}}^{-}\vert x)$:
$\pi_{Y_{1}}(x)$
$=\frac{P(e_{Y_{2}}^{-}\vert x)\cdot P(x\vert e_{X}^{+})\cdot P(e_{Y_{1}}^{-}\vert x)}{P(e_{Y_{2}}^{-}\vert e_{X}^{+})\cdot P(e_{Y_{1}}^{-}\vert x)}$
➁expand the denominator term $P(e_{Y_{2}}^{-}\vert e_{X}^{+})$:
$P(e_{Y_{2}}^{-}\vert e_{X}^{+})$
$=\sum_{x}P(e_{Y_{2}}^{-}\vert x,e_{X}^{+})\cdot P(x\vert e_{X}^{+})$
$=\sum_{x}P(e_{Y_{2}}^{-}\vert x)\cdot P(x\vert e_{X}^{+})$
$=\alpha_{XY_{2}}^{-1}$
$=P(e_{XY_{2}}^{-})$…since $e_{Y_{2}}^{-}$ and $e_{X}^{+}$ are independent
➂we also have the further deduction below for $P(e_{X}^{-})$:
$P(e_{Y_{2}}^{-}\vert e_{X}^{+})\cdot P(e_{Y_{1}}^{-}\vert x)$
$=\sum_{x}P(e_{Y_{2}}^{-}\vert x)\cdot P(x\vert e_{X}^{+})\cdot P(e_{Y_{1}}^{-}\vert x)$
$=P(e_{X}^{-})$
$=\alpha_{X}^{-1}$
➃therefore, rewrite ➀:
$\pi_{Y_{1}}(x)$
$=P(x\vert e_{X}^{+},e_{Y_{2}}^{-})$
$=\frac{P(e_{Y_{2}}^{-}\vert x)\cdot P(x\vert e_{X}^{+})\cdot P(e_{Y_{1}}^{-}\vert x)}{P(e_{XY_{2}}^{-})\cdot P(e_{Y_{1}}^{-}\vert x)}$
$=\frac{P(e_{Y_{2}}^{-}\vert x)\cdot P(x\vert e_{X}^{+})\cdot P(e_{Y_{1}}^{-}\vert x)}{\alpha_{XY_{2}}^{-1}\cdot P(e_{Y_{1}}^{-}\vert x)}$
$=\frac{\alpha_{XY_{2}}\cdot P(e_{Y_{2}}^{-}\vert x)\cdot P(x\vert e_{X}^{+})\cdot P(e_{Y_{1}}^{-}\vert x)}{P(e_{Y_{1}}^{-}\vert x)}$
$=\frac{Bel_{Y_{1}}(x)}{\lambda_{Y_{1}}(x)}$
Here, we take $Bel_{Y_{1}}(x)$
$=\alpha_{XY_{2}}\cdot P(e_{Y_{2}}^{-}\vert x)\cdot P(x\vert e_{X}^{+})\cdot P(e_{Y_{1}}^{-}\vert x)$

[5] Forward propagation to multiple descendants: generalization

Given that node $X$ has $n$ multiple child nodes $\{Y_{1},\dots,Y_{n}\}$, the forward propagated evidence from $X$ to $Y_{k}$ could be generalized in the below expression:
$\pi_{Y_{k}}(x)$
$=\frac{Bel_{Y_{k}}(x)}{\lambda_{Y_{k}}(x)}$
, where $Bel_{Y_{k}}(x)=\alpha_{XY_{k}}\cdot\prod_{i\neq k}P(e_{Y_{i}}^{-}\vert x)\cdot P(x\vert e_{X}^{+})\cdot P(e_{Y_{k}}^{-}\vert x)$
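
In practice this message is usually computed by multiplying $\pi(x)$ with every child's $\lambda$ except $Y_{k}$'s, which avoids dividing by a possibly zero $\lambda_{Y_{k}}(x)$, and then normalizing. A hypothetical sketch, with numbers of my own choosing:

```python
import numpy as np

def pi_to_child(pi_x, lambdas, k):
    # pi_Yk(x): pi(x) times the product of lambda_Yi(x) for all i != k, normalized
    msg = pi_x.copy()
    for i, lam in enumerate(lambdas):
        if i != k:
            msg = msg * lam
    return msg / msg.sum()

pi_x = np.array([0.6, 0.4])                              # P(x | e+_X), hypothetical
lambdas = [np.array([0.9, 0.2]), np.array([0.5, 0.4])]   # lambda_Y1(x), lambda_Y2(x)

print(pi_to_child(pi_x, lambdas, k=0))    # message toward Y1
print(pi_to_child(pi_x, lambdas, k=1))    # message toward Y2
```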

Propagation Illustration In Multiple Parent Nodes

The causal polytree is the structure used in this section; each node might have multiple parents.

[1] Forward propagation

➀the predictive support at $X$ comes from its 2 parents $W_{1}$, $W_{2}$.
$\pi(x)$, $\forall x\in X$
$=P(x\vert e_{X}^{+})$
$=P(x\vert e_{W_{1}X}^{+},e_{W_{2}X}^{+})\cdot P(x\vert W_{1},W_{2})$
$=P(x\vert e_{W_{1}X}^{+})\cdot P(x\vert e_{W_{2}X}^{+})\cdot P(x\vert W_{1},W_{2})$
$=\sum_{w_{1}}\sum_{w_{2}}P(x\vert e_{W_{1}X}^{+})\cdot P(x\vert e_{W_{2}X}^{+})\cdot P(x\vert w_{1},w_{2})$
$=\sum_{w_{1}}\sum_{w_{2}}\pi_{W_{1}}(x)\cdot\pi_{W_{2}}(x)\cdot P(x\vert w_{1},w_{2})$
➁the generalization of forward propagation at $X$ for $n$ parents:
$\pi(x)=\sum_{w_{1},\dots,w_{n}}\pi_{W_{1}}(x)\cdots\pi_{W_{n}}(x)\cdot P(x\vert w_{1},\dots,w_{n})$
➂for the forward propagation from $X$ to one of its children, say $Y_{k}$, it has to combine the forward propagation thus obtained above with the backward propagated evidence from $\{Y_{1},\dots,Y_{k-1},Y_{k+1},\dots,Y_{m}\}$, supposing $X$ has $m$ child nodes.
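
A sketch of the two-parent forward pass written as an explicit sum over joint parent configurations; the CPT `P_x_given_w1w2[x, w1, w2]` and the parent messages below are hypothetical.

```python
import numpy as np

# Hypothetical CPT P(x | w1, w2) with shape (|X|, |W1|, |W2|);
# for each (w1, w2) the column over x sums to 1
P_x_given_w1w2 = np.array([[[0.9, 0.6],
                            [0.7, 0.1]],
                           [[0.1, 0.4],
                            [0.3, 0.9]]])

pi_w1 = np.array([0.8, 0.2])              # predictive message from parent W1
pi_w2 = np.array([0.5, 0.5])              # predictive message from parent W2

# pi(x) = sum_{w1,w2} pi_W1(w1) * pi_W2(w2) * P(x | w1, w2)
pi_x = np.einsum('xab,a,b->x', P_x_given_w1w2, pi_w1, pi_w2)
print(pi_x)                               # [0.68 0.32], already sums to 1
```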

[2] Backward propagation

➀the diagnostic support at $X$ comes from its 2 children $Y_{1}$, $Y_{2}$, where $\lambda(x)=\lambda_{Y_{1}}(x)\cdot\lambda_{Y_{2}}(x)$, and it must be split into 2 parts to be transferred to $W_{1}$, $W_{2}$, which are $\lambda_{X}(w_{1}),\lambda_{X}(w_{2})$.
➁to diagnose in $W_{1}$ the symptom/evidence backward propagated from $X$:
$\lambda_{X}(w_{1})$
$=P(e_{W_{1}X}^{-}\vert W_{1})$
$=P(e_{X}^{-},e_{W_{2}X}^{+}\vert W_{1})$
$=P(e_{X}^{-}\vert W_{1})\cdot P(e_{W_{2}X}^{+}\vert W_{1})$
$=P(e_{X}^{-}\vert x)\cdot P(x\vert W_{1})\cdot P(e_{W_{2}X}^{+})$
$=P(e_{X}^{-}\vert x)\cdot P(x\vert W_{1},W_{2})\cdot S_{1}\cdot P(e_{W_{2}X}^{+})$
$=P(e_{X}^{-}\vert x)\cdot P(x\vert W_{1},W_{2})\cdot S_{2}\cdot P(W_{2}\vert e_{W_{2}X}^{+})$
$=P(e_{X}^{-}\vert x)\cdot P(x\vert W_{1},W_{2})\cdot\beta\cdot P(e_{W_{2}X}^{+}\vert W_{2})$
$=\beta\cdot\sum_{x}\lambda(x)\cdot\sum_{w_{2}}P(x\vert w_{1},w_{2})\cdot\pi_{W_{2}}(x)$
, where $S_{1},S_{2},\beta$ are arbitrary constants to hold the equality. We turn from $P(W_{2}\vert e_{W_{2}X}^{+})$ into $P(e_{W_{2}X}^{+}\vert W_{2})$ to express the deduction in terms of $\pi_{W_{2}}(x)$ for its simplicity.
➂the generalization to $n$ multiple parents:
$\lambda_{X}(w_{i})$
$=\beta\cdot\sum_{x}\lambda(x)\cdot\sum_{w_{k}:k\neq i}P(x\vert w_{1},\dots,w_{n})\cdot\prod_{k=1:k\neq i}^{n}\pi_{W_{k}}(x)$
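
Finally, a sketch of the message sent back to parent $W_{1}$, instantiating the derivation above; the $\lambda$ and $\pi$ vectors are hypothetical, and $\beta$ is chosen here simply as a normalizing constant.

```python
import numpy as np

P_x_given_w1w2 = np.array([[[0.9, 0.6],   # same hypothetical CPT P(x | w1, w2)
                            [0.7, 0.1]],
                           [[0.1, 0.4],
                            [0.3, 0.9]]])
pi_w2 = np.array([0.5, 0.5])              # predictive message from the other parent W2
lam_x = np.array([0.3, 0.8])              # lambda(x) collected from X's children, hypothetical

# lambda_X(w1) = beta * sum_x lambda(x) * sum_{w2} P(x|w1,w2) * pi_W2(w2)
lam_x_to_w1 = np.einsum('xab,b,x->a', P_x_given_w1w2, pi_w2, lam_x)
lam_x_to_w1 /= lam_x_to_w1.sum()          # beta taken as the normalizing constant
print(lam_x_to_w1)
```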

[Cautions]

Belief update by propagation results in expensive computation when a node has multiple parents. The introduction of propagation aims to eliminate the potentially NP-hard cost of variable elimination, yet it now seems to struggle with the same issue. Next we will see the clique tree series, believed to be workable in a lot of references.

Addendum

A Tutorial on Bayesian Belief Networks, Mark L. Krieg, Surveillance Systems Division, Electronics and Surveillance Research Laboratory, DSTO-TN-0403
Lecture 4: Probability Propagation in Singly Connected Bayesian Networks, Duncan Fyfe Gillies, Department of Computing Imperial College London