close

Вход

Забыли?

вход по аккаунту

?

Generalized Belief Propagation and Free Energy Minimization

код для вставкиСкачать
Some Surprises in the Theory of
Generalized Belief Propagation
Jonathan Yedidia
Mitsubishi Electric Research Labs (MERL)
Collaborators:
Bill Freeman (MIT)
Yair Weiss (Hebrew University)
See “Constructing Free Energy Approximations and
Generalized Belief Propagation Algorithms,” MERL
TR2004-040, to be published in IEEE Trans. Info. Theory.
Outline
• Introduction to GBP
– Quick Review of Standard BP
– Free Energies
– Region Graphs and Valid Approximations
• Some Surprises
• Maxent Normal Approximations
& a useful heuristic
Factor Graphs
p( X ) пЂЅ
1
пѓ•
Z
(Kschischang,
et.al. 2001)
fa ( X a )
a
1
2
A
p ( x1 , x 2 , x 3 , x 4 ) пЂЅ
3
B
1
Z
4
C
f A ( x1 , x 2 , x 3 ) f B ( x 2 , x 3 , x 4 ) f C ( x 4 )
Computing Marginal Probabilities
pS ( X S ) пЂЅ

p( X )
X \XS
Fundamental for
• Decoding error-correcting codes
• Inference in Bayesian networks
• Computer vision
• Statistical physics of magnets
Non-trivial because of the huge number
of terms in the sum.
Error-correcting Codes
+
+
(Tanner, 1981
Gallager, 1963)
+
Marginal Probabilities = A posteriori bit probabilities
Statistical Physics
Marginal Probabilities=
local magnetization
Standard Belief Propagation
i
bi ( x i ) п‚µ
пѓ•m
aп‚®i
( xi )
aпѓЋ N ( i )
“beliefs”
ba ( X a ) п‚µ f a ( X a )
a
“messages”
пѓ• пѓ•m
bп‚® i
( xi )
iпѓЋ N ( a ) b пѓЋ N ( i ) \ a
The “belief” is the BP approximation
of the marginal probability.
BP Message-update Rules
Using b i ( x i ) пЂЅ
b
a
( X a ) , we get
X a \ xi
m a п‚® i ( xi ) пЂЅ

X a \ xi
i
a
fa ( X a )
пѓ•
jпѓЋ N ( a ) \ i bпѓЋ N ( j ) \ a
i
=
пѓ•m
a

bп‚® j
(x j )
Variational (Gibbs) Free Energy
Kullback-Leibler Distance:
D ( b || p ) п‚є
 b ( X ) ln
X
b( X )
p( X )
“Boltzmann’s Law” (serves to define the “energy”):
p( X ) пЂЅ
1
пѓ•
Z
fa ( X a ) пЂЅ
a
1
Z
пЂ­ H (b )
U (b )
D ( b || p ) п‚є
exp пЃ›пЂ­ E ( X ) пЃќ
 b ( X ) E ( X )   b ( X ) ln b ( X )  ln Z
X
X
Variational Free Energy G пЂЁb пЂ©
is minimized when b ( X ) пЂЅ p ( X )
So we’ve replaced the intractable
problem of computing marginal
probabilities with the even more
intractable problem of minimizing a
variational free energy over the space
of possible beliefs.
But the point is that now we can
introduce interesting approximations.
Region-based Approximations to the
Variational Free Energy (Kikuchi, 1951)
Exact:
G [ b ( X )]
Regions: G [{ b r ( X r )}]
(intractable)
Defining a “Region”
A region r is a set of variable nodes Vr
and factor nodes Fr such that if a factor
node a belongs to Fr , all variable nodes
neighboring a must belong to Vr.
Regions
Not a
Region
Region Definitions
Region states:
Xr
Region beliefs:
br пЂЁ X r пЂ©
Region energy:
E r  X r     ln f a  X a 
aпѓЋ Fr
Region average energy:
U r пЂЁb r пЂ© пЂЅ
 b  X E  X 
r
r
r
r
Xr
Region entropy:
H r b r     b r  X r  ln b r  X r 
Xr
Region free energy:
G r пЂЁb r пЂ© пЂЅ U r пЂЁb r пЂ© пЂ­ H r пЂЁb r пЂ©
“Valid” Approximations
Introduce a set of regions R, and a counting number cr
for each region r in R, such that cr=1 for the largest
regions, and for every factor node a and variable node i,
 c I  a  F    c I i  V   1
r
rпѓЋ R
r
r
r
Indicator functions
Count every node once!
G пЂЁ{b r } пЂ© пЂЅ
 c G b 
r
rпѓЋ R
r
r
Entropy and Energy
•  c r I  a  F r   1 : Counting each factor
node once makes the approximate energy
exact (if the beliefs are).
•  c r I i  V r   1 : Counting each variable
node once makes the approximate entropy
reasonable (at least the entropy is correct
when all states are equiprobable).
Comments
• We could actually use different counting numbers
for energy and entropy—leads to “fractional BP”
(Weigerinck & Heskes 2002) or “convexified free energy”
(Wainwright et.al., 2002) algorithms.
• Of course, we also need to impose normalization
and consistency constraints on our free energy.
Methods to Generate Valid
Region-based Approximations
Region Graphs
Cluster
Variational
Method
Junction graphs
Aji-McEliece
Junction trees
(Kikuchi)
Bethe
(Bethe is example of Kikuchi for Factor graphs with no 4-cycles;
Bethe is example of Aji-McEliece for “normal” factor graphs.)
Example of a Region Graph
A,C,1,2,4,5
2,5
B,D,2,3,5,6
C,E,4,5,7,8
D,F,5,6,8,9
C,4,5
D,5,6
5,8
[5] is a child of [2,5]
5
Definition of a Region Graph
• Labeled, directed graph of regions.
• Arc may exist from region A to region B
if B is a subset of A.
• Sub-graphs formed from regions
containing a given node are connected.
• c  1   c where A (r ) is the set of
ancestors of region r. (Mobius Function)
• We insist that
 a ,  i :  c r I  a  F r    c r I i  V r   1.
r
s
sпѓЋ A ( r )
rпѓЋ R
Bethe Method
(after Bethe, 1935)
1
Two sets of regions:
Large regions containing a
single factor node a and all
attached variable nodes. c r пЂЅ 1
2
A
4
Small regions containing a
single variable node i. c r пЂЅ 1 пЂ­ d i
C
1
B;2,3,5,6
2
3
D;5,6
C;4,5
4
5
B
5
D
E
7
A;1,2,4,5
3
6
6
F
8
9
F;5,6,8,9
E;4,5,7,8
7
8
9
Bethe Approximation to
Gibbs Free Energy
G Bethe пЂЅ

a
Xa
пѓ¦ ba пЂЁ X a пЂ© пѓ¶
пѓ·пЂ«
b a пЂЁ X a пЂ© ln пѓ§пѓ§
пѓ·
пЂЁ
пЂ©
f
X
a пѓё
пѓЁ a
 1  d  b  x  ln b  x 
i
i
i
i
i
i
xi
Equal to the exact Gibbs free energy when the
factor graph is a tree because in that case,
b( X ) пЂЅ
1пЂ­ d i
пѓ• b пЂЁ X пЂ©пѓ• b пЂЁ x пЂ©
a
a
a
i
i
i
Minimizing the Bethe Free Energy
L пЂЅ G Bethe пЂ«
  {  b ( x )  1}
i
i
пЂ«
ai
 bi ( x i )
пЂЅ0
L
 ba ( X a )
пЂЅ0
iпѓЋ N ( a )
xi
i
xi
    x 
a
L
i
i
пѓ¬
пѓј
  b a  X a   bi  x i 
пѓ® X a \ xi
пѓѕ
пѓ¦ 1
пѓ¶
b i ( x i ) п‚µ exp пѓ§
пЃ¬ (x )пѓ·
 d  1  ai i 
aпѓЋ N ( i )
пѓЁ i
пѓё
пѓ¦
b a ( X a ) п‚µ exp пѓ§ пЂ­ E a ( X a ) пЂ«
пѓ§
пѓЁ
пѓ¶
  ai ( x i ) 
iпѓЋ N ( a )
пѓё
Bethe = BP
Identify
пЃ¬ ai ( x i ) пЂЅ ln
пѓ•m
bп‚® i
( xi )
bпѓЋ N ( i ) п‚№ a
to obtain BP equations:
i
bi ( x i ) п‚µ
пѓ•m
aп‚®i
( xi )
aпѓЋ N ( i )
ba ( X a ) п‚µ f a ( X a )
пѓ• пѓ•m
iпѓЋ N ( a ) b пѓЋ N ( i ) \ a
bп‚® i
( xi )
Cluster Variation Method
(Kikuchi, 1951)
Form a region graph with an arbitrary number of
different sized regions. Start with largest regions.
cr пЂЅ 1
Then find intersection regions of the largest
regions, discarding any regions that are sub-regions
of other intersection regions.
Continue finding intersections of those intersection
regions, etc.
All intersection regions obey
cr пЂЅ 1 пЂ­
 c , where
s
sпѓЋ S ( r )
S(r ) is the set of super-regions of region r.
Region Graph Created Using CVM
A,C,1,2,4,5
2,5
B,D,2,3,5,6
C,E,4,5,7,8
D,F,5,6,8,9
C,4,5
D,5,6
5,8
5
Minimizing a Region Graph
Free Energy
• Minimization is possible, but it may be
awkward because of all the constraints
that must be satisfied.
• We introduce generalized belief
propagation algorithms whose fixed
points are provably identical to the
stationary points of the region graph
free energy.
Generalized Belief Propagation
• Belief in a region is the product of:
– Local information (factors in region)
– Messages from parent regions
– Messages into descendant regions from
parents who are not descendants.
• Message-update rules obtained by
enforcing marginalization constraints.
Generalized
Belief
Propagation
1
2
3
4
5
6
7
8
9
1245
2356
4578
5689
25
45
56
58
5
Generalized
Belief
Propagation
1
2
3
4
5
6
7
8
9
1245
2356
4578
5689
25
45
56
58
5
b5 п‚µ m 2 п‚® 5 m 4 п‚® 5 m 6 п‚® 5 m 8 п‚® 5
Generalized
Belief
Propagation
1
2
3
4
5
6
7
8
9
1245
2356
4578
5689
25
45
56
58
5
b 45 п‚µ [ f 45 ][ m12 п‚® 45 m 78 п‚® 45 m 2 п‚® 5 m 6 п‚® 5 m 8 п‚® 5 ]
Generalized
Belief
Propagation
1
2
3
4
5
6
7
8
9
1245
2356
4578
5689
25
45
56
58
5
b1245 п‚µ [ f 12 f 14 f 25 f 45 ][ m 36 п‚® 25 m 78 п‚® 45 m 6 п‚® 5 m 8 п‚® 5 ]
Generalized
Belief
Propagation
1
2
Use Marginalization Constraints to
Derive Message-Update Rules
3
1
2
3
4
5
6
7
8
9

4
5
6
7
8
9
=
b5 ( x 5 ) пЂЅ
b
x4
45
( x4 , x5 )
Generalized
Belief
Propagation
1
2
Use Marginalization Constraints to
Derive Message-Update Rules
3
1
2
3
4
5
6
7
8
9

4
5
6
7
8
9
=
b5 ( x 5 ) пЂЅ
b
x4
45
( x4 , x5 )
Generalized
Belief
Propagation
1
2
Use Marginalization Constraints to
Derive Message-Update Rules
3
1
2
3
4
5
6
7
8
9

4
5
6
7
8
9
=
b5 ( x 5 ) пЂЅ
b
x4
45
( x4 , x5 )
Generalized
Belief
Propagation
1
Use Marginalization Constraints to
Derive Message-Update Rules
2
3
1
2
3
4
5
6
7
8
9

4
5
6
7
8
9
m 4п‚® 5 ( x5 ) п‚µ

x4
=
f 45 ( x 4 , x 5 ) m 12 п‚® 45 ( x 4 , x 5 ) m 78 п‚® 45 ( x 4 , x 5 )
(Mild) Surprise #1
• Region beliefs (even those given by
Bethe/BP) may not be realizable as the
marginals of any global belief.
пѓ¦ .4
b a ( x1 , x 2 ) пЂЅ b b ( x1 , x 3 ) пЂЅ пѓ§пѓ§
пѓЁ .1
1
a
2
пѓ¦ .1
b c ( x 2 , x 3 ) пЂЅ пѓ§пѓ§
пѓЁ .4
b
c
3
.1 пѓ¶
пѓ·пѓ·
.4 пѓё
.4 пѓ¶
пѓ·пѓ·
.1 пѓё
is a perfectly acceptable
solution to BP, but cannot
arise from any b ( x1 , x 2 , x 3 )
(Minor) Surprise #2
• For some sets of beliefs (sometimes even when
the marginal beliefs are the exactly correct
ones), the Bethe entropy is negative!
Each large region has counting
number 1, each small region has
counting number -2, so if the beliefs
are b ( x ) пЂЅ пЂЁ. 5 . 5 пЂ©
i
пѓ¦ .5
b ( x i , x j ) пЂЅ пѓ§пѓ§
пѓЁ0
0пѓ¶
пѓ·пѓ·
.5 пѓё
then the entropy is negative:
H пЂЅ 6 ln 2 пЂ« 4 ( пЂ­ 2 ) ln 2 пЂЅ пЂ­ 2 ln 2
(Serious) Surprise #3
• When there are no interactions, the minimum of
the CVM free energy should correspond to the
equiprobable global distribution, but sometimes
it doesn’t!
Example
Fully connected pairwise model, using CVM and all
triplets as the largest regions, with pairs and singlets
as the intersection regions.
Triplets
Pairs
Singlets
# of regions
Counting #
N ( N пЂ­ 1)( N пЂ­ 2 ) / 2
1
N ( N пЂ­ 1) / 2
3пЂ­ N
N
( N пЂ­ 2 )( N пЂ­ 3 ) / 2
Two simple distributions
• Equiprobable distribution: each state is equally
probable for all beliefs. Each region gives
– e.g.
пѓ¦ . 25
b ( x i , x j ) пЂЅ пѓ§пѓ§
пѓЁ . 25
. 25 пѓ¶
пѓ·
. 25 пѓ·пѓё
H r пЂЅ n r ln 2
• Distribution obtained by marginalizing the
global distribution that only allows all-zeros or
all-ones. “Equipolarized distribution”
– e.g.
пѓ¦ .5
b ( x i , x j ) пЂЅ пѓ§пѓ§
пѓЁ0
0пѓ¶
пѓ·
. 5 пѓ·пѓё
Each region gives
H r пЂЅ ln 2
Surprise #3 (cont.)
• Surprisingly, the “equipolarized”
distribution can have a greater entropy
than the equiprobable distribution for
some valid CVM approximations.
• This is a serious problem, because if
the model gets the wrong answer
without any interactions, it can’t be
expected to be correct with interactions.
The Fix: Consider only
Maxent-Normal Approximations
• A maxent-normal approximation is one which gives a
maximum of the entropy when the beliefs correspond to
the equiprobable distribution.
• Bethe approximations are provably maxent-normal.
• Some other CVM and other region-graph
approximations are also provably maxent-normal.
– Using quartets on square lattices
– Empirically, these approximations give good results
• Other valid CVM or region-graph approximations are not
maxent-normal.
– Empirically, these approximations give poor results
A Very Simple Heuristic
• Sum all the counting numbers.
– If the sum is greater than N, the equipolarized
distribution will have a greater entropy than the
equiprobable distribution. (Too Much—Very Bad)
– If the sum is less than 0, the equipolarized distribution
will have a negative entropy. (Too Little)
– If the sum equals 1, the equipolarized distribution will
have the correct entropy. (Just Right)
10x10 Ising Spin Glass
Random
fields
Random
interactions
Conclusions
• Standard BP essentially equivalent to
minimizing the Bethe free energy.
• Bethe method and cluster variation method are
special cases of the more general region graph
method for generating valid free energy
approximations.
• GBP is essentially equivalent to minimizing
region graph free energy.
• One should be careful to use maxent-normal
approximations.
• Sometimes you can prove that your
approximation is maxent-normal; and simple
heuristics can be used to prove when it isn’t.
Документ
Категория
Презентации
Просмотров
5
Размер файла
362 Кб
Теги
1/--страниц
Пожаловаться на содержимое документа