- Published at
Convex Hypertree Based Reparametrization and Convex Upperbounds
Todo
- Authors
- Name
- Dr. Dmitrij Sitenko
- Mathematical Image Processing at Ruprecht Karls University Heidelberg
Table of Contents
To motivate why reparametrizations of probability distributions are important we first consider a simple scenario of exponenential family on a tree structured graph . We already know one simple clique factorization (edge and nodes on )
As we already know from the concepts of graphical models a major chalange of inference tasks is the access to partition function . Assumming for a moment that we have access to the marginals , then we can factorize directly in terms of the margnials via
where now due to the proper normalization of the marginals the partition function vanishes, i.e. . Now given as exponential family with initial regular parametrization leveraging factorization (2) implies
where the correspodning log partition function becomes . In this way we can observe that instead computing normalization contansts we alternatively can seek for a reparametrization parameter which factorizes according to . If the graph is a tree, such reparametrization is easely obtained by marginal inference via message passing updates in linear time. However, the major challenge is to define proper reparametrizations updates on graphs with cycles. This is due to the following reasons
- The geometry of the underlying parameter space directly depends on the complex shape of the underlying marginal polytop, i.e. in the presence of cycles this is a highly contrained set.
- Each reparametrization must index the same joint probability distribution, i.e. wee need access the global properties of the underlying distribution.
In turns out, that for a particular class of exponential families, the gloabl shape of distribution in is reconstructed from a family of much simpler structures (chains, trees, hypertrees, etc.) which also yields approximations and error bounds to the marginal polytope in
Inference Problems on Tree-Graphs and Bethe Approximation
We first consider a generall parametrization of exponential family (3)
which arises from (3) for the particular choice of sufficient statistics given by overcomplete representation
which is also commonly denoted as nonidentifable due to the existance of affine subpace such that indexis the same distribution.
and introduce the set of all realizable marginals
The above set is denoted as the marginal polytop and can equivalently be written as a convex hull over vector corresponding to each configuration or equivalently as an intersection over an exponential number of half spaces for each configuration.
If the graph is a tree (no cycles) it can be shown that the marginal polytope admits a much simpler characterization in terms of locally consistent marginal distributions which satisfy the pairwise marginalization constraints
An important observation is that if the graph is a tree than the two sets agree and in the presence of cycles , i.e. it always serves as an upperbound to the complex marginal polytope.
In order to see the difference between the two sets let us consider a classical example of a one cycle graph . For each parameter we consider variational marginals
which satisfy marginalization conditions and therefore . In order to seek for the corresponding joint distrubution on graph with such marginals we check all the marginalization constraints while keeping the node marginals fixed.
which after small algebraic calculation can be reduced to the following set of inequalities
Returning to our example the above set of inequalities boilds down to
which illustrates how the marginal polytope is contained within the 3D cube . Now after plugging in the expression for the local polytope it is easely seen that
which shows that for there exists no probability distribution on with the marginal given by (add number ).
Bethe Entropy Approximation
Not only the marginal polytope on tree-structured graphs posesses a particular simple structure but also the Bethe entropy admits a closed form expression
with
Replacing the edge set with arbitrary edge sets correspodning to graphs with cycles yields the Bethe entropy approximation
The corresponding approximation to he log partition function is then given (add link) by the Bethe variational problem


Bethe Loop Series Expansion Formula
degree of each vertex
A generalized loop on is any subgraph such for all it holds

Given some stationary points of pseudomarginals after runnign message passing algorithm which minize the Bethe variational problem we obtain the following error expression w
where
denotes the edge weights associated to set of pseudomarginals . In view of the above error bound we can make the following observations
- The error in the Bethe approximation is influenced by generalized loops in the graph, particularly those subgraphs where every vertex has a degree greater than one. The contribution of each loop to the error is weighted by which depends on the pseudomarginal estimates.
- For tree-structured graphs (where loops are absent), the error term vanishes, and the Bethe approximation becomes exact.
This result provides a systematic understanding of the accuracy of the Bethe entropy approximation and its dependence on the underlying graph structure.