- Published at
Concepts of Probabilistic Graphical Models
Introduction to Probabilistic Graphical Models and Factorizations
- Authors
- Name
- Dr. Dmitrij Sitenko
- Mathematical Image Processing at Ruprecht Karls University Heidelberg
Table of Contents
Why Graphical Models?
Imagine you are given a dataset , consisting of squared images, where is the number of pixels in each image . Here, indicates the corresponding image class, i.e., the ground truth classification of the underlying data point , such as the digit label in the MNIST dataset:
The holy grail is to obtain the joint probability distribution underlying the data generation process. For a binary dataset of size , this boils down to an inference task with parameters, which is computationally infeasible. However, if we have access to:
- The statistical properties of the underlying distribution, and
- The geometry of the space where the data points reside,
we can incorporate this knowledge into our inference process. The key idea of graphical models is to factorize over the underlying graph , where are the nodes and are the edges. Each node models a random variable , which interacts with its neighboring nodes via edges .
In general, by choosing an ordering of nodes on a graph, the distribution can be factorized using conditional distributions via an autoregressive factorization:
Based on how the joint distribution is factorized, the following common classes of graphical models arise:
Directed Graphical Models
In Directed Graphical Models, also known as Bayesian Networks or Markov Random Fields (MRFs), the edges are directed, meaning but . For each vertex , we define:
- Parents of node : This is the set , which consists of all nodes such that there is a directed path from .
- A directed graph is a Directed Acyclic Graph (DAG) if there are no cycles (i.e., no paths of the form ).
- If is a DAG, we can specify a partial order on the node set , denoting if , meaning is an ancestor of . Given this ordering, the autoregressive factorization in equation (1) can be further factorized over the sets of all ancestors via:
Undirected Graphical Models
In Undirected Graphical Models, where both and , the factorization in equation (2) does not hold because there is no ancestor-like partial ordering. Instead, the graph is decomposed into cliques , which are subsets of fully connected nodes (i.e., for all ).
Each clique is associated with a compatibility function , leading to the factorization:
In practice, cliques are chosen as maximal fully connected subsets for convenience.
Factor Graphs
For complex graphs with high inter-node dependencies, determining compatibility functions that factorize as in equation (3) can be challenging. To address this, Factor Graphs augment the node set with factor nodes , which represent cliques of the graph. The edge set is also augmented with edges whenever .
The resulting graph is bipartite (i.e., the node set is separable into two disjoint sets), yielding the factorization:
For example, if the graph is a tree, the compatibility functions on the factor nodes and variable nodes are given in terms of the marginals (for more details, refer to the blog post).
Inference Problems
Given a probabilistic graphical model on a graph , inference typically involves two main tasks:
- Marginal computation: Summing over an exponential number of configurations.
- Finding modes: Identifying the configuration with the highest probability, i.e.,
which is generally an NP-hard integer programming problem. For the particular case of acyclic graphs with no cycles, both of this tasks can be solved via dynamical programming paradigm.