Local Features for Medical Applications

Lifting raw data to a higher dimensional feature space while maintaining locality.

Authors
  • Dr. Dmitrij Sitenko
    Mathematical Image Processing at Ruprecht Karls University Heidelberg

The Manifold of Symmetric Positive Definite Matrices $\mathcal{P}_{d}$

For a given positive integer $d \in \mathbb{N}_+$ we next carry over the differential geometric objects to the specific open set of symmetric positive definite $d \times d$ matrices, which plays an important role in mathematics, physics, numerical analysis and imaging, and which forms an open subset of the linear subspace of symmetric matrices in $\mathbb{R}^{d\times d}$:

$$\mathcal{P}_{d} = \{S\in\mathbb{R}^{d\times d}\colon S=S^{T},\; S\ \text{is positive definite}\}.$$

The tangent space at $S\in\mathcal{P}_d$ consists of symmetric matrices

$$T_{S}\mathcal{P}_{d} = \{U \in \mathbb{R}^{d\times d}\colon U^{T}=U\}.$$

As the set $\mathcal{P}_d$ is embedded in the Euclidean space $\mathbb{R}^{d \times d}$, the simplest metric on $T_{S}\mathcal{P}_{d}$ is naturally given in terms of the Frobenius norm

$$d_F(U,V) = \|U-V\|_F, \qquad \|V\|_F = \text{tr}(V^{T}V)^{\frac{1}{2}}, \qquad U,V \in T_{S}\mathcal{P}_{d}.$$
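In code, this flat distance is a one-liner; a minimal numpy sketch (the function name is ours):

```python
import numpy as np

def frobenius_distance(U, V):
    """Flat distance d_F(U, V) = ||U - V||_F between tangent vectors."""
    return np.linalg.norm(U - V, ord='fro')
```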

We will use $\mathcal{P}_d$ as a backbone for prototype extraction in connection with the labeling task. More specifically, we adopt various geometries on $\mathcal{P}_d$ by utilizing alternative metrics and clarify their influence on the properties of the resulting means, defined through an optimization problem, along with the main computational tools for mean approximation.

Affine Invariant Riemannian Metric

Geometrically, the set of symmetric positive definite matrices comprises the interior of a convex cone in the Euclidean space $\mathbb{R}^{d\times d}$ with zero curvature. Therefore, one major limitation of the flat metric is the finite distance to points on the boundary, i.e. matrices with zero eigenvalues. Instead, equipping the tangent space $T_S\mathcal{P}_d$ with the Riemannian metric

$$g_{S}(U,V) = \text{tr}(S^{-1} U S^{-1} V),\qquad U,V\in T_{S}\mathcal{P}_{d},$$

commonly known as the affine invariant Riemannian metric, alleviates the problem and turns the set $\mathcal{P}_d$ into a negatively curved Riemannian manifold with a natural Riemannian distance

$$d_{\mathcal{P}_{d}}(S,T) = \Big(\sum_{i \in [d]}\big(\log\lambda_{i}(S,T)\big)^{2}\Big)^{1/2},$$

where $\lambda_i(S,T)$ are the generalized eigenvalues of the pencil $(S,T)$, that is,

$$Tv = \lambda_i(S,T)\, S v, \quad \text{for some} \quad v \in \mathbb{R}^d \quad \text{with} \quad \|v\| = 1.$$
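Numerically, the distance reduces to one generalized symmetric eigenvalue solve; a minimal scipy sketch (the function name is ours):

```python
import numpy as np
from scipy.linalg import eigh

def airm_distance(S, T):
    """Affine invariant Riemannian distance on P_d, computed from the
    generalized eigenvalues of the pencil (S, T)."""
    # eigh(T, S) solves T v = lam * S v for symmetric T and SPD S
    lam = eigh(T, S, eigvals_only=True)
    return np.sqrt(np.sum(np.log(lam) ** 2))
```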

For this particular case the exponential map reads

$$\exp_{S}(U) = S^{\frac{1}{2}}\,\text{expm}\big(S^{-\frac{1}{2}}US^{-\frac{1}{2}}\big)\,S^{\frac{1}{2}}, \qquad \text{expm}(V) = \sum_{i = 0}^{\infty}\frac{V^i}{i!}, \quad V \in \mathbb{R}^{d\times d}.$$

Here $\text{expm}(\cdot)$ denotes the matrix exponential, whose inverse is the matrix logarithm $\text{logm}=\text{expm}^{-1}$,

$$\text{logm}(S) = \sum_{i = 1}^{\infty}\frac{(-1)^{i-1}}{i}(S-I)^i.$$
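A direct transcription of the exponential map and its inverse in numpy/scipy might read as follows (a sketch; `np.real` discards the numerically tiny imaginary parts the dense matrix functions can return):

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm

def exp_map(S, U):
    """Riemannian exponential exp_S(U) = S^(1/2) expm(S^(-1/2) U S^(-1/2)) S^(1/2)."""
    R = np.real(sqrtm(S))        # S^(1/2)
    R_inv = np.linalg.inv(R)     # S^(-1/2)
    return R @ expm(R_inv @ U @ R_inv) @ R

def log_map(S, P):
    """Inverse map: the tangent vector U at S with exp_S(U) = P."""
    R = np.real(sqrtm(S))
    R_inv = np.linalg.inv(R)
    return R @ np.real(logm(R_inv @ P @ R_inv)) @ R
```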

Finally, given a smooth objective function $J \colon \mathcal{P}_{d} \to \mathbb{R}$, the Riemannian gradient is given by

$$\nabla J(S) = S\big(\partial J(S)\big)S \in T_{S}\mathcal{P}_{d},$$

where the symmetric matrix $\partial J(S)$ denotes the Euclidean gradient of $J$ at $S$. Since $\mathcal{P}_{d}$ is a simply connected, complete and nonpositively curved Riemannian manifold, the exponential map is globally defined and bijective, and the Riemannian mean always exists and is uniquely defined as the minimizer of the objective function below, after substituting the Riemannian distance.

Given a set of positive definite matrices

$$\mathcal{S}_{N} = \{(S_1,\omega_{1}),\dots,(S_{N},\omega_{N})\} \subset \mathcal{P}_d \times \mathbb{R}_{>0}$$

together with positive weights $\omega_{i}$, we next focus on the solution of the mean problem for each specific geometry,

$$\overline{S} = \arg\min_{S \in \mathcal{P}_{d}} J(S;\mathcal{S}_{N}),\qquad J(S;\mathcal{S}_{N}) = \sum_{i \in [N]} \omega_{i}\, d_{\mathcal{P}_{d}}^{2}(S,S_{i}),$$

with the distance $d_{\mathcal{P}_{d}}$ given above. Due to invertibility of the exponential map, each tangent vector $U \in T_S \mathcal{P}_d$ is represented at the base point $S$ by

$$U = \exp_{S}^{-1} \circ \exp_{S}(U) = S^{\frac{1}{2}}\, \text{logm}\big(S^{-\frac{1}{2}}\exp_{S}(U)\,S^{-\frac{1}{2}}\big)\, S^{\frac{1}{2}}.$$

As a result, the optimality condition reads

$$\sum_{i \in [N]} \omega_{i}\, \overline{S}^{\frac{1}{2}}\, \text{logm}\big(\overline{S}^{-\frac{1}{2}}S_{i}\, \overline{S}^{-\frac{1}{2}}\big)\, \overline{S}^{\frac{1}{2}} = 0.$$

Apart from the specific case $N = 2$, where the Riemannian mean is directly given by $\overline{S} = S_1(S_1^{-1}S_2)^{\frac{1}{2}}$, the solution to this nonlinear equation is not explicitly known. A remedy is the corresponding basic fixed point iteration initialized at $S_0 \in \mathcal{P}_d$,

$$S_{t+1} = S_t\,\text{expm}\Big(-h_t\sum_{i \in [N]}\omega_i\,\text{logm}(S_i^{-1}S_t)\Big),$$

where the step size $h_t > 0$ can be defined heuristically or by following more sophisticated line search strategies. However, this iteration is limited in the sense that convergence is not theoretically guaranteed, and if the iteration converges, then only at a linear rate. This can be remedied by restricting the matrices $\{S_1,\dots,S_N\}$ to pairwise commute, resulting in the following variant of Riemannian mean recovery that is guaranteed to converge at a quadratic rate. Using the parametrization corresponding to the Cholesky decomposition

$$S = L L^{T},$$

and replacing the map of the fixed point iteration with its linearization leads to the fixed point map

$$F_{\tau}(L; \mathcal{S}_{N}) = L L^{T} - \tau \sum_{i\in [N]} \omega_{i}\, L\, \text{logm}\big(L^{T}S_i^{-1}L\big)\, L^{T},$$

with damping parameter $\tau > 0$. Comparing with the gradient iteration above shows that the basic idea is to compute the Riemannian mean $\overline{S}$ as a fixed point of the iteration

$$\overline{S} = \lim_{t \to \infty} S_{(t)},\qquad S_{(t+1)} = F_{\tau}(S_{(t)};\mathcal{S}_{N}).$$

The algorithm below provides a refined variant of this iteration, including adaptive step size selection.

Fixed Point Iteration for Computing the Riemannian Matrix Mean

Initialization

  • $\epsilon$ (termination threshold)

  • $t = 0$, $S_{(0)} = L L^{T}$, with $S_{(0)}$ a suitable initial guess

  • $c_0 = \frac{\lambda_{\max}(S_{(0)})}{\lambda_{\min}(S_{(0)})}$ (condition number)

  • $\{\alpha_0, \beta_0\} = \big\{\frac{\log(c_0)}{c_0-1},\; c_0 \frac{\log(c_0)}{c_0-1}\big\}$ (step size selection parameters)

  • $\tau_0 = \frac{2}{\alpha_0 + \beta_0}$

  • $S_{(1)} = F_{\tau_0}(L; \mathcal{S}_{N})$ (first iteration step)

  • $\epsilon_{1} = \Big\|\sum_{i \in [N]} \omega_i\, \text{logm}\big(S_{(1)}^{\frac{1}{2}} S^{-1}_i S_{(1)}^{\frac{1}{2}}\big)\Big\|_{F}$, $t = 1$

While $\epsilon_{t} > \epsilon$:

  1. $S_{(t)} = L L^{T}$
  2. $c_{t} = \frac{\lambda_{\max}(S_{(t)})}{\lambda_{\min}(S_{(t)})}$
  3. If $c_{t} = 1$, stop
  4. $\{\alpha_{t}, \beta_{t}\} = \big\{\frac{\log(c_t)}{c_t-1},\; c_t \frac{\log(c_t)}{c_t-1}\big\}$
  5. $\tau_{t} = \frac{2}{\alpha_{t} + \beta_{t}}$
  6. $S_{(t+1)} = F_{\tau_{t}}(L; \mathcal{S}_{N})$
  7. $\epsilon_{t+1} := \Big\|\sum_{i \in [N]} \omega_i\, \text{logm}\big(S_{(t+1)}^{\frac{1}{2}} S^{-1}_i S_{(t+1)}^{\frac{1}{2}}\big)\Big\|_{F}$, $t \leftarrow t + 1$
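A compact Python transcription of this scheme might look as follows; it is a sketch under our reading of the algorithm (we initialize with the arithmetic mean as a generic guess, and evaluate the residual via $\text{logm}(L^{T}S_i^{-1}L)$, which is orthogonally similar to the expression above and hence has the same Frobenius norm):

```python
import numpy as np
from scipy.linalg import cholesky, logm

def riemannian_mean(S_list, w, eps=1e-10, max_iter=100):
    """Fixed point iteration for the Riemannian matrix mean with the
    adaptive step size tau_t = 2 / (alpha_t + beta_t)."""
    d = S_list[0].shape[0]
    S = sum(wi * Si for wi, Si in zip(w, S_list))   # generic initial guess
    S_inv_list = [np.linalg.inv(Si) for Si in S_list]
    for _ in range(max_iter):
        lam = np.linalg.eigvalsh(S)
        c = lam[-1] / lam[0]                        # condition number of S_(t)
        if np.isclose(c, 1.0):
            break
        alpha = np.log(c) / (c - 1.0)
        tau = 2.0 / (alpha + c * alpha)             # beta_t = c_t * alpha_t
        L = cholesky(S, lower=True)                 # S_(t) = L L^T
        # residual; logm(L^T S_i^{-1} L) is orthogonally similar to
        # logm(S^(1/2) S_i^{-1} S^(1/2)), so the Frobenius norms agree
        G = sum(wi * np.real(logm(L.T @ Si_inv @ L))
                for wi, Si_inv in zip(w, S_inv_list))
        if np.linalg.norm(G, 'fro') < eps:
            break
        S = L @ (np.eye(d) - tau * G) @ L.T         # linearized fixed point step
    return S
```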

Log-Euclidean Metric

Leveraging the specific properties of the mappings $\text{expm}$ and $\text{logm}$, an alternative metric was proposed (among several other ones). More precisely, introducing the operations

$$S_1 \odot S_2 = \text{expm}\big(\text{logm}(S_1) + \text{logm}(S_2)\big), \qquad \lambda \cdot S = \text{expm}\big(\lambda\, \text{logm}(S)\big)$$

prescribes on $\mathcal{P}_d$ a Lie group structure, where $(\mathcal{P}_{d},\odot,\cdot)$ is isomorphic to a vector space with $\odot$ playing the role of addition. The log-Euclidean metric for two symmetric positive definite matrices $S_1,S_2 \in \mathcal{P}_d$ is then defined by the Euclidean inner product on the flat space of zero curvature after applying the mapping $\text{logm}\colon\mathcal{P}_d\to T_S \mathcal{P}_d$, and provides a lower bound to the affine invariant Riemannian distance:

$$d_{le}(S_1,S_2) = \|\text{logm}(S_1)-\text{logm}(S_2)\|_F, \qquad S_1,S_2 \in \mathcal{P}_d.$$

The corresponding Riemannian mean of the set $\mathcal{S}_{N}$ has the closed form expression

$$\overline{S} = \text{expm}\Big(\sum_{i\in [N]} w_{i}\,\text{logm}(S_{i})\Big),$$

which is analogous in form to the arithmetic mean. While computing this mean is considerably cheaper than approximating the Riemannian mean with the fixed point algorithm above, the critical drawback of the log-Euclidean metric is that it does not take into account the structure (curvature) of the manifold $\mathcal{P}_{d}$. Therefore, in the next section, we additionally consider another approximation of the Riemannian mean that better fits the underlying geometry and avoids the expensive computation of generalized eigenvalues, resulting in a more efficient mean evaluation than for the mean associated with the affine invariant metric.
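In code, this closed form is only a few lines; a minimal numpy/scipy sketch (the function name is ours):

```python
import numpy as np
from scipy.linalg import expm, logm

def log_euclidean_mean(S_list, w):
    """Closed-form log-Euclidean mean expm(sum_i w_i logm(S_i))."""
    return expm(sum(wi * np.real(logm(Si)) for wi, Si in zip(w, S_list)))
```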

S-Divergence

In this section we make use of divergence functions and present a general approach to approximating the objective function, replacing the intractable, problem specific squared Riemannian distance by a divergence function

$$D(p,q) \approx \frac{1}{2} d_{g}^{2}(p,q).$$

A basic property of such divergences is that, for any feasible $p$, the Hessian with respect to the first argument is positive definite. In fact, suitable divergence functions $D$ locally recover in this way the metric tensor of the underlying manifold $\mathcal{M}$, which qualifies them as surrogates for the squared Riemannian distance.

| Properties of mean $\mathcal{G}(\mathcal{S}_N)$ | Riemann. | Log-Eucl. | Stein div. |
| --- | --- | --- | --- |
| Invariance under reordering of $\mathcal{S}_N$ | ✔ | ✔ | ✔ |
| Congruence invariance: $\mathcal{S}_N \to A\mathcal{S}_N A^{T}$ for $A \in GL(d)$ | ✔ | $A \in SO(d)$ | ✔ |
| Self-duality: $(\mathcal{G}(\mathcal{S}_N))^{-1} = \mathcal{G}(\mathcal{S}_N^{-1})$ | ✔ | ✔ | ✔ |
| Joint homogeneity: $\mathcal{G}(\alpha_1S_1,\dots,\alpha_NS_N) = (\prod_{i \in [N]}\alpha_i)^{\frac{1}{N}}\,\mathcal{G}(\mathcal{S}_N)$ | ✔ | ✔ | ✔ |
| Determinant identity: $\det \mathcal{G}(\mathcal{S}_N) = (\prod_{i \in [N]} \det S_i)^{\frac{1}{N}}$ | ✔ | ✔ | ✔ |
| Commuting case: if the $S_i$ commute pairwise, $\mathcal{G}(\mathcal{S}_N) = (\prod_{i \in [N]} S_i)^{\frac{1}{N}}$ | ✔ | ✔ | ✔ |

For the present case $\mathcal{M}=\mathcal{P}_{d}$ of interest, a suitable choice is the divergence function called the Stein divergence. For $S_{1},S_{2} \in \mathcal{P}_{d}$ it is defined as

$$D_{s}(S_{1},S_{2}) = \log\det\Big(\frac{S_{1}+S_{2}}{2}\Big) - \frac{1}{2}\log\det(S_{1} S_{2}),$$

which is not a metric on $\mathcal{P}_d$ due to the lack of the triangle inequality. However, taking the square root of $D_s$ yields a metric $d_{S}(\cdot,\cdot)$ on $\mathcal{P}_d$. Thus, replacing the squared Riemannian distance in the mean problem by $D_{s}$ provides the approximate objective

$$\overline{S} = \arg\min_{S\in\mathcal{P}_{d}} J_{s}(S;\mathcal{S}_{N}),\qquad J_{s}(S;\mathcal{S}_{N}) = \sum_{i\in [N]} w_{i}\, D_{s}(S,S_{i}),$$

which allows mean recovery while avoiding the numerically involved generalized eigenvalue problem. The resulting Riemannian gradient flow reads

$$\dot{S} = -\nabla J_{s}(S;\mathcal{S}_{N}) = \frac{1}{2}\, S\big(S^{-1} - R(S;\mathcal{S}_{N})\big)S,$$

with

$$R(S;\mathcal{S}_{N}) = \sum_{i\in [N]} w_{i}\Big(\frac{S+S_{i}}{2}\Big)^{-1}.$$

Discretizing the flow using the geometric explicit Euler scheme with step size $h$, and using the log-Euclidean mean as the initial point $S_{(0)}$, defines the algorithm listed below.

Initialization

$\epsilon$ (termination threshold)


$t = 0$, $S_{(0)}$ the log-Euclidean mean (initial point)


$\epsilon_{0} > \epsilon$ (any such value $\epsilon_{0}$)


While $\epsilon_{t} > \epsilon$:

  1. $L L^{T} = S_{(t)}$

  2. $L_{i} L_{i}^{T} = \frac{S_{(t)} + S_{i}}{2}$ for $i \in [N]$

  3. $U = I - S_{(t)}^{\frac{1}{2}}\Big(\sum_{i\in [N]} \omega_{i}(L_{i}L_{i}^{T})^{-1}\Big) S_{(t)}^{\frac{1}{2}}$

  4. $S_{(t+1)} = S_{(t)}^{\frac{1}{2}}\,\text{expm}\big(\frac{h}{2} U\big)\, S_{(t)}^{\frac{1}{2}}$

  5. $\epsilon_{t+1} := \|U\|_{F}$, $t \leftarrow t + 1$

Computing the Geometric Matrix Mean Based on the $S$-divergence.
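A direct numpy/scipy sketch of the divergence and of the Euler scheme above (function names are ours; `np.linalg.slogdet` is used for numerically stable log-determinants):

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm

def stein_divergence(S1, S2):
    """Stein divergence D_s(S1, S2) = logdet((S1+S2)/2) - (1/2) logdet(S1 S2)."""
    ld_mid = np.linalg.slogdet(0.5 * (S1 + S2))[1]
    return ld_mid - 0.5 * (np.linalg.slogdet(S1)[1] + np.linalg.slogdet(S2)[1])

def stein_mean(S_list, w, h=1.0, eps=1e-10, max_iter=100):
    """Geometric explicit Euler scheme for the S-divergence mean,
    initialized at the log-Euclidean mean."""
    d = S_list[0].shape[0]
    # log-Euclidean mean as initial point S_(0)
    S = expm(sum(wi * np.real(logm(Si)) for wi, Si in zip(w, S_list)))
    for _ in range(max_iter):
        R = sum(wi * np.linalg.inv(0.5 * (S + Si)) for wi, Si in zip(w, S_list))
        S_half = np.real(sqrtm(S))
        U = np.eye(d) - S_half @ R @ S_half
        S = S_half @ expm(0.5 * h * U) @ S_half
        if np.linalg.norm(U, 'fro') < eps:   # epsilon_{t+1} = ||U||_F
            break
    return S
```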

The key advantage of the $S$-divergence based algorithm over the fixed point iteration for the Riemannian mean is that it respects the geometry of $\mathcal{P}_d$ while remaining numerically efficient. We will provide an experimental evaluation and a qualitative comparison of the mean properties with respect to the aforementioned metrics while extracting prototypes for the labeling task; application details are reported in a later chapter. We conclude by listing the main properties of the resulting means when relying on the Riemannian metric, the log-Euclidean metric and the Stein divergence, summarized in the table above.