Mathematical Perspective on Variational Auto Encoders

Explaining the key building blocks and main mathematical ideas behind Variational Auto-Encoders

Authors
  • Dr. Dmitrij Sitenko
    Mathematical Image Processing at Ruprecht Karls University Heidelberg

Variational Auto-Encoders (VAEs)

Variational Autoencoders became popular with the work of Kingma and Welling (2013). Their primary goal is the inference of a latent distribution $p_{\theta}(z)$
over a small intrinsic dimension $\dim(z)$, thereby reducing the large dimension $\dim X$ of the data space.

Challenges:

  • Evaluating the model evidence $p_{\theta}(x) = \int_Z p_{\theta}(x,z)\,dz$ is hard due to the involved integral. Although $p_{\theta}(x,z)$ can be evaluated easily, this implies that the conditional distribution $p_{\theta}(z|x) = \frac{p_{\theta}(x,z)}{p_{\theta}(x)}$ cannot be accessed.

Variational Autoencoders build upon two variational distributions $p_{\theta}(x,z)$ and $q_{\phi}(x,z)$ which, conditioned on the input random variable, are denoted the decoder and the encoder respectively.
Hereby, common choices for the decoder $p_{\theta}(x|z)$ and the encoder $q_{\phi}(z|x) = \mathcal{N}(z;\phi)$ are Gaussian distributions whose means and covariances are parameterized by deep neural networks.

Evidence Lower Bound

$$
\begin{aligned}
\log p_{\theta}(x) &= \mathbb{E}_{q_{\phi}(z|x)}\left[\log p_{\theta}(x)\right] = \mathbb{E}_{q_{\phi}(z|x)}\left[\log \frac{p_{\theta}(x,z)}{p_{\theta}(z|x)}\right] = \mathbb{E}_{q_{\phi}(z|x)}\left[\log \frac{\textcolor{green}{q_{\phi}(z|x)}\,p_{\theta}(x,z)}{p_{\theta}(z|x)\,\textcolor{green}{q_{\phi}(z|x)}}\right] \\
&= \mathbb{E}_{q_{\phi}(z|x)}\left[\log \frac{\textcolor{green}{q_{\phi}(z|x)}}{p_{\theta}(z|x)}\right] + \mathbb{E}_{q_{\phi}(z|x)}\left[\log \frac{p_{\theta}(x,z)}{\textcolor{green}{q_{\phi}(z|x)}}\right] \\
&= D_{KL}\big(q_{\phi}(z|x)\,\|\,p_{\theta}(z|x)\big) + \mathbb{E}_{q_{\phi}(z|x)}\left[\log \frac{p_{\theta}(x,z)}{\textcolor{green}{q_{\phi}(z|x)}}\right] \\
&\geq \mathbb{E}_{q_{\phi}(z|x)}\left[\log \frac{p_{\theta}(x,z)}{\textcolor{green}{q_{\phi}(z|x)}}\right] \qquad \text{(ELBO)}
\end{aligned}
$$

The lower bound on the log-likelihood obtained by dropping the nonnegative Kullback-Leibler divergence is called the Evidence Lower Bound Objective (ELBO). Therefore, by maximizing the ELBO we follow the hope that the corresponding maximizers $\theta^{\ast},\phi^{\ast}$ of the encoder and decoder parts yield a maximal log probability $p_{\theta}(x)$.
Let us summarize the goals we aim to achieve in order to obtain an optimal variational autoencoder:

  • (1) We want an approximation to the true data-generating distribution,

$$p^{\ast}(x) \approx \hat{p}(x) = \int_Z p_{\theta}(x,z)\,dz,$$

obtained by marginalizing the decoder. This also highlights the burden of evaluating the conditional distribution over the latent space induced by the decoder, $p_{\theta}(z|x)$, which by Bayes' rule requires knowledge of the unknown $p^{\ast}(x)$.

  • (2) By construction the encoder $q_{\phi}(x,z)$ and the decoder $p_{\theta}(x,z)$ must satisfy the self-consistency relation at the latent marginals $z\in Z$, i.e.

$$\int_X p_{\theta}(x,z)\,dx = q(z) = \int_X q_{\phi}(x,z)\,dx$$

The distribution $q(z)$ over the latent space is commonly chosen to be a simple distribution to allow efficient sampling from the decoder $p_{\theta}(x|z)$ under the assumption of a tractable form of $p_{\theta}(x,z)$.

  • (3) A zero gap, which makes the ELBO lower bound tight, that is

$$D_{KL}\big(p_{\theta}(z|x)\,\|\,q_{\phi}(z|x)\big) \approx 0$$

Let us consider how all these goals are addressed by maximizing the ELBO objective, which can be equivalently written as

$$
\mathbb{E}_{q_{\phi}(z|x)}\left[\log \frac{p_{\theta}(x,z)}{q_{\phi}(z|x)}\right] = \mathbb{E}_{q_{\phi}(z|x)}\left[\log \frac{p_{\theta}(x,z)\,\textcolor{green}{q(z)}}{q_{\phi}(z|x)\,\textcolor{green}{q(z)}}\right] = \underbrace{\mathbb{E}_{q_{\phi}(z|x)}\left[\log \frac{q(z)}{q_{\phi}(z|x)}\right]}_{\textcolor{red}{\text{Data Compression}}} + \underbrace{\mathbb{E}_{q_{\phi}(z|x)}\left[\log p_{\theta}(x|z)\right]}_{\textcolor{red}{\text{Reconstruction}}}
$$

This simple decomposition of the ELBO shows that its maximization simultaneously measures how well the encoder distribution conforms to the desired simple marginal $q(z)$ via the first term, which is responsible for data compression. Moreover, maximal values of the ELBO force the second term to be large,
which measures the reconstruction accuracy of a data point $x$ passed through the encoder $q_{\phi}(z|x)$: the decoder should yield high-probability samples which are close to the original signal (data) drawn from $p^{\ast}(x)$.

Gaussian VAEs

Let us consider a simple scenario of a VAE where the encoder and the decoder are given by Gaussian distributions

$$p_{\theta}(x|z) = \mathcal{N}\big(x\,|\,\mu(\theta(z)),\Sigma(\theta(z))\big), \qquad q_{\phi}(z|x) = \mathcal{N}\big(z\,|\,\mu(\phi(x)),\Sigma(\phi(x))\big)$$

where the means and covariances are now parameterized by neural networks, for example the mean as a vector and the covariance by its diagonal. Taking the latent prior distribution $q(z) = \mathcal{N}(0,I)$, the evaluation of the (ELBO) is approximately achieved by

  • Sampling from the encoder distribution, which is Gaussian and therefore tractable
  • Evaluating the terms inside the expectation, which are given in closed form:

$$\log p_{\theta}(x|z) = -\frac{1}{2}(x-\mu(\theta(z)))^{T}\,\Sigma(\theta(z))^{-1}\,(x-\mu(\theta(z))) - \frac{1}{2}\log\big((2\pi)^{d}\,|\Sigma(\theta(z))|\big)$$

and

$$-\log\left(\frac{q_{\phi}(z|x)}{q(z)}\right) = \frac{1}{2}(z-\mu(\phi(x)))^{T}\,\Sigma(\phi(x))^{-1}\,(z-\mu(\phi(x))) + \frac{1}{2}\log\big((2\pi)^{d}\,|\Sigma(\phi(x))|\big) - \frac{1}{2}\|z\|^2 - \frac{d}{2}\log(2\pi)$$

Thus we get

$$\mathcal{L}_{\mathrm{ELBO}}(x,\phi,\theta) = -\mathbb{E}_{q_{\phi}(z|x)}\left[\log p_{\theta}(x|z)\right] + \frac{1}{2}\left(\operatorname{tr}(\Sigma_{\phi}(x)) + \mu_{\phi}(x)^{T}\mu_{\phi}(x) - d - \log\left|\Sigma_{\phi}(x)\right|\right)$$

where all constants independent of $\theta,\phi$ have been dropped. Taking a minibatch $\mathcal{B}\subset \mathcal{D}$ of the training data, the parameters are then optimized by minibatch gradient descent on

$$\frac{1}{|\mathcal{B}|}\sum_{x \in \mathcal{B}} \mathcal{L}_{\mathrm{ELBO}}(x,\phi,\theta)$$
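To make this loss concrete, here is a minimal PyTorch-style sketch of the per-minibatch negative ELBO, assuming a diagonal-covariance Gaussian encoder and a unit-variance Gaussian decoder; the callables `encoder_net` and `decoder_net` are hypothetical placeholders, not part of the original post.

```python
import torch

def negative_elbo(x, encoder_net, decoder_net):
    # Encoder: mean and log-variance of q_phi(z|x) with diagonal covariance.
    mu, logvar = encoder_net(x)

    # Reparameterized sample z = mu + sigma * eps, eps ~ N(0, I).
    eps = torch.randn_like(mu)
    z = mu + torch.exp(0.5 * logvar) * eps

    # Decoder assumed Gaussian with unit variance, so -log p_theta(x|z)
    # reduces (up to additive constants) to a squared error.
    x_hat = decoder_net(z)
    recon = 0.5 * ((x - x_hat) ** 2).sum(dim=1)

    # Closed-form KL(q_phi(z|x) || N(0, I)) for a diagonal Gaussian:
    # 1/2 * (tr(Sigma) + mu^T mu - d - log|Sigma|)
    kl = 0.5 * (torch.exp(logvar) + mu ** 2 - 1.0 - logvar).sum(dim=1)

    return (recon + kl).mean()   # average of L_ELBO over the minibatch B
```

A training step would then compute `loss = negative_elbo(x_batch, enc, dec)`, call `loss.backward()`, and take one optimizer step.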
The Auto-Encoder-Decoder

Forward Pass

Given a dataset of independently and identically distributed (i.i.d.) data points

$$\mathcal{D}=\{\mathbf{x}^{(1)},\mathbf{x}^{(2)},\ldots,\mathbf{x}^{(N)}\}\equiv\{\mathbf{x}^{(i)}\}_{i=1}^{N}$$

the optimization of the encoder-decoder boils down to the following steps. First, picking a random sample $x \in \mathcal{D}$, we pass the data through the encoder by sampling a latent variable $z \sim \mathcal{N}(\mu(\phi(x)),\Sigma(\phi(x)))$. This is
accomplished by passing the input data $x$ through a neural network whose outputs are the mean and covariance matrix, $\mu(\phi(x)),\Sigma(\phi(x)) = NN_{\phi}(x)$. Subsequently, sampling a latent variable $z$ is performed by applying a transformation to Gaussian white noise $\epsilon \sim \mathcal{N}(0,I)$ via

$$z = \mu(\phi(x)) + \Sigma(\phi(x))^{\frac{1}{2}}\,\epsilon$$

where the square root of the covariance matrix is obtained by applying a Cholesky decomposition.

The VAE Forward Pass

Following similar steps, the sampled $z \in \mathbb{R}^{d_z}$ is passed as input to a second neural network which parameterizes the mean and covariance of the decoder distribution $\hat{x} \sim \mathcal{N}(\mu(\theta(z)),\Sigma(\theta(z)))$ with $\mu(\theta(z)),\Sigma(\theta(z)) = NN_{\theta}(z)$.
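As a sketch of this two-network forward pass (assuming hypothetical callables `nn_phi` and `nn_theta` that each return a mean vector and a full covariance matrix):

```python
import torch

def vae_forward(x, nn_phi, nn_theta):
    # Encoder network: x -> (mu_phi(x), Sigma_phi(x)).
    mu_z, cov_z = nn_phi(x)

    # z = mu + Sigma^{1/2} eps, taking the square root as the Cholesky factor.
    L = torch.linalg.cholesky(cov_z)               # lower-triangular factor of Sigma_phi(x)
    eps = torch.randn_like(mu_z)                   # Gaussian white noise
    z = mu_z + (L @ eps.unsqueeze(-1)).squeeze(-1)

    # Decoder network: z -> (mu_theta(z), Sigma_theta(z)) parameterizing p_theta(x|z).
    mu_x, cov_x = nn_theta(z)
    return z, mu_x, cov_x
```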

The Auto-Encoder-Decoder

Problem with naive Gradient Descent

Having performed the forward pass, we need to evolve the variational parameters $\phi,\theta$ towards a local optimum of the ELBO objective via gradient descent, where the gradients with respect to $\theta$ and $\phi$ are respectively given by

$$\nabla_{\theta}\mathcal{L}_{\theta,\phi}(x)=\nabla_{\theta}\mathbb{E}_{q_{\phi}(z|x)}\left[\log p_{\theta}(x,z)-\log q_{\phi}(z|x)\right] = \mathbb{E}_{q_{\phi}(z|x)}\left[\nabla_{\theta}\big(\log p_{\theta}(x,z)-\log q_{\phi}(z|x)\big)\right]$$

which can be approximated by a one-sample Monte Carlo estimator

$$\approx \nabla_{\theta}\big(\log p_{\theta}(x,z)-\log q_{\phi}(z|x)\big) = \nabla_{\theta}\log p_{\theta}(x,z)$$

As can be observed from the above formula, despite the simple Monte Carlo approximation there are no difficulties in accessing the gradient for backpropagation with respect to the variational parameters $\theta$. However, this is not the case for the gradient with respect to the encoder parameters

$$\nabla_{\phi}\mathcal{L}_{\theta,\phi}(x)=\nabla_{\phi}\mathbb{E}_{q_{\phi}(z|x)}\left[\log p_{\theta}(x,z)-\log q_{\phi}(z|x)\right] \neq \mathbb{E}_{q_{\phi}(z|x)}\left[\nabla_{\phi}\big(\log p_{\theta}(x,z)-\log q_{\phi}(z|x)\big)\right]$$

which is due to the necessity of differentiating the expectation over $q_{\phi}(z|x)$, which is itself parameterized by $\phi$. To tackle this problem we introduce the reparametrization trick.

Reparametrization Trick and Change of Variables Formula

Concerning the problem of evaluating the gradient above, Rezende et al. tackled the problem via the change of variables formula for continuous distributions

$$f_X(x)\,dx = f_Y(\Psi^{-1}(x))\,\det\big(J_x\Psi^{-1}(x)\big)\,dx$$

where $f_X$ and $f_Y$ denote the respective probability densities. To see why the above formula holds, let us do some simple calculations given an invertible map $\Psi:Y \to X$ with $\det(J\Psi)>0$. Because $\Psi$ maps random variables $y \in Y$ one-to-one to $\Psi(y) \in X$, we first re-express the density $f_X$ in terms of $f_Y$ using the abbreviation $y(x) = \Psi^{-1}(x)$

$$\int_{S_X} f_X(x)\,dx = P_X(x \in S_X) = P_X(x \in \Psi(S_Y)) = P_Y(y(x) \in S_Y) = \int_{S_Y = \Psi^{-1}(S_X)} f_Y(y)\,dy$$

where the identity $P_X(x \in \Psi(S_Y)) = P_Y(y(x) \in S_Y)$ holds due to the invertibility of the map $\Psi(y)$. Finally, we apply the classical change of variables formula for integrating multivariate functions under the substitution $y = \Psi^{-1}(x)$

$$\int_{\Psi^{-1}(S_X)} f_Y(y)\,dy = \int_{\Psi^{-1}(S_X)} f_Y(\Psi^{-1}\circ \Psi(y))\,dy = \int_{S_X} f_Y(\Psi^{-1}(x))\,\det\big(J_x\Psi^{-1}(x)\big)\,dx$$

Consider the following simple 1D example with a Gaussian distributed random variable $z \sim \mathcal{N}(z|0,1)$. Then the transformed quantity $x = \exp(z)$ is again a random variable, and since $\exp$ has positive derivative, the change of variables formula gives the density $f_X$ of $x$

$$f_X(x)\,dx = \mathcal{N}(\log(x)|0,1)\cdot \frac{1}{x}\,dx$$

as can be observed in the following figure

Illustration of Change of Variables Formula
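As a quick numerical sanity check of this formula (a minimal NumPy sketch, not part of the original post), one can compare a histogram of the transformed samples against the analytic density:

```python
import numpy as np

# Empirical check of the change-of-variables formula for x = exp(z), z ~ N(0, 1).
rng = np.random.default_rng(0)
z = rng.standard_normal(200_000)
x = np.exp(z)

# Empirical density of the transformed samples via a normalized histogram.
hist, edges = np.histogram(x, bins=400, range=(0.01, 20.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Analytic density from the formula: f_X(x) = N(log x | 0, 1) * 1/x.
f_x = np.exp(-0.5 * np.log(centers) ** 2) / (np.sqrt(2 * np.pi) * centers)

print(np.abs(hist - f_x).max())  # small, up to Monte Carlo and binning error
```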


Using the change of variables formula we can reparametrize the expectation over $z$ in terms of a standard Gaussian random variable $\epsilon \sim \mathcal{N}(0,I)$ by looking for a mapping $z = g(\epsilon,\phi,x)$, which, with $f_x(z,\theta,\phi) = \log p_{\theta}(x,z)-\log q_{\phi}(z|x)$, yields

$$\mathbb{E}_{q_{\phi}(z|x)}\left[f_x(z)\right] = \int_Z q_{\phi}(z|x)\,f_x(z,\theta,\phi)\,dz = \int_{g^{-1}(Z)} \mathcal{N}(g^{-1}(z)|0,I)\,f_x(z,\theta,\phi)\,\det\big(J_{z}g^{-1}(z)\big)\,dz = \int_{\mathcal{E}}\mathcal{N}(\epsilon|0,I)\,f_x(g(\epsilon))\,d\epsilon$$

which equals $\mathbb{E}_{p(\epsilon)}\left[f(z)\right]$. The important observation behind the above reparametrization trick is that the expectation over $z \sim q_{\phi}(z|x)$, which depends on $\phi$, has been replaced by an expectation over a simple standard Gaussian distribution $\mathcal{N}(\epsilon|0,I)$. Moreover, this yields an efficient evaluation of the gradient with respect to the problematic variational parameter $\phi$

$$\nabla_{\phi}\mathbb{E}_{q_{\phi}(z|x)}\left[f(z)\right]=\nabla_{\phi}\mathbb{E}_{p(\epsilon)}\left[f(z)\right] = \mathbb{E}_{p(\epsilon)}\left[\nabla_{\phi}f(z)\right] \approx\nabla_{\phi}f(z)$$

In other words, the reparametrization trick makes it possible to backpropagate through the encoder part of the network, as can be observed from the following figure

The Auto-Encoder-Decoder

For a one-sample Monte Carlo estimator with $\epsilon \sim \mathcal{N}(0,I)$ and $z = g(\phi,x,\epsilon)$, evaluation of the ELBO amounts to computing

$$\hat{\mathcal{L}}_{\theta,\phi,\epsilon}(x) = \log p_{\theta}(x,z)-\log q_{\phi}(z|x)$$

which yields an unbiased estimator as can be observed from

$$\mathbb{E}_{p(\epsilon)}\left[\nabla_{\theta,\phi}\hat{\mathcal{L}}_{\theta,\phi,\epsilon}(x)\right]=\mathbb{E}_{p(\epsilon)}\left[\nabla_{\theta,\phi}\big(\log p_{\theta}(x,z)-\log q_{\phi}(z|x)\big)\right] =\nabla_{\theta,\phi}\,\mathbb{E}_{p(\epsilon)}\left[\log p_{\theta}(x,z)-\log q_{\phi}(z|x)\right] = \nabla_{\theta,\phi}\mathcal{L}_{\theta,\phi}(x)$$

i.e. the expected gradient with respect to the white-noise distribution $p(\epsilon)$ equals the true gradient of the ELBO objective.
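The following minimal sketch (with placeholder quantities rather than the full VAE) shows how a single reparameterized sample yields well-defined gradients with respect to the variational parameters; here `f` merely stands in for $\log p_{\theta}(x,z) - \log q_{\phi}(z|x)$:

```python
import torch

# Variational parameters of a diagonal Gaussian q_phi(z|x) (illustrative values).
mu = torch.zeros(4, requires_grad=True)
log_sigma = torch.zeros(4, requires_grad=True)

def f(z):
    # Stand-in for log p_theta(x, z) - log q_phi(z|x).
    return -(z ** 2).sum()

eps = torch.randn(4)                      # eps ~ N(0, I), independent of phi
z = mu + torch.exp(log_sigma) * eps       # z = g(eps, phi): differentiable in mu, log_sigma
loss = -f(z)                              # maximizing f corresponds to minimizing -f
loss.backward()

print(mu.grad, log_sigma.grad)            # one-sample, unbiased gradient estimates
```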

Let us next consider the building blocks for evaluating the remaining parts of the one-sample ELBO approximation.

  • Calculation of the encoder log-probability $\log q_{\phi}(z|x)$ given a data point $x \in \mathcal{D}$ mainly depends on the choice of a proper invertible mapping $z = g(\epsilon)$ and a tractable density $p(\epsilon)$. Using the change of variables formula the computation boils down to

$$\log q_{\phi}(z|x) = \log p(\epsilon)-\log\left|\det\left(\frac{\partial g(\epsilon)}{\partial \epsilon}\right)\right|$$

which in particular shows that tractable mappings are those whose Jacobian is an upper or lower triangular matrix. An important class of such mappings is obtained by assuming a mean-field-like structure of the encoder

$$q_{\phi}(z|x)=\prod_{i}q_{\phi}(z_{i}|x)=\prod_{i}\mathcal{N}(z_{i};\mu_{i},\sigma_{i}^{2})$$

Then the mapping $g(\epsilon)$ can be constructed from the transformation of a Gaussian distribution under an affine map, i.e. via

$$\epsilon \sim \mathcal{N}(0,I), \qquad (\mu,\log\sigma)=\mathrm{NN_{Encoder}}_{\phi}(x), \qquad z = g(\epsilon,\phi) = \mu+\sigma \odot\epsilon$$

with a Jacobian given by a simple diagonal matrix with the standard deviations $\sigma$ on the diagonal.

More generally, the above case extends to $q_{\phi}(z|x) = \mathcal{N}(z|\mu(x),\Sigma(x))$ using the Cholesky factorization $\Sigma(x) = LL^T$ with the invertible mapping $g(\epsilon) = \mu + L\epsilon$, which gives the closed form of the encoder log-probability

$$\log q_{\phi}(z|x)=\log p(\epsilon)-\sum_{i}\log|L_{ii}|$$

Hereby, the parametrization of $L$ in terms of a neural network can be achieved in two steps (see the sketch after this list):

  • First, $(\mu,\log\sigma,\mathbf{L}^{\prime}) = \mathrm{NN_{Encoder}}_{\phi}(x)$
  • Second, applying a masking matrix $L_{mask}$ with zeros above the diagonal via $\mathbf{L} = \mathbf{L}_{mask} \odot \mathbf{L}^{\prime}+\mathrm{diag}(\sigma)$.
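A possible sketch of these two steps, assuming the raw network outputs `mu`, `log_sigma`, `L_raw` and a strictly lower-triangular mask so that the diagonal of $L$ is exactly $\sigma$:

```python
import torch

def full_covariance_sample(mu, log_sigma, L_raw):
    """Reparameterized sample and log q_phi(z|x) for a full-covariance Gaussian encoder
    built from raw network outputs. Shapes: mu, log_sigma (B, d); L_raw (B, d, d)."""
    d = mu.shape[-1]

    # Keep only entries strictly below the diagonal, then put sigma on the diagonal.
    mask = torch.tril(torch.ones(d, d), diagonal=-1)
    L = L_raw * mask + torch.diag_embed(torch.exp(log_sigma))

    eps = torch.randn_like(mu)                        # eps ~ N(0, I)
    z = mu + (L @ eps.unsqueeze(-1)).squeeze(-1)      # z = mu + L eps

    # log q_phi(z|x) = log p(eps) - sum_i log |L_ii|
    log_p_eps = -0.5 * (eps ** 2 + torch.log(torch.tensor(2 * torch.pi))).sum(dim=-1)
    log_q = log_p_eps - torch.log(torch.diagonal(L, dim1=-2, dim2=-1).abs()).sum(dim=-1)
    return z, log_q
```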

Challenges

  • The model can get trapped in an undesirable stable equilibrium, which happens when in the first optimization steps $p_{\theta}(x|z)$ is low and $q_{\phi}(z|x) \approx p(z)$
  • Blurriness artifacts for nonzero KL divergence between $q_{\phi}(x,z)$ and $p_{\theta}(x,z)$, which causes the variance of the decoder $p_{\theta}(x,z)$ to be higher than the variance of the encoder $q_{\phi}(x,z)$ and of the data $q_{\phi}(x)$