2D PET Image Reconstruction Using Robust L1 Estimation of the Gaussian Mixture Model

Tafro, Azra; Seršić, Damir; Sović Kržić, Ana

doi:10.15388/22-INFOR482

2D PET Image Reconstruction Using Robust L1 Estimation of the Gaussian Mixture Model

Article type: Research Article

Authors: Tafro, Azra^{1; ∗} | Seršić, Damir² | Sović Kržić, Ana²

Affiliations: [1] Institute of Processes Engineering, University of Zagreb Faculty of Forestry and Wood Technology, Svetošimunska cesta 25, 10000 Zagreb, Croatia | [2] Department of Electronic Systems and Information Processing, University of Zagreb Faculty of Electrical Engineering and Computing, Unska 3, 10000 Zagreb, Croatia

Correspondence: [∗] Corresponding author.

Keywords: Gaussian mixture models, positron emission tomography, Expectation-Maximization (EM) algorithm, image segmentation

DOI: 10.15388/22-INFOR482

Journal: Informatica, vol. 33, no. 3, pp. 653-669, 2022

Received February 2021

Accepted April 2022

Published: 2022

Get PDF

Abstract

An image or volume of interest in positron emission tomography (PET) is reconstructed from gamma rays emitted from a radioactive tracer, which are then captured and used to estimate the tracer’s location. The image or volume of interest is reconstructed by estimating the pixel or voxel values on a grid determined by the scanner. Such an approach is usually associated with limited resolution of the reconstruction, high computational complexity due to slow convergence and noisy results.

This paper presents a novel method of PET image reconstruction using the underlying assumption that the originals of interest can be modelled using Gaussian mixture models. Parameters are estimated from one-dimensional projections using an iterative algorithm resembling the expectation-maximization algorithm. This presents a complex computational problem which is resolved by a novel approach that utilizes L1 minimization.

1Introduction

In positron emission tomography (PET), image reconstruction implies generating an image of a radiotracer’s concentration to estimate physiologic parameters for objects (volumes) of interest. Pairs of photons arise from emissions of annihilated positrons, and crystals placed in the scanner detect these pairs. When two are activated at the same time, the device records an event. Each possible pair of crystals is connected by a line or volume of response (LOR, VOR), depending on whether the scan is 2D or 3D. Assuming there are no contaminating physical effects or noise, the total number of coincidence events detected and the total amount of tracer contained in the tube are proportional. In the 2D case, we observe events along lines of response (LORs) joining two detector elements, lying within a specified imaging plane. The data are recorded as event histograms (sinograms or projected data) or as a list of recorded photon-pair events (list-mode data). Classic PET image reconstruction methodology is explained in more detail in e.g. Tong et al. (2010), Alessio and Kinahan (2006) or Reader and Zaidi (2007).

Modern image reconstruction methods are mostly based on maximum-likelihood expectation-maximization (MLEM) iterations. Maximum likelihood is used as the optimization criterion, combined with the expectation-maximization algorithm for finding its solution. To overcome the computational complexity and slow convergence of the MLEM, the ordered subsets expectation-maximization (OSEM) algorithm has been introduced (Hudson and Larkin, 1994). Since MLEM or OSEM estimation of pixels or voxels is usually noisy (Tong et al., 2010), use of post-filtering methods is necessary. On the one hand, image reconstruction from its projections is a mature research field with well known methods and proven results. On the other hand, known limitations made a challenge for a different approach presented in this paper.

In image segmentation, a number of algorithms based on model-based techniques utilizing prior knowledge have been proposed to model uncertainty, cf. Zhang et al. (2001, 1994). The Gaussian mixture model (GMM) is well-known and widely used in a variety of segmentation and classification problems (Friedman and Russell, 1997; Ralašić et al., 2018), as many observed quantities exhibit behaviour congruent with the model. A good overview of the application of GMMs and their generalizations to problems in image classification, image annotation and image retrieval can be found, e.g. in Tian (2018). However, in those problems the sample is from the GMM itself, whereas in this case the observed data is lower-dimensional (i.e. lines) and the points of origin are unknown.

In this paper, we propose a new robust method for reconstructing an object from a simulated PET image. The challenge of estimating Gaussian parameters from lower-dimensional data is solved by utilizing a novel L1 minimizing algorithm. With it, our framework becomes similar to the standard Expectation-Maximization (EM) algorithm. Instead of values in individual pixels or voxels, we estimate parameters of the object’s model. This enables us to model one or more objects of interest, each as a mixture of one or more components, assuming that the entire object is a source of radiation with varying intensity. We focus on the two-dimensional case for clarity, but an extension to three dimensions would follow in a similar fashion. To the best of our knowledge, this is the first attempt to use parametric models in PET image reconstruction in this way. Parameters of Gaussian mixture models have not been estimated from lower-dimensional data due to computational complexity, which is circumvented here by the algorithm that efficiently finds global minimums. This paper is organized into six sections. Section 2 gives an overview of traditional methodology in 2D PET imaging. Section 3 describes the estimation of Gaussian mixture model parameters from PET data. Section 4 shows the implementation of iterative estimates in the EM-like algorithm. Finally, experimental results in Section 5 and discussion conclude the paper.

2Two-Dimensional PET Imaging

Fig. 1

(a) Simulated measurements for N=400 and K=2. (b) The corresponding sinogram.

Data are collected along lines lying within a specific imaging plane. Traditionally, data are organized into sets of projections, integrals along all lines (LORs) for a fixed direction ϕ. The collection of all projections for 0⩽ϕ<2π forms a two-dimensional function of the distance of the LOR from the origin, denoted by s, and angle ϕ. The line-integral transform of f(x,y)→p(s,ϕ) is called the X-ray transform (Natterer, 1986), which in 2D is the same as the Radon transform (Helgason, 1999). The path for any fixed point in the projection space resembles a sinusoidal curve. This is traditionally represented by a sinogram, a graph where these paths are drawn for all points of activity simultaneously. Figure 1 illustrates one synthetic measurement and the corresponding sinogram. Classical reconstruction methods, most notably the filtered backprojection (FBP), rely on the integral transform, cf. Natterer (1986), O’Sullivan and Pawitan (1993), Alessio and Kinahan (2006). They do not include or depend on any assumptions about the data. Assumptions such as form, volume and dependence could be utilized to obtain more precise estimates.

3Estimating the Gaussian Mixture Model

A Gaussian mixture model (Reynolds, 2015) is a weighted sum of K component Gaussian densities as given by the equation:

(1)

p(x|τk,μk,Σk)=∑k=1Kτkf(x|μk,Σk),

where x is a d-dimensional observation, τi (i=1,…,K) are the weights of each Gaussian component, and f(x|μk,Σk) are the Gaussian component densities. We assume that a given observation x is a realization from exactly one of the K Gaussian mixture components, and each component density is a d-variate Gaussian function of the form:

(2)

f(x|μk,Σk)=1(2π)d|Σk|exp(−12(x−μk)TΣk−1(x−μk)).

The set of probabilities {τk} such that ∑k=1Kτk=1 defines the probabilities that x belongs to the corresponding Gaussian component. The complete Gaussian mixture model is parameterized by the mean vectors {μk}, covariance matrices {Σk} and mixture weights {τk} for all Gaussian component densities.

Traditionally in this setting, the GMMs are used to model activity concentration in voxels (Layer et al., 2015) or pixel values (Nguyen and Wu, 2013). Those models do not take into account the spatial correlation between observations, which can be corrected by introducing Markov random fields (Layer et al., 2015; Nguyen and Wu, 2013). In other image segmentation problems, this can be addressed by modelling the mixture weights as probability vectors, thereby creating a spatially varying finite mixture model (SVFMM) (Xiong and Huang, 2015).

Here, we apply the GMM directly to the spatial data, focusing only on the locations and not the values at those locations. In other words, the location of the tracer, i.e. the particles that originate the gamma rays, is represented by the the K Gaussian components. The points x that are realizations from these components are latent (unobserved). Our observations, i.e. events, are lines through these points at random angles φ, however, using convenient properties of Gaussian distributions we will still be able to accurately estimate the parameters.

As mentioned earlier, Gaussian mixture models have naturally appeared in many signal processing (Yu and Sapiro, 2011; Léger et al., 2011) and biometrics problems (Reynolds, 2015; Reynolds et al., 2000). Due to the flexibility of the mixtures, GMMs are convenient and effective for modelling various types of signals, as well as image inverse and missing/noisy data problems.

Note that in our simulations we assume that K, the number of mixture components, is known. In practical applications it should also be estimated, which is a further (and not insignificant) problem. For mixture components that are sufficiently separated in space, such as in Fig. 1, the sinogram itself could give insight into the most adequate number of components, otherwise ideas suggested in, e.g. Leroux (1992), Chen (1995), Cheng (2005), should be explored.

For clarity, in this section we focus only on one Gaussian component. Assuming we know a set of N events whose points of origin come from a Gaussian distribution with parameters (μ,Σ), we can estimate those parameters.

3.1Estimation of μ Using Minimal Distance

Each event in two dimensions is uniquely given by its slope k=tanφ and an intercept l (or by any two equivalent parameters) and we can write the event equations as

(3)

aiTx+li=0,ai=[tanφi−1]T,i=1,…,N.

The mean vector μ=[μxμy]T can be estimated in multiple ways. One method is to find the point in space whose total squared distance from all events is minimal. Clearly, the solution depends on our definition of distance. Formally, one would prefer to find the point whose total distance (not squared) is minimal, but that presents a nonlinear problem which is significantly more difficult to solve, and we will show that the linear problem provides a satisfying solution.

In general, the squared distance between d-dimensional vectors is defined as

(4)

d2(v1,v2)=(v1−v2)TW(v1−v2),

for some d×d weight matrix W. In particular, when W=I the distance in (4) is Euclidian, and W=Σ−1 gives the Mahalanobis distance. In our case, d=2 and a planar line is given by one equation, but similar results would follow for higher dimensions.

First, note that for a fixed μ the solution to

mind2(x,μ)such thataix+li=0,

is (Golub and Van Loan, 2012) the solution to

(5)

2WaiaiT0·xiμλ=2Wμ−li.

Now the optimal μ is the one that minimizes the expression

(6)

min∑i=1N(xiμ−μ)TW(xiμ−μ),

where xiμ is determined by (5). To solve this, first note that the solution to (5) is

xiμλ=2WaiaiT0−1·2Wμ−li.

To accommodate this, we will denote the expression in (6) by d2 and expand it:

d2=∑i=1N(xiμ−μ)TW(xiμ−μ)=∑i=1Nxiμλ−μλT·W000·xiμλ−μλ.

For simplicity, denote

xiλ=xiμλ,μλ=μλandW˜=W000.

Now d2 equals

∑i=1N(xiλTW˜xiλ−xiλTW˜μλ−μλTW˜xiλ+μλTW˜μλ),

which we will differentiate piecewise to find the minimum. We have:

1)
ddμxiλTW˜xiλ=22Wμ−liTBi2WaiaiT0−12W0,
where
Bi=2WaiaiT0−1TW˜.
2)
ddμμλTW˜xiλ=μλTBiT2W0+2Wμ−liTBiI0.
3)
ddμμλTW˜xiλ=μλTBiT2W0+2Wμ−liTBiI0.
4)
ddμμλTW˜μλ=2μλTW˜I0.

Note that in all calculations we use the fact that W and, by extension, W˜ are symmetric. By plugging these equations into ddμ and equating that with 0 to obtain the minimum, we get:

ddμd2=∑i=1N2Wμ−liT·Bi2WaiaiT0−12W0−I0−−μλTBiT2W0−W˜I0=0.

Define

Mimi=Bi2WaiaiT02W0−100100,Ni0=BiT2W0−W˜100100.

From the previous equation we now have

∑i=1N[2μTWT,−li]·Mimi−[μT,λi]·Ni0=0,∑i=1N(2μTWTMi−limi−μTNi)=0,μT∑i=1N(2WTMi−Ni)=∑i=1Nlimi,

from which it follows that

(7)

μT=(∑i=1Nlimi)·(∑i=1N(2WTMi−Ni))−1.

It remains to verify that this stationary point μ is also a turning point. However, since a point whose sum of squared distances from all lines is minimal must exist from a geometrical perspective, the solution in (7) is indeed the (global) minimum.

3.2Estimation of Σ Using 1D Projections

In the two-dimensional setting, since Σ is a symmetric matrix,

Σ=Σ11Σ12Σ12Σ22,

we need to estimate three parameters: Σ11, Σ12 and Σ22.

Figure 2 depicts a single event and its corresponding LOR. Recall that the line is determined by two parameters, one of which is the slope tanφ, where φ is the angle between the line and the x-axis. If we rotate the coordinate system by φ−π2, in the new coordinate system the y-axis is parallel to the event. If we represent the Gaussian distribution as in the image, in this new setup it rotated by ψ=π2−φ. Since a rotated Gaussian distribution is again Gaussian, in this new coordinate system the event is parallel to the y-axis, and the Gaussian distribution has parameters

(8)

(Tμ,TΣTT),whereT=cosψ−sinψsinψcosψis the rotation matrix.

Fig. 2

Illustration of the rotation. The event (LOR) is parallel to the new y-axis.

Given the original event parameters, the coordinates of the intersection of the event and the new x-axis are (−lsinψ,0). The projection of the Gaussian distribution onto the new x-axis remains Gaussian, and the parameters of this one-dimensional distribution are components of the original two-dimensional Gaussian. From (8) we see that the expectation is given by

(Tμ)1=cosψμx−sinψμy,

and variance is

(9)

(TΣTT)11=cos2ψΣ11−2cosψsinψΣ12+sin2ψΣ22.

We can use these one-dimensional projections and their squared (Euclidian) distance from the mean, to estimate the variance in (9). This squared distance is equal to (cosψμx−sinψμy+lsinψ)2, which gives us a system of equations:

(10)

As=b,wheres=Σ11Σ12Σ22,A=cos2ψ1−2sinψ1cosψ1sin2ψ1cos2ψ2−2sinψ2cosψ2sin2ψ2⋮⋮⋮cos2ψN−2sinψ1cosψNsin2ψN,andb=(cosψ1μx−sinψ1μy+l1sinψ1)2⋮(cosψNμx−sinψNμy+lNsinψN)2.

This problem is clearly overdetermined, since there are (dozens of) thousands of measurements. There is no exact solution and the best approximation is found by solving

(11)

mins‖As−b‖.

Note that the solution will depend on the type of norm used in (11). Following classical methodology, the ordinary least squares (OLS) method gives the solution that minimizes the L2 norm. This solution can be found by solving the equivalent problem

ATAs=AT·b,i.e.s=(ATA)−1AT·b.

As we will show in Section 5, OLS performs poorly when we introduce more than one component, so we will need to use alternative methods. L1 minimization is a popular approach in engineering problems because it gives sparse solutions, but it had also been shown that it is less sensitive than the more traditional L2 minimization (Bektaş and Şişman, 2010). The main argument against L1 minimization would be computational complexity, which can be alleviated by using iterative methods. In this paper, the L1 minimization algorithm proposed in Sović Kržić and Seršić (2018) is used to solve a modified problem

(12)

As=k·b,

where k is a constant corrective scale factor. In a sufficiently large sample, one would have enough data points in each direction ϕ to obtain a one-dimensional variance estimate. In the absence of that, we are able to obtain a robust estimator for the parameters of the covariance matrix following the median absolute deviation (MAD) estimator of deviation σ (Rousseeuw and Croux, 1993), i.e.

k=(Φ(0.75))2≈1.48262,

where Φ denotes the distribution function of the standard normal random variable.

4EM-Like Algorithm

In many statistical models, maximum likelihood parameter estimation can not be performed directly since most equations do not have an explicit closed solution. Several iterative methods have been developed to combat this, most notably the expectation-maximization (EM) algorithm. It had been proposed and used in many different circumstances before its proper introduction in Dempster et al. (1977). The name comes from its two-step setup, namely the expectation (E) and maximization (M) step which interchange until an acceptable solution is found. In the context of GMMs, and generally in emission tomography (Shepp and Vardi, 1982), the issue is that the true mixture membership is unknown (latent). The E step for given mixture parameters estimates probabilities {τk} from (2) for each data point. The M step then (re)estimates the mixture parameters from the points assigned to that mixture with their corresponding probabilities. As already mentioned, our observations are lines and the true data points are latent, however, we can replicate the iterative steps using estimates from Section 3. Alternatively, this could also be considered a Lloyd-like algorithm, where the difference from the conventional Lloyd’s algorithm (Lloyd, 1982) is that we allow different distance functions of the form (4). The algorithm initializes parameters arbitrarily, and then alternates between the following steps:

1. Compute class membership probabilities. For each event, compute the squared distance from each component mean. We distinguish between a hard classification where we assign the event to its nearest component, and a soft classification where membership probability is inversely proportional to the squared distance.
2. Estimate component parameters. Either from a hard or soft classification, where each event participates with its proportional share, parameters of each component are estimated using methodology from Section 3.

The L1 minimization algorithm recursively reduces and increases dimensionality of the observed subspace and uses weighted median to efficiently find the global minimum, and has shown to overperform state-of-the-art competitive methods when there are relatively few parameters to be estimated from a very high number of equations. For details, see Appendix and Sović Kržić and Seršić (2018).

This approach allows for different variants, depending on the type of distance used (Tafro and Seršić, 2019).

In this paper, initial steps of the iterative algorithm use Euclidian distance in (4). Since later iterations improve the estimates, the distance gradually transforms into the Mahalanobis distance, i.e.

(13)

W=(1−α)I+αΣˆ−1,

where α increases from 0 to 1. It is known that, for Σ given, the estimate obtained using the Mahalanobis distance is also the maximum likelihood estimate for μ. Therefore, in later iterations we obtain an MLE-like estimate of μ.

5Results

To evaluate the methodology, we experimented in the two-dimensional setting with K=1 and K=2 components.

First, for proof of concept we show that the method in Section 3 provides good estimates with both L1 and L2 minimization, for several covariance matrices with varying corresponding correlation coefficients:

Σ1=0.05I,Σ2=0.02−0.01−0.010.05,Σ3=0.010.020.020.05.

These matrices represent different shapes, i.e. different types of dependence, from independent (Σ1) to strongly dependent (Σ3) variables. Since the 2D covariance matrix is of the form

Σ=σx2ρσxσyρσxσyσy2,

it is determined by three parameters – σx, σy and ρ(=ρxy). This corresponds to the three-dimensional vector

s=[Σ11Σ12Σ22]T=[σx2ρσxσyσy2]T

in (10). Instead of observing true and estimated Σ matrices we will calculate the error in estimation of s for each s that corresponds to matrices above:

s1=0.0500.05,s2=0.02−0.010.05,s3=0.010.020.05.

Note that the corresponding correlation coefficients can be calculated from these vectors, and they are ρ1=0, ρ2≈−0.3, ρ3≈0.9.

Fig. 3

Left: Relative error in estimation of μ from a single measurement (N=1000 events). Right: Average relative error for increasing sample size, for three types of covariance.

5.1One-Dimensional Estimation

Accuracy of an estimate vˆ of vector v can be assessed in many ways, for illustrative purposes we chose relative error, i.e.

‖e‖=‖vˆ−v‖‖v‖,

where ‖·‖ denotes the standard Euclidian norm in both expressions.

Initial value of α in the distance weight matrix was set to zero, and proportionally increased to 1 in the final iteration. The left side of Fig. 3 illustrates how the errors in the estimation of μ change with the number of iterations in a single experiment with N=1000 points, for all three types of covariance matrices. Depending on the type of covariance matrix, the accuracy increases as |ρ| increases, but in all three scenarios the estimation is stable within less than 10 iterations, and sufficiently accurate. In further experiments, the number of iterations in the algorithm was set to 10, with the distance metric varying according to (13). The right side of Fig. 3 shows how the accuracy in estimation of μ increases with the sample size, with the number of experiments set to 100 for each sample size. The relative errors decreases as the number of points increases, as is expected, but we can also notice that the error is fairly similar even for sample sizes that are very small (compared to the usual sample sizes in PET imagery).

For each of these types of covariances we then simulated a measurement from N=1000 and N=10000 points and calculated the average estimate of vector s using L1 and L2 minimization separately. We repeated the experiment 1000 times in each case. Results are given in Table 1.

Table 1

Mean estimation error, K=1.

N=1000	s1	s2	s3
L1 method	13.78%	11.88%	9.28%
L2 method	8.27%	7.61%	7.6%

N=10000	s1	s2	s3
L1 method	4.28%	3.66%	2.93%
L2 method	2.61%	2.38%	2.37%

Given that the variance of the traditional standard deviation estimator (from points) equals σ4n−1, estimations within 10% from N=1000 events are acceptable, which justifies the methodology described in Section 3. Significantly more accurate estimation from N=10000 events further confirms this. It is also notable that in general the accuracy of the estimate increases as |ρ| increases from 0 to 1, i.e. when the set of points of origin is more elongated in shape.

5.2Two-Dimensional Estimation

For K=2 components we repeated the experiment as described in Section 4 for various combinations of types of vector s. We used hard classification, where we assign each line to at most one component. We experimented with various constraints, from simply assigning events to more likely components to assignations only when the probability of belonging is above a certain threshold.

For synthetic measurements from a variety of original (real) GMMs the algorithm proved robust regardless of the values of initial parameters, with estimation using L1 minimization and the scaling factor k correcting the bias. The L2 minimization method proved inefficient, since wrongly assigned events would cause unstable estimations and “breaks” in the algorithm.

Table 2 shows the percentage of correctly classified events for several combinations of covariance matrices, where a total of N=4000 events were simulated from two mixtures (N1=2500, N2=1500) for each combination. Events were classified with no probability thresholds, i.e. each event was assigned to the more likely component. We can conclude overall that the algorithm has a very good classification rate which, combined with the results for individual components, gives reason to expect accurate estimations for multiple components.

Table 2

Correct classification, K=2.

Cov. 1	Cov. 2	Comp. 1	Comp. 2	Total
Σ1	Σ2	87.63%	93.49%	89.83%
Σ2	Σ3	93.52%	92.13%	93.00%
Σ3	Σ1	91.26%	95.83%	92.98%

An illustration of the results for K=2, N=4000 is shown in Fig. 4, along with the corresponding classical FBP method.

Fig. 4

(a) Classical FBP reconstruction. (b) Proposed reconstruction using L1 minimization.

6Conclusion

These results show proof of concept that it is possible to reconstruct data from latent mixture models using lower-dimensional observations and state-of-the-art computational techniques. Estimation from lower-dimensional measurements had not been thoroughly developed for Gaussian mixture models in general, and especially in this context. This paper shows that the main obstacle, slow computational speed, can be evaded by an alternative approach to minimization, which opens the door to various new procedures. Further work includes other metrics and membership calculation and utilizing traditional methods to obtain the number of mixing components and optimal initial values.

The fact that the reconstructed image is given by its parametric model (mean vectors {μk}, covariance matrices {Σk} and mixture weights {τk}) has many benefits. This approach would allow for accurate reconstruction from significantly fewer measurements, thus decreasing patients’ exposure to radiation. It is virtually of infinite resolution, since the Gaussian components can be evaluated at each spatial point. The model is sparse: it consists from only a few parameters needed for successful object representation. Hence, future research will be oriented to compressed sensing approach (Rani et al., 2018; Ralašić et al., 2018, 2019) for reducing the number of projections, in this case reduced radiotracer’s concentration. Due to robustness of the proposed reconstruction method, a post-filtering step is not needed. This novel point of view aims to match the accuracy of iterative parametric models with the computational speed of the analytic approach by combining advantageous statistical models with state-of-the-art techniques from other image reconstruction problems.

Appendices

Appendix

The L1 minimization algorithm recursively reduces and increases dimensionality of the observed subspace and uses weighted median to efficiently find the global minimum. Reduction of dimensionality is achieved by extracting of parameters and inserting them into remaining equations in (12). If [Ai1Ai2Ai3] is the i-th row of matrix A, and bi the i-th element of vector k·b, the i-th equation of the system is

bi=Σ11Ai1+Σ12Ai2+Σ13Ai3.

We choose equation j1 from the set i=1,…,N and extract one of its parameters, e.g. Σ11:

(14)

Σ11=−Aj12Aj11Σ12−Aj13Aj11Σ22+bj1.

We insert it into all other equations and get a new system with only two unknown parameters:

bi−bj1Ai1=(Ai2−Aj1Aj11Ai1)Σ12+(Ai3−Aj13Aj11Ai1)Σ22,

for i≠j1. Now, we choose some other equation j2,j2≠j1 and extract one of the remaining parameters, e.g. Σ12:

(15)

Σ12=−Aj23Aj11−Aj13Aj21Aj22Aj11−Aj12Aj21Σ22+(bj2−bj1Aj21)Aj11Aj22Aj11−Aj12Aj21.

We insert Σ12 into all other equations and get the system of N−2 equations with only one unknown parameter Σ22:

(16)

bi(1)=Ai(1)Σ22,

where i≠j1,j2,

bi(1)=bi−bj1Ai1−(Ai2−Aj12Aj11Ai1)(bj2−bj1Aj21)Aj11Aj22Aj11−Aj12Aj21,Ai(1)=Ai3−Aj13Aj11Ai1−(Ai2−Aj12Aj11Ai1)Aj23Aj11−Aj13Aj21Aj22Aj11−Aj12Aj21.

Parameter Σ22 is calculated by minimizing L1 norm

min∑i=1i≠j1,j2N|bi(1)−Ai(1)Σ22|

which equals

min∑i=1i≠j1,j2N|Ai(1)||bi(1)Ai(1)−Σ22|.

The value of parameter Σ22 is given by the weighted median (MED):

(17)

(Σ22,j3)=MED(|Ai(1)|♢bi(1)Ai(1)|i=1,i≠j1,j2N),

where j3 is an ordinal number of the concomitant equation and ♢ is the replication operator. The weighted median can be obtained using the algorithm given in Sović Kržić and Seršić (2018). The value of parameter Σ22 is an element of set {bi(1)/Ai(1)}. Chosen equations {j1,j2,j3} define a local minimum in 1D, a vertex.

Return to a higher dimension is achieved by putting calculated parameter value Σ22 in (15). To further descending in L1 cost surface, we fix equations j1 and j3, and try to find new j4 using (16)–(17). If j4≠j2, new vertex is defined by {j1,j2,j3}={j1,j3,j4} and we repeat the same procedure. If j4=j2, we conclude that the vertex {j1,j2,j3} is a local minimum in observed 2D subspace, thus we return to 3D by fixing j2 and j3 and try to find new j4 instead of j1. We repeat the previous procedure until we cannot find new equation. Since the L1 cost surface is convex, the global minimum is reached. Parameters Σ11, Σ12 and Σ22 are given by (14), (15) and (17).

Acknowledgments

This research has been supported by the Croatian Science Foundation under the project IP-2019-04-6703.

References

1	Alessio, A., Kinahan, P. ((2006) ). PET image reconstruction. Nuclear Medicine, 1: , 1–22. https://doi.org/10.1118/1.2358198.
2	Bektaş, S., Şişman, Y. ((2010) ). The comparison of L11 and L22-norm minimization methods. International Journal of Physical Sciences, 5: (11), 1721–1727.
3	Chen, J. ((1995) ). Optimal rate of convergence for finite mixture models. The Annals of Statistics, 23: (1), 221–233. https://doi.org/10.1214/aos/1176324464.
4	Cheng, Y.-m. ((2005) ). Maximum weighted likelihood via rival penalized EM for density mixture clustering with automatic model selection. IEEE Transactions on Knowledge and Data Engineering, 17: (6), 750–761. https://doi.org/10.1109/TKDE.2005.97.
5	Dempster, A.P., Laird, N.M., Rubin, D.B. ((1977) ). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B, 39: , 1–38. https://doi.org/10.2307/2984875.
6	Friedman, N., Russell, S. ((1997) ). Image segmentation in video sequences: a probabilistic approach. In: Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence, UAI’97, pp. 175–181. http://dl.acm.org/citation.cfm?id=2074226.2074247.
7	Golub, G.H., Van Loan, C.F. ((2012) ). Matrix Computations, Vol. 3: . JHU Press. https://doi.org/10.1137/1028073.
8	Helgason, S. ((1999) ). The Radon Transform, Vol. 2: . Springer.
9	Hudson, H.M., Larkin, R.S. ((1994) ). Accelerated image reconstruction using ordered subsets of projection data. IEEE Transactions on Medical Imaging, 13: (4), 601–609. https://doi.org/10.1109/42.363108.
10	Layer, T., Blaickner, M., Knäusl, B., Georg, D., Neuwirth, J., Baum, R.P., Schuchardt, C., Wiessalla, S., Matz, G. ((2015) ). PET image segmentation using a Gaussian mixture model and Markov random fields. EJNMMI Physics, 2: (1), 9. https://doi.org/10.1186/s40658-015-0110-7.
11	Léger, F., Yu, G., Sapiro, G. ((2011) ). Efficient matrix completion with Gaussian models. In: 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp. 1113–1116.
12	Leroux, B.G. (1992). Consistent estimation of a mixing distribution. The Annals of Statistics, 1350–1360. https://doi.org/10.1214/aos/1176348772.
13	Lloyd, S. ((1982) ). Least squares quantization in PCM. IEEE Transactions on Information Theory, 28: (2), 129–137. https://doi.org/10.1109/TIT.1982.1056489.
14	Natterer, F. ((1986) ). The Mathematics of Computerized Tomography, Vol. 32: . Siam. https://doi.org/10.1137/1.9780898719284.
15	Nguyen, T.M., Wu, Q.J. ((2013) ). Fast and robust spatially constrained Gaussian mixture model for image segmentation. IEEE Transactions on Circuits and Systems for Video Technology, 23: (4), 621–635. https://doi.org/10.1109/TCSVT.2012.2211176.
16	O’Sullivan, F., Pawitan, Y. ((1993) ). Multidimensional density estimation by tomography. Journal of the Royal Statistical Society: Series B (Methodological), 55: (2), 509–521. https://doi.org/10.1111/j.2517-6161.1993.tb01919.x.
17	Ralašić, I., Seršić, D., Petrinović, D. ((2019) ). Off-the-shelf measurement setup for compressive imaging. IEEE Transactions on Instrumentation and Measurement, 68: (2), 502–511. https://doi.org/10.1109/TIM.2018.2847018.
18	Ralašić, I., Tafro, A., Seršić, D. ((2018) ). Statistical compressive sensing for efficient signal reconstruction and classification. In: 2018 4th International Conference on Frontiers of Signal Processing (ICFSP), pp. 44–49. https://doi.org/10.1109/ICFSP.2018.8552059.
19	Rani, M., Dhok, S., Deshmukh, R. ((2018) ). A systematic review of compressive sensing: concepts, implementations and applications. IEEE Access, 6: , 4875–4894. https://doi.org/10.1109/ACCESS.2018.2793851.
20	Reader, A.J., Zaidi, H. ((2007) ). Advances in PET image reconstruction. PET Clinics, 2: (2), 173–190. PET Instrumentation and Quantification. https://doi.org/10.1016/j.cpet.2007.08.001.
21	Reynolds, D. ((2015) ). Gaussian mixture models. In: Li, S.Z., Jain, A.K. (Eds.), Encyclopedia of Biometrics. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7488-4_196.
22	Reynolds, D.A., Quatieri, T.F., Dunn, R.B. ((2000) ). Speaker verification using adapted Gaussian mixture models. Digital Signal Processing, 10: (1–3), 19–41. https://doi.org/10.1006/dspr.1999.0361.
23	Rousseeuw, P.J., Croux, C. ((1993) ). Alternatives to the median absolute deviation. Journal of the American Statistical association, 88: (424), 1273–1283. https://doi.org/10.1080/01621459.1993.10476408.
24	Shepp, L.A., Vardi, Y. ((1982) ). Maximum likelihood reconstruction for emission tomography. IEEE Transactions on Medical Imaging, 1: (2), 113–122. https://doi.org/10.1109/TMI.1982.4307558.
25	Sović Kržić, A., Seršić, D. ((2018) ). L1 minimization using recursive reduction of dimensionality. Signal Processing, 151: , 119–129. https://doi.org/10.1016/j.sigpro.2018.05.002.
26	Tafro, A., Seršić, D. ((2019) ). Iterative algorithms for Gaussian mixture model estimation in 2D PET Imaging. In: 2019 11th International Symposium on Image and Signal Processing and Analysis (ISPA), pp. 93–98. https://doi.org/10.1109/ISPA.2019.8868570.
27	Tian, D. ((2018) ). Gaussian mixture model and its applications in semantic image analysis. Journal of Information Hiding and Multimedia Signal Processing, 9: (3), 703–715.
28	Tong, S., Alessio, A.M., Kinahan, P.E. ((2010) ). Image reconstruction for PET/CT scanners: past achievements and future challenges. Imaging in Medicine, 2.5: , 529–545. https://doi.org/10.2217/iim.10.49.
29	Xiong, T., Huang, Y. ((2015) ). Robust Gaussian mixture modelling based on spatially constraints for image segmentation. Journal of Information Hiding and Multimedia Signal Processing, 6: (5), 857–868.
30	Yu, G., Sapiro, G. ((2011) ). Statistical compressed sensing of Gaussian mixture models. IEEE Transactions on Signal Processing, 59: (12), 5842–5858. https://doi.org/10.1109/TSP.2011.2168521.
31	Zhang, J., Modestino, J.W., Langan, D.A. ((1994) ). Maximum-likelihood parameter estimation for unsupervised stochastic model-based image segmentation. IEEE Transactions on Image Processing, 3: (4), 404–420. https://doi.org/10.1109/83.298395.
32	Zhang, Y., Brady, M., Smith, S. ((2001) ). Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Transactions on Medical Imaging, 20: (1), 45–57. https://doi.org/10.1109/42.906424.