Systematic underestimation of uncertainties by widespread neutron-scattering data-reduction software
Abstract
The data-reduction software packages Mantid and Scipp, used at neutron-scattering facilities around the world, ignore correlations when propagating uncertainties in arithmetic operations. Normalization terms applied during data-reduction frequently have a lower dimensionality than the quantities being normalized. We show how the lower dimensionality introduces correlations, which the software does not take into account in subsequent data-reduction steps such as histogramming, summation, or fitting. As a consequence, any uncertainties in the normalization terms are strongly suppressed and thus effectively ignored. This can lead to erroneous attribution of significance to deviations that are actually pure noise, or to overestimation of significance in final data-reduction results that are used for further data analysis. We analyze this flaw for a number of different cases as they occur in practice. For the two concrete experiments considered in these case studies, the underestimation turns out to be negligible. There is, however, no reason to assume that this generalizes to other measurements at the same or at different neutron-scattering beamlines. We describe and implement a potential solution that yields not only corrected error estimates but also the full variance-covariance matrix of the reduced result, at minor additional computational cost.
1. Introduction
The number of neutron counts measured in a detector element during a neutron-scattering experiment follows a Poisson distribution. In practice this is typically handled with a Gaussian approximation, setting the standard deviation of a measured count to the square root of the count.
One caveat in both Mantid and Scipp is that correlations are not taken into account in the propagation of uncertainties. This is generally not considered a problem since the raw data are assumed to be uncorrelated. However, as we will show, ubiquitous operations in neutron-scattering data-reduction introduce such correlations in intermediate computation steps, which the software subsequently fails to take into account. This leads to a systematic underestimation of uncertainties in final results such as the reduced spectra that are passed on to further data analysis.
The remainder of this work is organized as follows. We describe the problem in Section 2. In Section 3 we analyze a couple of concrete examples, in particular for small-angle neutron scattering (SANS) and powder diffraction. Furthermore, we derive full expressions for the variance-covariance matrices of the reduced data, which turn out to be simple and computationally tractable. We confirm the correctness of the improved estimates for the uncertainties using a bootstrapping [9] approach. The results are discussed in Section 4 and we draw conclusions in Section 5.
2. Problem description
For neutron-scattering data-reduction the most relevant operations that require handling of uncertainties are addition, multiplication, and division. Normalization operations can be implemented as division or as multiplication with the inverse. The Taylor series expansion (truncated after first order) for uncertainty propagation [3] gives
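As a minimal numerical sketch of this first-order propagation (using generic variable names, not the paper's notation), a division can be written with an explicit covariance term; setting that term to zero recovers the uncorrelated propagation that the CS implementations apply operation by operation:

```python
def divide_first_order(a, var_a, b, var_b, cov_ab=0.0):
    """First-order (Taylor) uncertainty propagation for f = a / b.

    cov_ab is the covariance of a and b; with cov_ab = 0 this reduces to the
    uncorrelated formula applied operation-by-operation by Mantid and Scipp.
    """
    f = a / b
    rel_var = var_a / a**2 + var_b / b**2 - 2.0 * cov_ab / (a * b)
    return f, f**2 * rel_var

# Hypothetical extreme case: dividing a Poisson count by itself.  With the
# covariance term the variance correctly vanishes; without it, it does not.
print(divide_first_order(1000.0, 1000.0, 1000.0, 1000.0, cov_ab=1000.0))  # (1.0, 0.0)
print(divide_first_order(1000.0, 1000.0, 1000.0, 1000.0))                 # (1.0, 0.002)
```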
The most straightforward example of the incorrect treatment of correlations is the computation of a histogram of event data after normalization. As described in [19], CS event-mode normalization first histograms the normalization term. Then, for every event of the data to normalize, the corresponding bin in the normalization histogram is identified and the event weight is scaled with the inverse of the value of that bin. We consider a single bin with n events, their weights
Concretely, for the example of histogramming event data, all neutrons typically have the same initial weight
To illustrate this effect, we show in Fig. 1 how the relative uncertainty for a histogram of normalized event data scales with the number of events. Both the CS implementation (×) and the V0 approximation (+) of event-mode normalization lead to a relative error of the histogrammed result that goes to 0 with increasing event count. Importantly, CS and V0 yield near-identical results for all values of C. In contrast, normalization after histogramming (∙) or correct handling of correlations yields a relative error limited by the relative error of the normalization factor.
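The scaling behind Fig. 1 can be reproduced with a short back-of-the-envelope script. The sketch below is a simplified reconstruction (not the code used for the figure): a single histogram bin with n unit-weight events, normalized by a single Poisson monitor count m.

```python
import numpy as np

m = 1000.0                      # hypothetical monitor count in the bin (variance m)
for n in [10, 100, 1000, 10_000, 100_000]:
    value = n / m
    # CS/V0-style: propagate the division per event, then sum the event
    # variances as if the scaled events were independent.
    var_cs = n * (1.0 + 1.0 / m) / m**2
    # Correct first order: n and m are two independent Poisson variables.
    var_ok = value**2 * (1.0 / n + 1.0 / m)
    print(n, np.sqrt(var_cs) / value, np.sqrt(var_ok) / value)
# The CS relative error falls off like 1/sqrt(n), whereas the correct one
# levels off at the relative error of the normalization term, 1/sqrt(m).
```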
Fig. 1.
In summary, the CS approach is thus either wasteful (since it is equivalent to V0 with additional compute overhead), or incorrect. In the next section we analyze more complex cases, namely normalization in SANS and powder diffraction workflows. These involve a transformation of coordinates⁶ after the normalization step but before a histogramming/summing step. While the problem of the CS approach is conceptually the same, the concrete effects are thus slightly harder to visualize and grasp.
3. Case studies
3.1. SANS
In a time-of-flight SANS experiment, the differential scattering cross section
In the SANS case the CS calculation of the normalization term itself (rather than the normalization of the detector counts) in Eq. (17) suffers from the issue described in Section 2. In contrast to the example from Fig. 1, the problem here is not caused by event-mode normalization, but by a different type of broadcast operation. Since the λ-dependent factors in the denominator do not depend on i and j, and the solid angle
Note that there are a number of other contributions to the uncertainties with similar issues, which we ignored in this analysis. Our goal here is to provide an example for illustration of the problems in CS, not a complete analysis of the entire SANS data reduction. See the discussion in Section 4 for more background.
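To make the effect of such a broadcast tangible, here is a minimal NumPy sketch (hypothetical shapes and counts, not the actual terms of Eq. (17)): a wavelength-dependent monitor spectrum is broadcast to all detector pixels and later summed over pixels, as happens when pixels are combined into a Q bin.

```python
import numpy as np

npix, nlam = 1000, 50
m = np.full(nlam, 1.0e4)           # monitor counts per wavelength bin
var_m = m.copy()                   # Poisson: variance equals counts

term = np.broadcast_to(m, (npix, nlam))
var_term = np.broadcast_to(var_m, (npix, nlam))

summed = term.sum(axis=0)
var_cs = var_term.sum(axis=0)      # CS-style: broadcast copies treated as independent
var_ok = npix**2 * var_m           # correct: the copies are 100% correlated

print(np.sqrt(var_cs[0]) / summed[0])   # ~3e-4, strongly suppressed
print(np.sqrt(var_ok[0]) / summed[0])   # 1e-2, the monitor's own relative error
```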
To illustrate the issue we study the data-reduction workflow for an experiment on non-ionic surfactants that was carried out at the Sans2d instrument [22] at the ISIS facility at the STFC Rutherford Appleton Laboratory.⁹ The SANS reduction workflow we used can be found online at [21]. A key factor in determining whether the CS implementation is adequate is the ratio of total detector counts to total monitor counts¹⁰
Fig. 2.
Fig. 3.
As a verification of these results, we independently computed the uncertainties using a bootstrap [9] method – a sampling technique that can be used to estimate the distribution of a statistic.¹¹ It involves repeatedly sampling from a population with replacement and then calculating the statistic of interest for each sample. The resulting distribution of statistics can then be used to calculate a confidence interval for the population parameter. This approach is useful because it allows us to make inferences about a population even when we do not have a theoretical distribution for the statistic of interest. Its biggest advantage is its simplicity, but it comes at a high cost – in our case we need to repeat the entire data-reduction for every bootstrap sample, i.e., hundreds or thousands of times. The bootstrapping approach is described in more detail in Appendix A. We can see in Fig. 3 that there is excellent agreement between the bootstrap results and our analytical derivation (see Appendix A, in particular Fig. 8 which shows the same result for
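For orientation, the sketch below shows one common bootstrap variant (a parametric Poisson resampling around the measured counts, with a toy stand-in for the reduction); the resampling actually used for our results is described in Appendix A.

```python
import numpy as np

rng = np.random.default_rng(seed=1234)

def reduce(counts, monitor):
    """Toy stand-in for the full data-reduction: sum over pixels, then normalize."""
    return counts.sum(axis=0) / monitor

# Hypothetical measured data: pixels x wavelength bins, plus a monitor spectrum.
counts = rng.poisson(100.0, size=(100, 50)).astype(float)
monitor = rng.poisson(1.0e4, size=50).astype(float)

# Draw new Poisson realizations around the measured counts and repeat the
# reduction for each replica.
replicas = np.array([
    reduce(rng.poisson(counts).astype(float), rng.poisson(monitor).astype(float))
    for _ in range(1000)
])
result = reduce(counts, monitor)
err = replicas.std(axis=0)        # bootstrap estimate of the standard deviations
cov = np.cov(replicas.T)          # bootstrap estimate of the full covariance matrix
print(err[:3] / result[:3])
```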
The correct (first order) expression for the uncertainties for this reduction workflow can be derived in a straightforward manner. We adopt a simpler and more generic notation equivalent to Eq. (17) as follows. We define
The diagonal component of this result is included in Fig. 2. In addition, we show the resulting Pearson correlation matrix
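For reference, the Pearson correlation matrix follows from the variance-covariance matrix in the standard way; a minimal helper:

```python
import numpy as np

def correlation_from_covariance(cov):
    """Pearson correlation matrix: corr_ij = cov_ij / (sigma_i * sigma_j)."""
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)
```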
Fig. 4.
3.2. Powder diffraction
We consider a simplified model of a time-of-flight powder-diffraction reduction workflow based on the workflow used at ISIS [15]. Our model workflow consists of the following steps: (1) compute wavelength coordinates for detector data and monitor data, (2) compute wavelength histograms, (3) normalize the detector data by the monitor, (4) compute d-spacing coordinate for detector data, and (5) compute d-spacing histogram, combining data from all (or a user-defined subset of) detector pixels. This yields the normalized d-spacing spectrum
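A toy, single-pixel sketch of these five steps is given below (hypothetical geometry and counts; the real workflow [15] handles many pixels, calibration, and further corrections, and may normalize in event mode rather than after histogramming):

```python
import numpy as np

rng = np.random.default_rng(0)
L, two_theta = 40.0, np.deg2rad(30.0)              # flight path [m], scattering angle
tof_det = rng.uniform(1e-2, 1e-1, size=100_000)    # detector event times [s]
tof_mon = rng.uniform(1e-2, 1e-1, size=1_000_000)  # monitor event times [s]

# (1) wavelength coordinates, lambda = (h / m_n) * t / L
lam_det = 3.956e-7 * tof_det / L
lam_mon = 3.956e-7 * tof_mon / L
# (2) wavelength histograms
edges = np.linspace(lam_det.min(), lam_det.max(), 101)
det, _ = np.histogram(lam_det, bins=edges)
mon, _ = np.histogram(lam_mon, bins=edges)
# (3) normalize detector data by the monitor, per wavelength bin
norm = det / mon
# (4) d-spacing coordinates via Bragg's law, d = lambda / (2 sin(theta))
centers = 0.5 * (edges[:-1] + edges[1:])
d = centers / (2.0 * np.sin(0.5 * two_theta))
# (5) d-spacing histogram (trivial here, since there is only one pixel)
d_edges = np.linspace(d.min(), d.max(), 51)
spectrum, _ = np.histogram(d, bins=d_edges, weights=norm)
```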
In Fig. 5 we compare the uncertainties computed by CS and by an improved analytical approach which will be described below. Source code for reproducing these results can be found in the supplementary material [13]. As described in Section 3.1, we study the influence of the ratio of total detector to monitor counts, α (defined equivalently to Eq. (19)), which is influenced, e.g., by the sample volume. For
Fig. 5.
Fig. 6.
For the simplified model workflow we can derive the correct (first order) expression for the uncertainties in a straightforward manner. This also provides the covariances or correlations between different d-spacing values. We adopt the following notation. Let
In Fig. 6 we compare the relative errors computed using the analytical approach and bootstrap (see Appendix A, in particular Fig. 9 which shows the same result for
Fig. 7.
3.3. Vanadium normalization
For scattering off a vanadium sample, we define
3.4. The effect of smoothing
Smoothing is routinely used to reduce the effect of noise in normalization terms such as a neutron monitor spectrum or a reduced vanadium spectrum. This reduces the size of the variances of the normalization term,¹⁹ and thus the relative importance of these uncertainties when combined with the variances of the detector counts. However, the smoothing procedure introduces additional correlations between bins of the normalization term which counterbalance the effect. Smoothing can be described as a convolution with a kernel function
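To make the counterbalancing explicit, the sketch below writes a smoothing as a matrix K, so that the covariance of the smoothed term is K diag(var) Kᵀ (an arbitrary box kernel is used here; the argument holds for any kernel):

```python
import numpy as np

n, width = 50, 5
var = np.full(n, 100.0)                      # initially independent bins

K = np.zeros((n, n))
for i in range(n):
    lo, hi = max(0, i - width // 2), min(n, i + width // 2 + 1)
    K[i, lo:hi] = 1.0 / (hi - lo)            # normalized box-kernel row

cov_smoothed = K @ np.diag(var) @ K.T
print(var[25], cov_smoothed[25, 25])         # variance shrinks: 100.0 -> 20.0
print(cov_smoothed[25, 26])                  # ...but neighbouring bins become correlated
```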
4. Discussion
4.1. Current software: Scope and impact of incorrect uncertainty handling
We have described and demonstrated a systematic problem with the treatment of the statistical uncertainties of normalization terms in Mantid and Scipp. This constitutes the most important result of this work.
Affected software includes Mantid (all versions at the time of writing, at least up to Mantid v6.5) and Scipp (all versions up to Scipp v22.11, inclusive), but the issue may also be present in other similar data-reduction software. Mantid is used at the ISIS Neutron and Muon Source at the STFC Rutherford Appleton Laboratory (http://www.isis.stfc.ac.uk/), the Spallation Neutron Source (SNS) and the High Flux Isotope Reactor (HFIR) at the Oak Ridge National Laboratory (http://neutrons.ornl.gov/), the Institut Laue-Langevin (ILL, https://www.ill.eu/), the China Spallation Neutron Source (CSNS, http://english.ihep.cas.cn/csns/), the European Spallation Source ERIC (ESS, http://europeanspallationsource.se/), the Forschungs-Neutronenquelle Heinz Maier-Leibnitz (FRM II, https://www.frm2.tum.de/en/frm2/home/), and SINQ at the Paul Scherrer Institute (https://www.psi.ch/en/sinq). Scipp is used at the ESS. All experimental techniques, including powder-diffraction, small-angle scattering, reflectometry, single-crystal diffraction, spectroscopy, and imaging, can be affected. Many data-reduction workflows are affected multiple times. For example, in powder diffraction we may perform normalization of sample data to monitor (problem from broadcast over pixels and event-mode), normalization of vanadium data to monitor (problem from broadcast over pixels and event-mode), normalization of reduced data to vanadium (problem from event-mode). Even this relatively simple example includes five distinct instances of the problem.
To our knowledge little or no impact of the issue described in this manuscript has been reported in practice. There are at least two likely explanations. Firstly, variations of Eq. (13) may hold in many cases. That is, in practice the normalization terms often have very good statistics – as we have seen for the concrete SANS and powder diffraction examples in Section 3.1 and Section 3.2, respectively. This can be both due to practicality (it is relatively simple to measure many vanadium counts, or have a monitor with a high count rate) and due to experience (instrument scientists know how long to measure normalization terms for good results). Secondly, coordinate transformations, as well as preprocessing steps of the normalization terms such as smoothing, lead to correlations that spread over a wider range. These correlations might affect mostly non-local aspects, which could be harder to discern when, e.g., fitting a physical model to the reduced result.²⁰
In our opinion, however, the above cannot justify dismissing the issue. The current state-of-the-art software gives the scientist a false sense of security (of having handled uncertainties correctly). While we managed to compute corrected analytical results for variances and covariances in the model workflows used in this manuscript, this cannot directly replace the operation-by-operation error-propagation mechanism in the CS implementations. At the very least our software must thus avoid giving a false sense of security and should therefore reject normalization terms that have uncertainties. This will force the scientist to consider whether the normalization uncertainties fulfill Eq. (13). Based on this they can then either (A) confidently drop this term or (B) increase the statistics by measuring the normalization terms better.
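A schematic sketch of such a guard is shown below (not Scipp's actual interface); the point is that dropping the variances of a normalization term becomes an explicit, user-visible decision rather than a silent side effect:

```python
def normalize(data, data_var, norm, norm_var=None, drop_norm_variances=False):
    """Divide data by a normalization term, refusing to silently ignore its variances."""
    if norm_var is not None and not drop_norm_variances:
        raise ValueError(
            "Normalization term has variances; broadcasting would silently suppress "
            "them. Verify that they are negligible (Eq. (13)) and pass "
            "drop_norm_variances=True, or measure the normalization term better."
        )
    return data / norm, data_var / norm**2

value, var = normalize(1000.0, 1000.0, 50.0)      # normalization treated as exact
# normalize(1000.0, 1000.0, 50.0, norm_var=50.0)  # raises: the user must decide
```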
A complete analysis for any particular technique such as SANS or powder-diffraction is beyond the scope of this manuscript. The workflows discussed in Section 3 are therefore not necessarily capturing all potential sources of uncertainties, but are meant to illustrate the key aspects. To give an idea of what a more complete analysis might involve, consider the SANS case. Our analysis in Section 3.1 covered the effects of the variances of the monitors. There are however a number of other contributions: (1) Before computation of wavelengths and Q, the beam center is determined. The uncertainty on the beam center leads to a correlation of all computed wavelength and Q values. (2) For the computation of the transmission fraction, each monitor is background-subtracted: The monitor background is computed as the mean over a user-defined wavelength range and then subtracted from the monitor. This broadcasts the mean value to all monitor bins, leading to correlations. (3) The direct beam function D itself comes with uncertainties which were computed using CS and may thus be underestimated in themselves.
Intriguingly, the SANS data reduction implemented in Scipp prior to this work (and used for this analysis) actually ignored the uncertainties of D. This provides a worrisome glimpse of the potentially cascading negative effects of the CS implementation: An interpolation is required to map D to the wavelength-binning of the monitors, and it was unclear how uncertainties should be handled during this interpolation. During development of the workflow it was determined empirically that setting D’s uncertainties to zero had no influence on the uncertainties of the final result. Looking back, we can now tell that this empirical evidence was based on the incorrect handling of uncertainties – since the broadcast of the pixel-independent D to all pixels suppresses its contribution to the final uncertainties. Thus the decision to ignore them needs to be revisited. Preliminary results indicate that correct handling of the uncertainties of D may noticeably impact the final uncertainties.
4.2. Improved analytical approach
Aside from highlighting the systematic problem, we have succeeded in Section 3 in deriving expressions for variance-covariance matrices of the data-reduction results. As we will show in the discussion in Section 4.3, there is likely no major difference in terms of computational cost compared to the flawed approach currently implemented in Mantid and Scipp, even though it provides access to not just the variances but also the covariances of the final result. We conjecture that our approach may be feasible also for complete and realistic data-reduction workflows. We plan to investigate this in more detail in future work, in particular to extend this to other neutron-scattering techniques and to include corrections such as absorption corrections and background subtractions.
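A generic back-of-the-envelope sketch of why the cost stays comparable: if the reduced result is, to first order, a linear map r = A c of the raw counts c, its variance-covariance matrix is A diag(c) Aᵀ, because the raw Poisson counts are independent with variances equal to the counts. That amounts to roughly one extra matrix product (the matrix A below is hypothetical, not one of those derived in Section 3):

```python
import numpy as np

rng = np.random.default_rng(0)
nbins_out, nbins_in = 200, 5000
A = rng.random((nbins_out, nbins_in)) / nbins_in   # hypothetical linear reduction
c = rng.poisson(100.0, size=nbins_in).astype(float)

r = A @ c                     # reduced result
cov_r = (A * c) @ A.T         # A @ diag(c) @ A.T, without materializing diag(c)
```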
A fundamental difference – and possibly the main challenge – is that unlike in the CS implementation, handling and computation of variances is difficult to automate. While Python packages such as JAX [4] and Uncertainties [16] implement automatic differentiation approaches, it is unlikely that this is directly applicable to the components of a neutron-scattering data-reduction workflow.²¹ That is, we would need to either rely on fixed workflows with previously derived expressions for the variance-covariance matrix, or the user must derive these on their own. If a user does derive such an expression, programming skills are required to implement it. In either case, workflows that handle error propagation may become less interactive.
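For comparison, a small example with the Uncertainties package [16] (assuming it is installed) shows both its appeal and its limitation for our use case:

```python
import numpy as np
from uncertainties import unumpy

# Correlation-aware propagation: dividing an uncertain array by itself
# correctly yields zero uncertainty.
x = unumpy.uarray([1000.0, 2000.0], np.sqrt([1000.0, 2000.0]))
print(unumpy.std_devs(x / x))    # array of zeros
# However, every element is a separate Python object carrying its own
# correlation bookkeeping, which does not scale to event-mode data with
# hundreds of millions of entries.
```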
On the positive side, our new approach can produce data-reduction results including covariances. This may be beneficial for subsequent analysis and model fitting procedures. Furthermore, Scipp has proven flexible enough to allow for computation of the expressions for the variance-covariance matrix with very concise code and adequate performance. This further increases our confidence that the described approach may be a viable solution for handling data from day-to-day instrument operations.
4.3. Computational cost of the improved analytical approach
In Sections 3.1 and 3.2 we described an improved analytical approach for computing uncertainties and correlations for the result of the data-reduction. The computation of the variance-covariance matrices turns out to be not significantly more expensive than the computation of the cross sections. Despite yielding the full correlation matrix the overall cost is thus comparable to that of the CS implementation. This holds true even for event-mode data-reduction. To understand this, one important aspect to realize is that – in contrast to CS – it is no longer necessary to carry the uncertainties of the detector counts in intermediate steps, since for the raw input data the variances are equal to the measured counts. A full analysis of the performance characteristics is beyond the scope of this paper. We briefly touch on the main aspects and proceed to report concrete timings for an example below.
Firstly, considering the normalization operation we observe the following. In addition to the normalized counts,
We illustrate this with the example of the Wish powder-diffraction data-reduction. The input data has a file size of
Table 1
     | CS [s] | ours [s]
1000 |  1.8   |  2.3
2000 |  3.3   |  4.5
4000 |  6.3   | 10.9
5. Conclusion
In this paper we have shown a systematic problem with the treatment of the statistical uncertainties of normalization terms in Mantid and Scipp. We have shared our findings with the Mantid development team. In the Scipp project, with the release of Scipp v23.01 we have disabled the propagation of uncertainties if one of the operands is broadcast in an operation. Support for event-mode normalization operations with normalization terms that have variances is also disabled. In both cases an exception is raised instead, i.e., the operation fails, forcing the user to address the problem correctly. We had initially considered simply displaying a warning to the user when an operation introduces correlations. However, experience shows that warnings are almost always ignored, so we no longer consider such an approach a solution.
Aside from highlighting the systematic problem, we have succeeded in deriving expressions for variance-covariance matrices of the data-reduction results. Computationally the new approach would be feasible, but whether it is applicable and useful in practice remains to be investigated.
Notes
1 Disclaimer: The authors of the present paper are the principal authors of Scipp and have previously contributed to Mantid for several years.
2 The normalization term could be, e.g., a neutron monitor spectrum or a reduced vanadium spectrum.
3 Whether this assumption is always correct is not the topic of this manuscript.
4 Here we chose to compute the same result in two steps using correlations of summands since it aids in highlighting the difference to the CS approach. A direct and likely simpler way to compute this is using
5 Multiplication
6 In Mantid such coordinate transformations are referred to as unit conversion.
7 The direct-beam function makes it possible to cross-normalize the incident spectrum to that of the empty beam (without sample) seen on the main detector.
8 Here we assumed quasi-elastic scattering. A correction factor for inelastic-scattering [14] would depend on i, j, and λ.
9 The data has been published in [17], where all the details on the experimental setup can be found.
10 The monitors from the direct run,
11 For related methods such as Markov-chain Monte-Carlo uncertainty estimates see, e.g., Ref. [2] and [18].
12 Including background subtraction in our bootstrap calculations, i.e., computing
13 We use bins of constant width to simplify the notation, but the following results are equally valid for other choices of bin widths.
14 Note that in software histogramming is not actually implemented via a matrix multiplication as this would be highly inefficient.
15 For example, scipy.optimize.curve_fit accepts the covariance matrix as input via the sigma parameter. A full discussion of other approaches such as Bayesian statistics is beyond the scope of this paper.
16 The sample is NaCaAlF measured in Wish run number 52889 and the vanadium was measured in run number 52920.
17 We use bins of constant width for simplifying the notation, but the following results are also valid for other choices of bin widths.
18 For histogrammed data the expression may need to be adapted depending on the rebinning approach that is being used.
19 Note that, e.g., Mantid’s FFTSmooth (version 2) algorithm sets uncertainties directly to 0. This behavior appears to be undocumented.
20 We do not have enough background knowledge about concrete subsequent data analysis methods to tell when and where such non-local properties might be irrelevant.
21 Even if automatic differentiation could handle operations such as histogramming, the Uncertainties Python package stores arrays with uncertainties as arrays of values with uncertainties. The very high memory and compute overhead makes this unsuitable for our needs.
22 Timings measured for NumPy [10] backed by a multi-threaded linear-algebra library.
23 This assumption also underlies the regular propagation of uncertainties.
Acknowledgements
We kindly thank the authors of Ref. [17] for allowing us to use their data for the SANS example. We thank Pascal Manuel for sharing Wish data, used in our powder-diffraction example. We thank Evan Berkowitz and Thomas Luu for valuable discussions about bootstrap and correlations. We thank our coworkers Celine Durniak, Torben Nielsen, and Wojciech Potrzebowski for helpful discussions and comments that helped to improve the clarity of the manuscript.
Appendices
Appendix A. Bootstrap
This section gives a brief overview of bootstrapping as used in this work. For a more complete explanation we refer to Ref. [9]. Given a measured distribution of random variables
In this work,
Fig. 8.
We assume all
As an example, Figs 8 and 9 show the relative standard deviation of
Fig. 9.
Appendix B. Figures for correlation matrices
For completeness, we show in this section the figures of the correlation matrices from the SANS and powder diffraction workflows using a logarithmic color scale.
Fig. 10.
Fig. 11.
References
[1] O. Arnold, J.C. Bilheux, J.M. Borreguero, A. Buts, S.I. Campbell, L. Chapon, M. Doucet, N. Draper, R.F. Leal, M.A. Gigg, V.E. Lynch, A. Markvardsen, D.J. Mikkelson, R.L. Mikkelson, R. Miller, K. Palmen, P. Parker, G. Passos, T.G. Perring, P.F. Peterson, S. Ren, M.A. Reuter, A.T. Savici, J.W. Taylor, R.J. Taylor, R. Tolchenov, W. Zhou and J. Zikovsky, Mantid – data analysis and visualization package for neutron scattering and μSR experiments, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 764 (2014), 156–166. doi:10.1016/j.nima.2014.07.029.
[2] B.A. Berg, Introduction to Markov Chain Monte Carlo Simulations and Their Statistical Analysis, 2004. doi:10.48550/ARXIV.COND-MAT/0410490.
[3] P.R. Bevington and D.K. Robinson, Data Reduction and Error Analysis for the Physical Sciences, McGraw-Hill, Boston, 2003.
[4] J. Bradbury, R. Frostig, P. Hawkins, M.J. Johnson, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. VanderPlas, S. Wanderman-Milne and Q. Zhang, JAX: Composable transformations of Python+NumPy programs, 2018, http://github.com/google/jax.
[5] Broadcasting in NumPy, https://numpy.org/doc/stable/user/basics.broadcasting.html.
[6] L. Chapon, P. Manuel, P. Radaelli, C. Benson, L. Perrott, S. Ansell, N. Rhodes, D. Raspino, D.M. Duxbury, E. Spill and J. Norris, Wish: The new powder and single crystal magnetic diffractometer on the second target station, Neutron News 22 (2011), 22–25. doi:10.1080/10448632.2011.569650.
[7] Documentation of error propagation in Mantid, 2022, https://docs.mantidproject.org/nightly/concepts/ErrorPropagation.html.
[8] U. Drepper, What Every Programmer Should Know About Memory, 2007, https://people.freebsd.org/~lstewart/articles/cpumemory.pdf.
[9] B. Efron and R.J. Tibshirani, An Introduction to the Bootstrap, Chapman and Hall/CRC, 1994. doi:10.1201/9780429246593.
[10] C.R. Harris, K.J. Millman, S.J. van der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N.J. Smith, R. Kern, M. Picus, S. Hoyer, M.H. van Kerkwijk, M. Brett, A. Haldane, J.F. del Río, M. Wiebe, P. Peterson, P. Gérard-Marchant, K. Sheppard, T. Reddy, W. Weckesser, H. Abbasi, C. Gohlke and T.E. Oliphant, Array programming with NumPy, Nature 585(7825) (2020), 357–362. doi:10.1038/s41586-020-2649-2.
[11] R.K. Heenan, J. Penfold and S.M. King, SANS at pulsed neutron sources: Present and future prospects, Journal of Applied Crystallography 30(6) (1997), 1140–1147. doi:10.1107/S0021889897002173.
[12] S. Heybrock, O. Arnold, I. Gudich, D. Nixon and N. Vaytet, Scipp: Scientific data handling with labeled multi-dimensional arrays for C++ and Python.
[13] S. Heybrock, N. Vaytet and J.-L. Wynen, Systematic underestimation of uncertainties by widespread neutron-scattering data-reduction software – Supplementary Material, Zenodo, 2023. doi:10.5281/zenodo.7708115.
[14] R. Hjelm, The resolution of TOF low-Q diffractometers: Instrumental, data acquisition and reduction factors, Journal of Applied Crystallography 21 (1988), 618–628. doi:10.1107/S0021889888005229.
[15] ISIS Powder Diffraction Scripts Workflow Diagrams, https://docs.mantidproject.org/v6.5.0/techniques/ISISPowder-Workflows-v1.html.
[16] O.E. Lebigot, Uncertainties: A Python package for calculations with uncertainties, 2023, http://pythonhosted.org/uncertainties/.
[17] I. Manasi, M.R. Andalibi, R.S. Atri, J. Hooton, S.E.M. King and K.J. Edler, Self-assembly of ionic and non-ionic surfactants in type IV cerium nitrate and urea based deep eutectic solvent, The Journal of Chemical Physics 155(8) (2021), 084902. doi:10.1063/5.0059238.
[18] C.E. Papadopoulos and H. Yeung, Uncertainty estimation and Monte Carlo simulation method, Flow Measurement and Instrumentation 12(4) (2001), 291–298. doi:10.1016/S0955-5986(01)00015-2.
[19] P.F. Peterson, S.I. Campbell, M.A. Reuter, R.J. Taylor and J. Zikovsky, Event-based processing of neutron scattering data, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 803 (2015), 24–28. doi:10.1016/j.nima.2015.09.016.
[20] P.F. Peterson, D. Olds, A.T. Savici and W. Zhou, Advances in utilizing event based data structures for neutron scattering experiments, Review of Scientific Instruments 89(9) (2018), 093001. doi:10.1063/1.5034782.
[21] Sans2d: I(Q) workflow for a single run (sample), https://scipp.github.io/ess/instruments/loki/sans2d_to_I_of_Q.html.
[22] Sans2d: Time-of-flight Small-Angle Neutron Scattering instrument, https://www.isis.stfc.ac.uk/Pages/Sans2d.aspx.
[23] Scipp Documentation, 2022, https://scipp.github.io/.
[24] SNSPowderReduction algorithm documentation, https://docs.mantidproject.org/v6.5.0/algorithms/SNSPowderReduction-v1.html.