
A systematic global assessment of the completeness and quality of household death reporting in censuses and surveys since 2000

Abstract

Many censuses and surveys in low- and middle-income countries ask questions about deaths in the household to fill the evidence gap about mortality. This study undertakes the first published systematic assessment of the completeness and quality of these data. For 82 censuses from 56 countries and 26 surveys from 21 countries since 2000 we calculated completeness of household death reporting using deaths estimated by the United Nations World Population Prospects (UN WPP) and Global Burden of Disease (GBD) as the denominator. The median completeness of reported household deaths in censuses was 89% (inter-quartile range (IQR) 66–102%) and surveys 96% (IQR 80–124%). Completeness was similar for males and females and substantially lower where date of death was asked (census median 73%, IQR 53–91%) than not asked (census median 93%; IQR 74–110%); these differences remained after controlling for other covariates in a linear regression. The ratio of reported household to estimated deaths was higher in younger ages but age-invariant where date of death was asked. In conclusion, household death data in censuses and surveys have major completeness and quality issues. Where date of death was not asked, there appears to be considerable reporting of deaths that occurred outside of the reference period.

1. Introduction

Reliable routine all-cause mortality data disaggregated by age and sex are a fundamental cornerstone of evidence to inform population health monitoring and policy. These data are used to calculate several important population health indicators, such as adult mortality probabilities, life expectancy and years of life lost, which help to understand mortality and cause of death patterns in a population, to track progress towards national and international goals, and to provide evidence of the mortality impact of pandemics and natural disasters. The optimal source of such data is a high-quality civil registration and vital statistics (CRVS) system that registers all (or almost all) deaths in a population and compiles these data to produce timely mortality statistics [1]. However, the COVID-19 pandemic demonstrated that many countries’ governments do not have CRVS data of sufficient quality and timeliness to measure excess mortality. In March 2022, no routine mortality data were available for 2020 or 2021 in 84 of 194 countries, while only 73 countries had full national data for that period [2].

While a longer-term goal to improve mortality data should be to strengthen CRVS systems, which is the objective of multiple international projects, attaining complete death registration could take years or even decades in many countries [3]. In the interim, many countries have sought to measure all-cause mortality by using population censuses or household surveys that ask respondents to report the deaths that occurred in their household, together with the sex and age at death of the deceased. Censuses commonly collect mortality data not only on household deaths but also on parental survival (orphanhood), summary birth histories (i.e. number of children ever born and children still living), and the timing of deaths of women of reproductive age relative to pregnancy or birth, to ascertain pregnancy-related deaths. Surveys also collect more detailed birth history data and sibling survival data, and ask further questions about the timing of death relative to pregnancy or birth.

Questions about deaths in the household have been included in progressively more censuses over recent decades, from eight countries in Africa in the 1970 census round to 76 countries globally in the 2010 round [4, 5]. Beginning with the United Nations’ (UN) Principles and Recommendations for Population and Housing Censuses Revision 2, published in 2008, household deaths in the previous 12 months (or another period before the census) have been a core topic in this guidance document [4, 6, 7]. The UN recommends a question on the number of deaths in the household in the past 12 months (or another time period related to a festive or historic date in the country), with additional questions on the deceased’s name, sex, age at death and date of death (day, month, year) [7]. This information is provided by the head of the household or household reference person, who is commonly an older male. A recent review found that 76 of the 195 countries that conducted censuses in the 2010 round had included the household deaths questions, mostly in sub-Saharan Africa and Latin America and the Caribbean [5]. That review found that all of the 76 censuses asked the sex and age of the deceased (with some also asking for the date of birth of the deceased), and 65 used a 12-month reference period, but only 26 included the date of death question. One obstacle to including these recommended questions in censuses is that many countries are reluctant to change questionnaires or want to keep the number of questions to a minimum. Eighteen of 55 countries that have completed 2020 round censuses have included the household deaths questions [5].

Despite household deaths questions being used in many censuses and surveys throughout the world, little is known about the quality of the data they collect, in terms of completeness at all ages and by specific age group and sex. There are concerns about under-reporting of deaths for various reasons, including sensitivities that make respondents reluctant to report deaths, poorer recall of deaths the longer the time since occurrence, misreporting of age at death, non-representation of deaths of people in single-person households and in institutional settings, and the dissolution of some households after a death due to disputes over inheritance or a sudden reduction in income [5, 8, 9, 10]. Consistency and clarity in the definition of a household are also important; otherwise there can be confusion about whether migrants or extended family should be included, which can result in over- or under-reporting of deaths [10]. Household definitions are inconsistent between countries because the UN’s definition of a household is challenging to implement where it does not accommodate the different living arrangements found in much of the world [11]. For example, the definition of a household differs between Uganda (“live together (house or compound) and eat together”), Tanzania (“live together and share living expenses”) and Senegal (“live together under the same roof, pool resources, eat together, and under one household head”) [11]. Accurate inclusion of deaths within the reference period is also important; otherwise deaths that occurred outside the reference period may be “telescoped” into it if the date of death is not asked or is reported incorrectly [5, 8].

For surveys, the accuracy of mortality statistics is affected by sampling uncertainty, especially at ages with relatively low mortality risk and for subnational measurement; this can be reduced by lengthening the reference period, although the accuracy of death reporting may decline over longer recall periods. Clustering of deaths can also increase sampling uncertainty, especially in emergencies [12]. As with any census or survey, adequate training of enumerators is important to improve the accuracy of the data collected.
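To make the trade-off between reference period length and sampling uncertainty concrete, the Python sketch below approximates the relative standard error of a survey death count. It assumes deaths follow a Poisson distribution and that clustering can be summarised by a single design effect; `relative_standard_error` and the survey figures are hypothetical illustrations, not taken from the cited studies.

```python
import math


def relative_standard_error(death_rate: float, persons: int, years: float,
                            design_effect: float = 1.0) -> float:
    """Approximate relative SE of a survey death count.

    Assumption: the count is Poisson (variance = mean), inflated by a design
    effect that stands in for clustering of deaths. Illustrative only.
    """
    expected_deaths = death_rate * persons * years
    return math.sqrt(design_effect / expected_deaths)


# Hypothetical survey: crude death rate 10 per 1,000, 20,000 people enumerated.
one_year = relative_standard_error(0.01, 20_000, 1)
two_years = relative_standard_error(0.01, 20_000, 2)
clustered = relative_standard_error(0.01, 20_000, 1, design_effect=2.0)
```

Doubling the recall period halves the variance of the relative error (about 0.071 to 0.050 here), which is why longer periods help, at the cost of poorer recall; a design effect above 1 works in the opposite direction.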

Studies that have assessed the completeness of household deaths data in censuses and surveys have reported varied findings. Some studies have compared census deaths with data from Health and Demographic Surveillance System (HDSS) sites, which collect high-quality deaths data. Analysis of 2006 Burkina Faso census household deaths data linked to deaths in the Nouna HDSS showed that census deaths were 21% lower for males and 32% lower for females than in the HDSS, resulting in census life expectancies that were four years higher for males and eight years higher for females; census deaths were 40–50% lower at ages 75 years and above and over 50% lower for female infants [8]. However, a comparison of 2002 and 2013 Senegal census data with HDSS data found that the resulting life expectancies at birth were broadly similar between the two sources [13]. Similarly, crude death rates in the 2010 Ghana census were almost identical to those in an HDSS, although census age-specific death rates above age 65 years were significantly lower than in the HDSS [10]. Analysis of the completeness of female deaths at ages 15–49 years for two censuses in nine countries in sub-Saharan Africa, South Asia and Southeast Asia, calculated as an average of three death distribution methods, showed low completeness, ranging from 15% in Cambodia to 81% in Zambia, with average completeness just over 50% [4]. An assessment of the 2007 Survey of Population Change in Vietnam using the Preston-Coale death distribution method estimated completeness of household death reporting to be 69% for males and 54% for females, while the 2015–16 CRVS Survey in Nepal found completeness of household death reporting of 75% in 2015 but just 54% for the earlier period of 2014 [14, 15].

Even where household deaths data reported in censuses and surveys are incomplete, they have been widely used to estimate all-cause mortality by age and sex. Large global mortality estimation studies have used these data, adjusted for completeness measured by death distribution methods, as one source to estimate adult mortality (15–59 years), which is then input into model life tables to estimate complete life tables [16, 17]. Individual countries have also used a similar approach to estimate life tables based on a census [18]. Death distribution methods measure completeness of death reporting from the internal consistency of the age pattern of the population and the age pattern of reported deaths, combined with assumptions about population dynamics, such as the population being closed to migration [19]. The most widely used of these methods, the Generalised Growth Balance (GGB) method, the Bennett-Horiuchi or Synthetic Extinct Generations (SEG) method and a hybrid of the two (GGB-SEG), measure completeness of death reporting in an intercensal period (i.e. between two censuses) using population data from two censuses [20, 21].

However, a concern with death distribution methods is that they are inaccurate at measuring completeness because they are based on assumptions of population dynamics (e.g. closed to migration) which may not be applicable to contemporary populations [19, 22]. Although age trims that restrict the age range used to calculate completeness are commonly used to reduce the impact of breaches of the assumption of no migration on the accuracy of completeness estimates, a simulation study found that even with the use of age trims the 95% uncertainty intervals of completeness from these methods are approximately one-quarter of the estimate [22]. Any inaccuracy in the estimate of completeness would adversely affect the reliability of the adjusted mortality statistics. Hence, household deaths data would be of most use for producing mortality statistics if they are complete, or close to complete. Furthermore, for reported household deaths in a census, the intercensal death distribution methods can only be applied to countries with available data for reported household deaths from two censuses.
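A small arithmetic sketch shows why imprecise completeness matters. Adjusted deaths are obtained by dividing reported deaths by estimated completeness, so an uncertainty interval on completeness maps to a wider, asymmetric interval on adjusted deaths. The sketch assumes, as one reading of the simulation result above, that the 95% interval spans about one-quarter of the completeness estimate (±12.5%); the function name and numbers are hypothetical.

```python
def adjusted_deaths_interval(reported: float, completeness: float,
                             interval_width_share: float = 0.25):
    """Return (low, point, high) adjusted deaths, where adjusted = reported / completeness.

    Assumption: the completeness interval is centred on the estimate and spans
    `interval_width_share` of it (0.25 means +/-12.5%).
    """
    half_width = completeness * interval_width_share / 2
    low_c, high_c = completeness - half_width, completeness + half_width
    # Higher completeness implies fewer missed deaths, hence fewer adjusted deaths.
    return reported / high_c, reported / completeness, reported / low_c


low, point, high = adjusted_deaths_interval(800, 0.8)
```

With 800 reported deaths and completeness of 80%, the point estimate is 1,000 adjusted deaths, but the interval runs from about 889 to about 1,143, and it is wider above the point estimate than below it.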

There has been renewed interest in and advocacy for the use of household deaths data to fill the evidence gap for mortality from the COVID-19 pandemic [23]. In India, a household deaths question was included in a phone survey which was used with other sources to estimate over three million excess deaths from June 2020–July 2021, the highest in the world and 6–7 times higher than official figures [24]. Furthermore, given that many countries collect household deaths data in censuses or surveys, and will continue to do so in future, it is important to conduct a systematic assessment of the completeness and quality of these data to assess their utility as sources of mortality statistics and to potentially inform efforts to improve question design and implementation. No previous studies, to our knowledge, have conducted a comprehensive assessment of these data, which is surprising given household deaths questions are a topic recommended for inclusion in censuses by the UN.

This study hence undertakes a systematic assessment of the completeness and quality of household death reporting in censuses and surveys since 2000 using available data. Completeness is assessed against both United Nations World Population Prospects (UN WPP) and GBD estimates of total deaths by sex. The analysis also compares completeness based on whether the date of death was asked, to assess the impact of reporting of deaths outside the stated reference period. The quality of household death reporting is assessed using the age-specific ratio of reported to estimated total deaths for that age group. The findings of the study will fill a large knowledge gap about the completeness of household death reporting in censuses and surveys.

2. Methods

This study analyses reporting of deaths by households in population censuses and surveys from 2000 to 2021. Reporting of deaths by households refers to questions asking the respondent to report the number of deaths in their household within a defined period of time (mostly the 12 months before the census or survey), as well as the sex, age at death and, in some cases, date of death of the deceased. Data were obtained by searching for censuses and surveys known to have included household deaths questions in the questionnaire. Individual country census and survey reports, the IPUMS International database, the United Nations Statistics Division (UNSD) Demographic Statistics database, the GBD deaths database and Demographic and Health Surveys (DHS) data were searched to find reported household deaths data; country reports were preferred over other sources where both were available [25, 26, 27, 28]. We extracted data on the number of reported household deaths by sex and five-year age group where available. Where the reference period for deaths was longer than 12 months, we extracted deaths for the most recent 12-month period where possible.

Data for a total of 82 censuses from 56 countries and 26 surveys from 21 countries were compiled (Supplementary information, Table S1). Over half of the censuses (44 of 82) and of the surveys (15 of 26) were conducted in countries in the sub-Saharan African super-region (as classified by the GBD), followed by Southeast/ East/ Central Asia, Oceania (19 censuses, 3 surveys) and Latin America and Caribbean (11 censuses, 3 surveys). Data were available by sex for 68 censuses and 23 surveys, and by age for 66 censuses and 22 surveys. Date of death was asked in only 27 censuses, was not asked in 45 censuses, and was unclear for 10 censuses because the questionnaire was not available. Date of death was asked in 9 surveys, was not asked in 14 surveys, and was unclear for 3 surveys because the questionnaire was not available. Data could be extracted for the most recent 12 months or less for 78 of the 82 censuses and 20 of the 26 surveys. The full list of censuses and surveys, including sources, is shown in Supplementary information, Table S2.

We calculated completeness of household death reporting as the number of reported household deaths divided by the number of deaths for the same reference period in that country as estimated by the UN World Population Prospects (WPP) and the GBD (based at the Institute for Health Metrics and Evaluation, University of Washington) [17, 29]. Both the UN WPP and the GBD estimate total deaths for all countries and territories using a standardised methodology. Each first estimates under-five mortality (5q0) and adult mortality (45q15, the probability of dying between ages 15 and 60 years). Under-five mortality is estimated from registration data and from summary and complete birth histories in censuses and surveys. In the GBD, annual under-five mortality estimates are generated using spatio-temporal Gaussian process regression that corrects for source-specific bias [29]. For adult mortality, reported household deaths data from censuses and surveys, adjusted for estimated incompleteness using death distribution methods, are used together with (where available) death registration data adjusted for incompleteness using death distribution methods, sibling survival data from censuses and surveys, and health and demographic surveillance system (HDSS) site data. In the GBD, spatio-temporal Gaussian process regression is then used to estimate annual adult mortality using socio-economic and regional covariates [29]. Both the UN WPP and the GBD input the final under-five and adult mortality estimates into a model life table to calculate age-specific mortality rates.
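For readers less familiar with the notation, 45q15 can be computed from age-specific death rates using the standard life-table conversion from rates to probabilities. This sketch only illustrates what 45q15 measures; it is not how the UN WPP or GBD estimate adult mortality (they use the modelling described above). It assumes five-year age groups and that deaths occur on average halfway through each interval (nax = n/2); the function names and rates are hypothetical.

```python
def nqx_from_mx(mx, n=5.0, ax=None):
    """Convert an age-specific death rate nMx into a probability of dying nqx.

    Standard conversion nqx = n*mx / (1 + (n - ax)*mx); ax defaults to n/2,
    i.e. deaths assumed to occur on average mid-interval.
    """
    if ax is None:
        ax = n / 2
    return n * mx / (1 + (n - ax) * mx)


def q45_15(mx_by_age):
    """Probability of dying between exact ages 15 and 60 (45q15).

    Expects nine five-year death rates for ages 15-19, 20-24, ..., 55-59.
    """
    if len(mx_by_age) != 9:
        raise ValueError("expected nine five-year age groups (15-19 to 55-59)")
    survival = 1.0
    for mx in mx_by_age:
        survival *= 1 - nqx_from_mx(mx)  # survive each five-year interval in turn
    return 1 - survival


# Hypothetical death rates rising with age, for illustration only.
rates = [0.002, 0.0025, 0.003, 0.004, 0.005, 0.007, 0.009, 0.012, 0.016]
adult_q = q45_15(rates)
```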

Completeness was also calculated for each sex. Our focus was on completeness calculated with UN estimated deaths as the denominator, with the GBD used mainly for comparison. For surveys, we calculated the household death rate and compared it with the UN or GBD estimated death rate to obtain completeness. We also assessed data quality using the ratio of reported household deaths to UN or GBD estimated total deaths by five-year age group (0–4 years to 80+ years); this is labelled a ratio rather than completeness because there is considerable uncertainty in the UN and GBD estimates of age-specific deaths.
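A minimal sketch of the three measures just described, assuming reported and estimated deaths are already aligned to the same 12-month reference period. For surveys it assumes the household death rate is reported deaths divided by the enumerated household population; the study does not spell out that denominator, so treat it as an assumption. All function names and figures are hypothetical.

```python
AGE_GROUPS = [f"{a}-{a + 4}" for a in range(0, 80, 5)] + ["80+"]  # 0-4, ..., 75-79, 80+


def completeness(reported_deaths: float, estimated_deaths: float) -> float:
    """Reported household deaths as a share of UN/GBD estimated deaths (same period)."""
    return reported_deaths / estimated_deaths


def survey_completeness(reported_deaths: float, household_population: float,
                        estimated_death_rate: float) -> float:
    """Survey household death rate divided by the estimated crude death rate.

    Assumption: the household death rate uses the enumerated household
    population as its denominator.
    """
    household_death_rate = reported_deaths / household_population
    return household_death_rate / estimated_death_rate


def age_specific_ratios(reported_by_age: dict, estimated_by_age: dict) -> dict:
    """Ratio of reported to estimated deaths for each five-year age group."""
    return {age: reported_by_age[age] / estimated_by_age[age] for age in AGE_GROUPS}


# Hypothetical survey: 90 deaths among 10,000 household members; estimated CDR 10 per 1,000.
survey_c = survey_completeness(90, 10_000, 0.010)
```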

Estimated deaths were calculated for the reference period by weighting annual death estimates by the proportion of each year covered by the period. The number of household deaths reported in censuses may be too low because of an undercount of the population in the census. To address this, we also calculated completeness adjusted for the size of the population counted in the census relative to the population interpolated to the census date from population estimates (UN or GBD estimated population, matching the source of estimated deaths) [16, 17, 28]. Completeness of reported household deaths may be biased if the same data were used as an input to estimate adult mortality in the UN or GBD estimation of deaths. For UN death estimation, 18 of 82 censuses and eight of 24 surveys were used as an input to the total deaths estimation, while for GBD death estimation 23 of 82 censuses and eight of 24 surveys were used. We therefore analysed completeness separately according to whether the data source was included in the total deaths estimation.
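The reference-period weighting and the census undercount adjustment can be sketched as follows. The day-count weighting, the linear interpolation of population between two estimate dates, and all function names and figures are assumptions for illustration; the study does not state which interpolation it used.

```python
from datetime import date


def year_fraction_covered(start: date, end: date, year: int) -> float:
    """Fraction of calendar `year` that falls inside the period [start, end)."""
    year_start, next_year = date(year, 1, 1), date(year + 1, 1, 1)
    overlap_days = (min(end, next_year) - max(start, year_start)).days
    return max(overlap_days, 0) / (next_year - year_start).days


def estimated_deaths_in_period(annual_deaths: dict, start: date, end: date) -> float:
    """Weight annual death estimates by the share of each calendar year in the period."""
    return sum(deaths * year_fraction_covered(start, end, year)
               for year, deaths in annual_deaths.items())


def interpolate_population(pop_a: float, date_a: date, pop_b: float, date_b: date,
                           target: date) -> float:
    """Linear interpolation of estimated population to the census date (assumption)."""
    weight = (target - date_a).days / (date_b - date_a).days
    return pop_a + weight * (pop_b - pop_a)


def undercount_adjusted_completeness(reported: float, estimated: float,
                                     census_pop: float, estimated_pop: float) -> float:
    """Scale completeness by estimated / enumerated population.

    If the census counted fewer people than estimated, fewer deaths could be
    reported, so raw completeness is scaled up accordingly.
    """
    return (reported / estimated) * (estimated_pop / census_pop)


# Hypothetical census on 15 September 2011 with a 12-month reference period.
period_start, census_day = date(2010, 9, 15), date(2011, 9, 15)
expected = estimated_deaths_in_period({2010: 1000.0, 2011: 1200.0}, period_start, census_day)
est_pop = interpolate_population(10.0e6, date(2011, 7, 1), 10.3e6, date(2012, 7, 1), census_day)
adjusted = undercount_adjusted_completeness(1050.0, expected, 9.8e6, est_pop)
```

Here 108 days of the period fall in 2010 and 257 in 2011, so the expected deaths are a day-weighted mix of the two annual estimates.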

Completeness and the age-specific ratio of reported household deaths were analysed using summary statistics of their distribution: median, inter-quartile range (IQR: 25th to 75th percentile), minimum and maximum. These were examined separately by sex and age. For the analysis by sex, we calculated the summary statistics (across all censuses/surveys) of male and female completeness separately, as well as summary statistics of the census/survey-specific difference between male and female completeness; statistical significance of the latter was assessed using the Wilcoxon signed rank test [30]. Completeness was also analysed by whether date of death was asked in the questionnaire, to assess the impact of this recommended question. Because fewer censuses and surveys asked the date of death than did not, the coefficient of variation (standard deviation divided by the mean) was also used to compare dispersion between these groups.
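The summary statistics and the paired test on male-female differences could be computed as in the following standard-library sketch. It approximates rather than reproduces the Stata analysis: percentiles use the "inclusive" (linear interpolation) definition, the coefficient of variation uses the sample standard deviation, and the Wilcoxon signed rank test uses the large-sample normal approximation without tie or continuity corrections, so p-values can differ slightly from Stata's `signrank`. All function names and data values are hypothetical.

```python
import math
from statistics import NormalDist, mean, median, quantiles, stdev


def summarise(values):
    """Median, IQR bounds, min, max and coefficient of variation of completeness."""
    p25, _, p75 = quantiles(values, n=4, method="inclusive")
    return {
        "median": median(values),
        "p25": p25,
        "p75": p75,
        "min": min(values),
        "max": max(values),
        "cv": stdev(values) / mean(values),  # sample SD divided by the mean
    }


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def wilcoxon_signed_rank(differences):
    """Two-sided Wilcoxon signed rank test using the normal approximation.

    Zero differences are dropped; no tie or continuity correction is applied.
    """
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    ranks = average_ranks([abs(d) for d in nonzero])
    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    expected = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - expected) / sd
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return w_plus, z, p_value


# Hypothetical census-level completeness (proportions), for illustration only.
male = [0.62, 0.95, 0.88, 1.10, 0.71]
female = [0.59, 0.90, 0.89, 1.08, 0.67]
diffs = [m - f for m, f in zip(male, female)]  # census-specific male minus female
summary = summarise(male)
w_plus, z, p = wilcoxon_signed_rank(diffs)
```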

To disentangle the association of completeness with a range of factors, we conducted the following linear regressions for completeness for both sexes:

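$$
\mathrm{ratio}_{jk} = \beta_0 + \beta_1\,\mathrm{SDI}_{jk} + \beta_2\,\mathrm{SR}_{j} + \beta_3\,\mathrm{source}_{jk} + \beta_4\,\mathrm{date}_{jk} + \beta_5\,\mathrm{incl}_{jk} + \beta_6\,k + e_{jk} \tag{1}
$$

$$
\mathrm{ratio}_{jk} = \beta_0 + \beta_1\,\mathrm{SDI}_{jk} + \beta_2\,\mathrm{SR}_{j} + \beta_3\,\mathrm{source}_{jk} + \beta_4\,\mathrm{date}_{jk} + \beta_5\,\mathrm{source}_{jk}\,\mathrm{date}_{jk} + \beta_6\,\mathrm{incl}_{jk} + \beta_7\,k + e_{jk} \tag{2}
$$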

We also conducted the following regressions for completeness for each sex:

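$$
\mathrm{ratio}_{jkx} = \beta_0 + \beta_1\,\mathrm{SDI}_{jk} + \beta_2\,\mathrm{SR}_{j} + \beta_3\,x + \beta_4\,\mathrm{source}_{jk} + \beta_5\,\mathrm{date}_{jk} + \beta_6\,\mathrm{incl}_{jk} + \beta_7\,k + e_{jk} \tag{3}
$$

$$
\mathrm{ratio}_{jkx} = \beta_0 + \beta_1\,\mathrm{SDI}_{jk} + \beta_2\,\mathrm{SR}_{j} + \beta_3\,x + \beta_4\,\mathrm{source}_{jk} + \beta_5\,\mathrm{date}_{jk} + \beta_6\,\mathrm{source}_{jk}\,\mathrm{date}_{jk} + \beta_7\,\mathrm{incl}_{jk} + \beta_8\,k + e_{jk} \tag{4}
$$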

where ratio is the completeness of household deaths (using either UN or GBD estimated deaths), SDI is the Socio-Demographic Index, SR is the super-region, source is the data source type (census or survey), date is whether the date of death was asked, incl is whether the data source was included in the estimated deaths analysis, k is year, x is sex, e is an error term, j is country, and β0 to β8 are coefficients. The SDI is a composite measure of income, education and fertility; it is the average of rankings of lag-distributed income per capita, mean education of people aged 15 years and above, and the total fertility rate under the age of 25 years, ranging from 0 (minimum level of development) to 1 (maximum level of development) [31]. Models 2 and 4 include an interaction between data source type and whether date of death was asked. The regressions were run separately for completeness calculated using UN estimated deaths and using GBD estimated deaths. Standard errors were adjusted for clustering within countries. The regressions excluded the Sao Tome and Principe 2012 Census because the SDI was not available for that country. These analyses were conducted using Stata/SE 16.0 [32].
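A model of the form of Eq. (2) can be sketched with ordinary least squares and cluster-robust standard errors. This NumPy sketch is not the authors' Stata code: it omits super-region, inclusion status and year for brevity, and fits hypothetical synthetic data rather than the study data. The small-sample factor G/(G - 1) × (N - 1)/(N - K) follows the usual Stata adjustment for clustered standard errors; `ols_cluster` and all variable names are hypothetical.

```python
import numpy as np


def ols_cluster(X, y, groups):
    """OLS coefficients with cluster-robust (CR1) standard errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ (X.T @ y)
    resid = y - X @ beta
    labels = np.unique(groups)
    meat = np.zeros((k, k))
    for g in labels:
        in_g = groups == g
        score = X[in_g].T @ resid[in_g]  # summed score vector for cluster g
        meat += np.outer(score, score)
    n_clusters = len(labels)
    # Small-sample factor used for clustered SEs in Stata: G/(G-1) * (N-1)/(N-K)
    factor = (n_clusters / (n_clusters - 1)) * ((n - 1) / (n - k))
    vcov = factor * (xtx_inv @ meat @ xtx_inv)
    return beta, np.sqrt(np.diag(vcov))


# Hypothetical synthetic data, for illustration only (not the study data).
rng = np.random.default_rng(0)
n = 200
country = rng.integers(0, 40, size=n)       # cluster identifier
survey = rng.integers(0, 2, size=n)         # 1 = survey, 0 = census
date_asked = rng.integers(0, 2, size=n)     # 1 = date of death asked
sdi = rng.uniform(0.2, 0.7, size=n)
completeness = (0.9 + 0.3 * sdi + 0.2 * survey - 0.2 * date_asked
                - 0.07 * survey * date_asked + rng.normal(0, 0.02, size=n))

# Columns: constant, SDI, survey, date asked, survey x date asked.
X = np.column_stack([np.ones(n), sdi, survey, date_asked, survey * date_asked])
beta, se = ols_cluster(X, completeness, country)
```

Because the outcome is a proportion, a coefficient of -0.2 corresponds to completeness 20 p.p. lower, matching the way Table 1 coefficients are interpreted in the Results.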

Figure 1.

Box plots of completeness (%) of reported household deaths (UN and GBD estimated deaths), by sex, censuses, 2000–2021. Box shows inter-quartile range (25th to 75th percentile), with middle horizontal line showing the median. Excludes outside values. In all box charts, adjacent lines show the highest value within the range p75 to p75 + 1.5 * IQR and the lowest value within the range p25 to p25 – 1.5 * IQR.
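The whisker rule in this caption can be made explicit with a short sketch. It assumes the "inclusive" percentile definition from Python's `statistics` module, which may differ slightly from the percentile definition used to draw the published figures; `box_plot_whiskers` and the data are hypothetical.

```python
from statistics import quantiles


def box_plot_whiskers(values):
    """Adjacent values and outside values for a box plot.

    Upper adjacent value: largest observation <= p75 + 1.5 * IQR.
    Lower adjacent value: smallest observation >= p25 - 1.5 * IQR.
    Observations beyond these fences are outside values (not drawn).
    """
    p25, _, p75 = quantiles(values, n=4, method="inclusive")
    iqr = p75 - p25
    lower_fence, upper_fence = p25 - 1.5 * iqr, p75 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outside = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return min(inside), max(inside), outside


# Hypothetical completeness values (%) with one extreme census.
low_adj, high_adj, outside = box_plot_whiskers([10, 20, 30, 40, 50, 200])
```

Here p25 = 22.5 and p75 = 47.5, so the upper fence is 85: the whisker stops at 50 and the value of 200 is an outside value that the figure omits.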

3. Results

The median completeness of reported household deaths for both sexes for the 82 censuses was 89% (IQR 66–102%) compared with UN estimated deaths and 85% (64–113%) compared with GBD estimated deaths (Fig. 1, Supplementary information, Table S3). Completeness varied substantially between individual censuses, from a minimum of 21% (Burundi 2008) to a maximum of 202% (Sudan 2008) for UN estimated deaths and from 22% (Burundi 2008) to 233% (Sudan 2008) for GBD estimated deaths (Supplementary information, Table S4). Results for males and females were relatively similar. For the 68 censuses where sex-specific household deaths data were available, the median completeness for males (UN 88%, GBD 84%) was similar to that for females (UN 86%, GBD 86%). The IQRs were also relatively similar for both sexes, although wider for females for GBD estimated deaths (60–115%). When calculated as the census-specific difference between male and female completeness, the median difference for UN estimated deaths was three percentage points (p.p.) higher for males (statistically significantly different from zero), while for GBD estimated deaths it was two p.p. lower for males and not statistically significant (Supplementary information, Fig. S1). The lowest sex-specific completeness was 19% for females in Burundi 2008 (UN estimated deaths) and the highest was 257% for females in Sudan 2008 (GBD estimated deaths). After adjusting for the census undercount, the median completeness for both sexes was slightly higher, at 92% (IQR 67–106%) for UN estimated deaths and 86% (IQR 67–116%) for GBD estimated deaths (Supplementary information, Table S5).

Figure 2.

Box plots of completeness (%) of reported household deaths (UN and GBD estimated deaths), by sex, surveys, 2000–2016. Box shows inter-quartile range (25th to 75th percentile), with middle horizontal line showing the median. Excludes outside values.

For surveys, the median completeness of reported household deaths for both sexes was slightly higher than for censuses, at 96% (IQR 80–124%) compared with UN estimated deaths and 101% (IQR 76–128%) compared with GBD estimated deaths (Fig. 2, Supplementary information, Table S3). Although the IQRs were wider than for censuses, the minimum (Botswana 2006 Demographic Survey: UN 51%, GBD 46%) and maximum (2010 Zambia Living Conditions Monitoring Survey: UN 180%, GBD 185%) completeness were not as extreme (Supplementary information, Table S4). The median completeness for males (99%) was higher than for females (91%) when compared with UN estimated deaths but was similar for GBD estimated deaths (males 103%, females 102%). The median survey-specific difference in male completeness and female completeness for UN estimated deaths was seven p.p. higher for males (statistically significant difference from zero), while for GBD estimated deaths it was four p.p. higher but not statistically significant (Supplementary information, Fig. S1).

Figure 3.

Box plots of completeness (%) of reported household deaths (UN estimated deaths), by whether date of death asked, censuses and surveys, 2000–2021. Box shows inter-quartile range (25th to 75th percentile), with middle horizontal line showing the median. Excludes outside values. Coefficients of variation: census, date asked 0.293; census, date not asked 0.383; survey, date asked 0.302; survey, date not asked 0.423.

In censuses where the date of death was asked in the questionnaire, the completeness of reported household deaths was 20 p.p. lower (median 73% compared with UN estimated deaths; IQR 54–90%) than in censuses where the date of death was not asked (median 93%; IQR 74–110%) (Fig. 3, Supplementary information, Table S3). Notably, the median completeness where date of death was asked was almost identical to the 25th percentile of censuses where date of death was not asked. The difference in median completeness was larger for females (date 70%, no date 91%) than for males (date 78%, no date 88%) (Supplementary information, Fig. S2). Similar results for each sex were found when compared with GBD deaths (Supplementary information, Fig. S3). The range of completeness for censuses where the date of death was asked (34–108%) was also much narrower than where it was not asked (21–202%); similar findings were observed for males and females and for completeness calculated against GBD estimated deaths. These differences remain when the coefficient of variation is used to adjust for the wider range in completeness where the date of death was not asked. Even larger differences by whether date of death was asked were found for surveys, with a median of 85% (IQR 78–90%) compared with UN estimated deaths where date of death was asked and 122% (IQR 92–133%) where it was not asked, and again much wider variation where date of death was not asked (Fig. 3, Supplementary information, Fig. S4). Similar findings for the completeness of surveys compared with GBD estimated deaths are shown in Supplementary information, Fig. S5. Again, these findings remain when the difference is measured using the coefficient of variation.

A potential source of bias is that a specific census or survey may have been one of the data sources used by the UN or GBD to estimate deaths. For UN estimated deaths, median completeness was almost the same whether or not a census (included 88%, not included 90%) or survey (included 95%, not included 96%) was used in estimating deaths, and the IQRs were also similar (Supplementary information, Table S6). However, when completeness was calculated using GBD estimated deaths, the median was much higher and the IQR narrower when the census (median: included 105%, not included 79%; IQR: included 81–115%, not included 62–106%) or survey (median: included 123%, not included 90%; IQR: included 109–139%, not included 72–117%) was included than when it was not.

Table 1

Results of linear regression of completeness of reported household deaths (UN estimated deaths), both sexes, 2000–2021

| Covariate | Model 1 Coef. | Model 1 95% CI | Model 2 Coef. | Model 2 95% CI |
|---|---|---|---|---|
| Socio-Demographic Index | 0.260 | -0.333; 0.852 | 0.375 | -0.172; 0.922 |
| Super-region (Ref.: Latin America and Caribbean) | | | | |
| North Africa and Middle East | 0.125 | -0.429; 0.678 | 0.185 | -0.382; 0.752 |
| South Asia | -0.122 | -0.320; 0.076 | -0.110 | -0.305; 0.085 |
| Southeast/ East/ Central Asia, Oceania | -0.139 | -0.325; 0.046 | -0.136 | -0.327; 0.054 |
| Sub-Saharan Africa | -0.003 | -0.181; 0.175 | 0.024 | -0.151; 0.199 |
| Data source type (Ref.: Census) | | | | |
| Survey | 0.114 | -0.023; 0.250 | 0.207 | 0.004; 0.410 |
| Date of death asked (Ref.: No) | | | | |
| Yes | -0.211* | -0.358; -0.064 | -0.193** | -0.339; -0.047 |
| Unclear | -0.041 | -0.259; 0.176 | 0.126 | -0.055; 0.308 |
| Data source type x date of death asked | | | | |
| Survey x Yes | – | – | -0.066 | -0.330; 0.198 |
| Survey x Unclear | – | – | -0.675** | -1.049; -0.300 |
| Data included in estimated deaths analysis (Ref.: No) | | | | |
| Yes | 0.023 | -0.125; 0.172 | 0.021 | -0.128; 0.170 |
| Year | -0.010 | -0.022; 0.002 | -0.013* | -0.024; -0.001 |
| Constant | 20.892 | -3.259; 45.044 | 26.175 | 3.167; 49.183 |

*p < 0.05, **p < 0.01. N = 107, countries = 61. Coef.: coefficient. CI: confidence interval. Confidence intervals adjusted for clustering within countries.

Figure 4.

Ratio of reported household deaths to UN and GBD estimated deaths (%), by age at death and data source type, both sexes, 2000–2021.

Figure 5.

Ratio of reported household deaths to UN estimated deaths (%), by age at death and whether date of death asked, both sexes, censuses, 2000–2021.

The first linear regression model in Table 1 confirms that completeness of reported household deaths relative to UN estimated deaths was lower where the date of death question was asked than where it was not (-0.211, or -21.1 p.p.; predicted values holding other variables at their means: no date 98.3%, date 77.2%, so no date 27% higher in relative terms). However, SDI, super-region, data source type, inclusion of the data in the estimated deaths analysis and year all had confidence intervals that overlapped zero. When an interaction between data source type and whether date of death was asked was included (model 2), surveys had higher predicted completeness than censuses where date of death was not asked (0.207, or 20.7 p.p. higher; predicted: census 93.2%, survey 113.9%), although the difference was smaller where date of death was asked (0.207 - 0.066 = 0.141, or 14.1 p.p.; predicted: census 73.9%, survey 88.0%). Asking the date of death also predicted lower completeness for both censuses (-0.193, or -19.3 p.p.; predicted: no date 93.2%, date 73.9%; no date 26% higher in relative terms) and surveys (-0.193 - 0.066 = -0.259, or -25.9 p.p.; predicted: no date 113.9%, date 88.0%; no date 29% higher in relative terms), compared with not asking it. Supplementary information, Table S7 shows that results from the same models using completeness calculated with GBD estimated deaths were very similar, except that inclusion of the data source in the estimated deaths analysis increased predicted completeness (model 2: 0.150, or 15.0 p.p. higher) and Southeast/ East/ Central Asia, Oceania had lower completeness than Latin America and Caribbean (-0.262, or -26.2 p.p. in model 2).
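The interaction arithmetic in model 2 can be checked directly from the Table 1 coefficients and the predicted values quoted in the text. This is a simple recomputation sketch; the variable and function names are hypothetical.

```python
# Model 2 coefficients from Table 1 (proportion scale; 0.01 = 1 p.p.).
b_survey = 0.207          # survey vs census, where date of death not asked
b_date_asked = -0.193     # date asked vs not asked, for censuses
b_survey_x_date = -0.066  # interaction: survey x date asked

# Survey vs census where date of death was asked: main effect + interaction.
survey_effect_if_date_asked = b_survey + b_survey_x_date
# Date asked vs not asked, for surveys: main effect + interaction.
date_effect_for_surveys = b_date_asked + b_survey_x_date


def relative_increase(no_date_pct: float, date_pct: float) -> float:
    """Percentage by which predicted completeness without the date question exceeds that with it."""
    return (no_date_pct / date_pct - 1) * 100


# Predicted completeness values (%) quoted in the text.
model1 = relative_increase(98.3, 77.2)
model2_census = relative_increase(93.2, 73.9)
model2_survey = relative_increase(113.9, 88.0)
```

Rounded, these give 0.141 and -0.259 for the combined effects, and relative increases of about 27%, 26% and 29%, consistent with the figures reported above.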

When sex-specific completeness was analysed with linear regression and a sex variable was added to the model, the results were mostly very similar (Supplementary information, Table S8). The 95% confidence interval for the sex variable overlapped with zero. The main differences were that the coefficient for year was negative (model 2: a decline in completeness of 1.9 p.p. for every year over the period) and that Southeast/East/Central Asia, Oceania had lower predicted completeness than Latin America in model 1 (-0.219, or 21.9 p.p. lower). Results using GBD estimated deaths were mostly similar, except that the coefficient for whether date of death was asked was of similar magnitude but its confidence interval overlapped with zero (model 2 only; interacted with census) (Supplementary information, Table S9). In addition, inclusion of the data source in the estimated deaths analysis was associated with higher completeness.

The ratio of reported household deaths to UN estimated deaths showed a similar age pattern for censuses and surveys, although differences by age were slightly more pronounced for surveys (Fig. 4). The ratio increased from ages 0–4 years to a peak at ages 5–9 years (100% for censuses, over 120% for surveys) before declining steadily with age, reaching 60% at ages 75–79 years for censuses and 64% for surveys. The IQR at ages under 40 years was approximately 60–120% for censuses and 80–140% for surveys. There was a final increase at ages 80+ years, of over 20 p.p. to over 80% for censuses and of 15 p.p. to just under 80% for surveys. The ratio of reported household deaths to GBD estimated deaths was much higher at younger ages, reaching over 140% for censuses and over 180% for surveys, before declining more sharply with age to a level at the oldest ages similar to that found using UN estimated deaths. For censuses, the age pattern of the ratio of reported household deaths to UN estimated deaths was similar for males and females at younger ages, but the ratio for males was higher from age 40 years onwards, reaching 95% at 80 years and above compared with 81% for females (Supplementary information, Fig. S6). For surveys, the age pattern of the ratio for males was similar to that for censuses, except that the increase at the oldest ages was smaller; however, the ratio for females was well in excess of 100% at younger ages and declined more sharply at older ages (Supplementary information, Fig. S7).
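The age-specific ratio plotted in Fig. 4 is reported household deaths divided by estimated deaths within each age group. The sketch below computes it for a single hypothetical census, then finds the peak age group and the change in p.p. between ages 75–79 and 80+, the two features described above. The death counts are invented for illustration and are not data from this study; only a few age groups are included.

```python
# Hypothetical death counts by age group for one census (illustration only,
# not data from this study). Only a few age groups are shown.
reported = {"0-4": 900, "5-9": 210, "40-44": 330, "75-79": 300, "80+": 640}
estimated = {"0-4": 1000, "5-9": 200, "40-44": 400, "75-79": 500, "80+": 800}


def age_ratios(reported: dict[str, int], estimated: dict[str, int]) -> dict[str, float]:
    """Ratio (%) of reported household deaths to estimated deaths in each age group."""
    return {age: 100 * reported[age] / estimated[age] for age in estimated}


ratios = age_ratios(reported, estimated)
peak_age = max(ratios, key=ratios.get)             # age group with the highest ratio
rise_at_80_plus = ratios["80+"] - ratios["75-79"]  # change in p.p. at the oldest ages

for age, ratio in ratios.items():
    print(f"{age:>6}: {ratio:6.1f}%")
print(f"peak at {peak_age}; change from 75-79 to 80+: {rise_at_80_plus:+.1f} p.p.")
```

In the study these per-source ratios are then summarised across all censuses or surveys (median and IQR in each age group); a ratio above 100% in an age group simply means more deaths were reported than estimated for that group.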

The age pattern of the ratio of reported household deaths to UN estimated deaths differed markedly according to whether a census asked the date of death (Fig. 5). Where the date of death was not asked, the ratio for both sexes showed a pronounced age pattern, reaching over 120% at 5–9 years, falling as low as 60% at older ages and then increasing to almost 100% at 80+ years. Where the date of death was asked, however, the ratio varied between 60% and 80% with no clear age pattern. Similar differences were found for surveys: where date of death was not asked, the ratio was well over 100% at all ages under 40 years before declining to less than 80% at 60 years and above (Supplementary information, Fig. S8). Again, there was no clear age pattern for surveys where date of death was asked, with the ratio varying around 80% across all ages.

4.Discussion

This systematic global assessment has revealed large variation in the completeness of household death reporting in both censuses and surveys. Median completeness of household death reporting for censuses (compared with UN estimated deaths) was 89%; completeness was below 66% for one-quarter of censuses and above 102% for another quarter, and ranged from just 21% in Burundi in 2008 to 202% (more than double the estimated deaths) in Sudan in 2008. Completeness varied similarly widely for surveys, with a slightly higher median of 96% and one-quarter having completeness above 124%. Completeness was similar for males and females, and whether UN or GBD estimated deaths were used as the denominator. The ratio of reported to estimated deaths was higher at ages under 40 years, exceeding 120% for over one-quarter of censuses and 140% for one-quarter of surveys, before declining with age to a median of just 60% at ages 65–79 years for censuses and then rising at ages 80 years and above. This wide variation suggests that household death questions, as currently implemented in censuses and surveys around the world, are providing unreliable mortality data.
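Completeness here is reported household deaths divided by estimated deaths, summarised across sources by the median and IQR. The sketch below does this for five hypothetical censuses under both the UN and GBD denominators; the totals are invented for illustration and are not the study's data. Quartile conventions differ between software packages (this uses Python's `statistics.quantiles` with `method="inclusive"`), so IQRs computed with other tools, such as the Stata software used in the study, may differ slightly.

```python
from statistics import median, quantiles

# Hypothetical totals for five censuses (illustration only, not study data):
# (reported household deaths, UN estimated deaths, GBD estimated deaths)
sources = {
    "Census A": (8_000, 10_000, 9_500),
    "Census B": (4_200, 6_000, 6_300),
    "Census C": (11_000, 10_000, 10_500),
    "Census D": (2_100, 3_000, 2_800),
    "Census E": (7_800, 8_000, 8_400),
}


def completeness(reported: int, estimated: int) -> float:
    """Completeness (%) = reported household deaths / estimated deaths x 100."""
    return 100 * reported / estimated


def summarise(values: list[float]) -> dict[str, float]:
    """Median, quartiles and range of completeness across data sources."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"median": median(values), "q1": q1, "q3": q3,
            "min": min(values), "max": max(values)}


un = [completeness(r, u) for r, u, _ in sources.values()]
gbd = [completeness(r, g) for r, _, g in sources.values()]
for label, values in (("UN", un), ("GBD", gbd)):
    s = summarise(values)
    print(f"{label}: median {s['median']:.0f}% (IQR {s['q1']:.0f}-{s['q3']:.0f}%), "
          f"range {s['min']:.0f}-{s['max']:.0f}%")
```

Because estimated deaths are the denominator, completeness is not capped at 100%: Census C reports more deaths than either estimate, which is the kind of result the study found for one-quarter of censuses.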

A significant issue with the implementation of household death questions is that only about one-third of censuses and surveys include a question on the date of death. This study found that, after controlling for other factors in the regression, completeness was 27% higher where a census did not ask the date of death than where it did. That is, there appears to be substantial “telescoping”, or inclusion of household deaths that occurred outside the specified reference period, when the date of death question is not asked in a census or survey. Another finding is that the ratio of reported to estimated deaths varied less by age where the date of death was asked: there was no clear age pattern, compared with much higher ratios at younger ages where the question was not asked. This could mean that “telescoped” deaths are more likely to be at younger ages, possibly because child deaths are more readily recalled by respondents, especially where separate child mortality history questions are included elsewhere in the questionnaire. Inclusion of the date of death question may also reflect better overall quality of data collection, including training of enumerators; this would also be consistent with the lower variation in completeness at all ages where date of death was asked. This age pattern differs noticeably from that of death registration, which is commonly lower for children because there are fewer incentives to register child deaths than adult deaths (e.g. for inheritance purposes) [33]. The higher ratio at 80 years and above may indicate overstatement of age at older ages.
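To make the telescoping mechanism concrete, the sketch below counts hypothetical reported deaths with and without a date-of-death filter. Without a date question, every death a household reports is counted; with one, deaths outside the reference period can be dropped. The census date, the 12-month window (start inclusive, census date exclusive) and the death dates are all invented assumptions, and the resulting inflation reflects only these invented dates, not the study's estimate.

```python
from datetime import date

# Hypothetical census reference date and 12-month reference period (assumptions).
CENSUS_DATE = date(2011, 3, 1)
PERIOD_START = date(2010, 3, 1)

# Hypothetical dates of death reported by households (illustration only).
reported_death_dates = [
    date(2010, 4, 15), date(2010, 9, 2), date(2011, 1, 20), date(2009, 12, 5),
    date(2010, 2, 10), date(2008, 7, 30), date(2010, 11, 11), date(2010, 6, 1),
]


def in_reference_period(d: date, start: date = PERIOD_START, end: date = CENSUS_DATE) -> bool:
    """True if a death falls in [start, end): on or after start, before the census date."""
    return start <= d < end


counted_without_date = len(reported_death_dates)  # every reported death is counted
counted_with_date = sum(in_reference_period(d) for d in reported_death_dates)
inflation = counted_without_date / counted_with_date - 1

print(f"without date question: {counted_without_date} deaths")
print(f"with date question:    {counted_with_date} deaths")
print(f"telescoping inflates the count by {inflation:.0%}")
```

Three of the eight invented deaths fall before the window, so a census without the date question would overstate deaths in the reference period; this is the pattern the regression results suggest.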

Where the date of death was included in the questionnaire, we can assess completeness of household death reporting more reliably, because deaths reported to have occurred outside the reference period can be excluded. Overall, household death reporting in censuses and, to a lesser extent, surveys is incomplete. Median completeness for censuses that asked the date of death was 73%, with almost one-quarter having completeness below 50% and only one-quarter above 90%; median completeness for surveys was higher at 85%, but again only one-quarter had completeness above 90%. The higher completeness for surveys may reflect better training of the smaller number of enumerators needed for a survey compared with a census. Incomplete mortality data can be adjusted using the level of completeness estimated with existing methods; however, death distribution methods are subject to considerable uncertainty. Further, the more incomplete the mortality data are, the less reliable the adjusted mortality statistics will be.
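Why lower completeness makes adjusted statistics less reliable can be seen with the simplest possible adjustment, dividing reported deaths by estimated completeness. The sketch below assumes a hypothetical ±5 p.p. uncertainty in completeness (the margin must be smaller than completeness itself) and shows that the same absolute uncertainty produces a wider relative range of adjusted deaths as completeness falls. This is a plain scaling for illustration, not a death distribution method; the function names and numbers are hypothetical.

```python
def adjust(reported_deaths: float, completeness: float) -> float:
    """Scale reported deaths up by an estimated completeness fraction (e.g. 0.73)."""
    return reported_deaths / completeness


def relative_range(reported_deaths: float, completeness: float, margin: float) -> float:
    """Width of the adjusted-deaths range implied by completeness +/- margin,
    relative to the central adjusted estimate."""
    low = adjust(reported_deaths, completeness + margin)   # higher completeness -> fewer deaths
    high = adjust(reported_deaths, completeness - margin)  # lower completeness -> more deaths
    return (high - low) / adjust(reported_deaths, completeness)


# Hypothetical: 500 reported deaths and a +/- 5 p.p. uncertainty in completeness.
for c in (0.5, 0.73, 0.9):
    print(f"completeness {c:.0%}: adjusted {adjust(500, c):.0f} deaths, "
          f"relative range {relative_range(500, c, 0.05):.0%}")
```

The relative range works out to 2mc / (c² − m²) for completeness c and margin m, so it does not depend on the number of reported deaths and grows as c falls (about 20% at 50% completeness versus about 11% at 90%).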

These findings raise questions about the value of continuing to include household death questions in censuses and surveys. Notably, only one-quarter of the censuses and surveys in our study were used as an adult mortality data source in the UN World Population Prospects, and only a slightly higher proportion by the GBD. A concerning finding in our study was that some of the regression models showed a decline in completeness of household death reporting over time. If household death questions continue to be used in censuses and surveys, further efforts are needed to improve the quality of the data collected. As noted above, an obvious improvement is to include the date of death question, particularly so that only deaths within the reference period are counted. This question is omitted from many censuses because of a desire to keep the questionnaire relatively short for such a large-scale data collection; however, it could be included in a long-form questionnaire administered to a sample of the population. Censuses also face the challenge of providing adequate training on household death questions to the very large number of enumerators needed for such a large data collection. For surveys, however, which typically already have long questionnaires and require fewer enumerators to be trained, there is less reason to exclude these questions.

Surveys also provide a good opportunity to employ more innovative data collection techniques to improve the reporting of mortality. One survey in Senegal used recall cues and other methods to assist recollection of deaths of siblings, as well as an event history calendar to improve the accuracy of reported dates [34]. Mobile phone surveys have also become more widely used to collect mortality data, with encouraging results, especially in reducing omissions of key data such as age and date of death [35, 36]. Electronic data collection can also help improve the quality of household death data collected in censuses [5]. Improved training of enumerators can also address many of the data quality issues identified in this assessment. In Vietnam, the General Statistics Office conducted focus group discussions with data collectors to understand issues with collecting household death data in its annual survey [15]. The results were used to develop training modules for these enumerators that highlighted existing data quality problems, desensitised their perceptions about asking death questions, and strengthened interviewing techniques and response recording [15]. This intervention resulted in a 20% increase in household deaths recorded in the next survey in 2007 [15].

This assessment has some limitations. It does not include all censuses or surveys known to have asked household death questions, because for some the household death results were not published and the data were not made available for analysis. However, our finding that completeness does not vary by SDI and, to a lesser extent, by super-region gives us confidence that the findings would generalise to other countries whose data we could not assess. Another issue is the accuracy of UN and GBD estimated deaths, especially by age, for these countries, which have no single reliable source of mortality data. The consistency of completeness whether UN or GBD estimated deaths were used as the denominator provides some reassurance as to the veracity of the findings; moreover, our primary analysis concerned the distribution of completeness across all censuses and surveys rather than individual countries. One difference was that the ratio of reported to estimated deaths was higher at younger ages when GBD rather than UN estimated deaths were used; however, because our primary analysis used UN estimated deaths, we emphasised those findings.

A potential issue is circularity in the estimation of completeness, which we addressed by examining results according to whether the data source was included in the estimated deaths analysis. Another way to estimate completeness is the empirical completeness method; however, it is not recommended for countries with high HIV mortality, which include several of the sub-Saharan African countries in this study, and it cannot measure completeness above 100% because it is based on a logit transformation of the completeness fraction [19]. Finally, while we could assess the impact of including the date of death question on completeness, we could not examine other issues, such as whether deaths are excluded or double-counted because the household is used as the unit for counting deaths (e.g. single-person households excluded, migrants double-counted).
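The 100% ceiling follows from the logit transform itself: any value predicted on the logit scale maps back into the open interval (0, 1), and a completeness fraction above 1 has no logit at all. The sketch below shows only the transform, not the covariates or fitting procedure of the empirical completeness method [19]; the function names are illustrative.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion; defined only for 0 < p < 1."""
    return math.log(p / (1 - p))


def inv_logit(x: float) -> float:
    """Map any real number back to the open interval (0, 1)."""
    return 1 / (1 + math.exp(-x))


# Whatever a model predicts on the logit scale, the back-transformed
# completeness stays strictly below 1 (i.e. below 100%).
for x in (-3.0, 0.0, 2.0, 5.0):
    print(f"logit-scale prediction {x:+.1f} -> completeness {inv_logit(x):.1%}")

# A completeness above 100% (e.g. 1.2) has no logit: p / (1 - p) is negative,
# so math.log raises ValueError.
try:
    logit(1.2)
except ValueError as err:
    print("logit(1.2) is undefined:", err)
```

By contrast, the ratio of reported to estimated deaths used in this study is unbounded, which is why it can reveal over-reporting (values above 100%) that a logit-based method would not capture.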

This assessment has found considerable limitations in using household death questions in censuses and surveys to measure mortality. Household death questions are most useful where the date of death is asked; otherwise, a high proportion of deaths from outside the reference period will be included. Further, there should be substantial training of enumerators and use of electronic data collection. Household death reporting is more challenging in censuses because of the scale of data collection; however, if household death questions are to be included, they could be included in a questionnaire implemented in a sample of households. Censuses nevertheless have considerable advantages in being able to estimate detailed socio-economic and spatial inequalities in mortality. For surveys, there are more opportunities to implement innovative methods to improve the accuracy of recall of deaths, which is essential if they are to accurately collect mortality data from the COVID-19 pandemic. A limitation of surveys is sampling uncertainty; however, lengthening the recall period to capture more deaths could adversely affect data quality. Continued implementation of household death questions in censuses and surveys, without improvements to how they have been conducted in most countries over the past 20 years, will continue to produce unreliable mortality statistics of limited use in filling the gap in mortality evidence caused by suboptimal death registration, and hence will not be an effective investment for governments and donors.
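The trade-off between sampling uncertainty and recall period can be illustrated with a rough calculation. Assuming deaths captured by a survey are Poisson distributed, with the variance inflated by a design effect for clustered sampling, the relative standard error of the death count falls with the square root of the number of deaths, so a longer recall period reduces it. The sketch below uses a hypothetical 150 deaths per year of recall and a hypothetical design effect of 2. It also assumes that deaths in a longer window are reported as accurately as in a short one, which is exactly the assumption the text cautions against.

```python
import math


def relative_standard_error(expected_deaths: float, design_effect: float = 1.0) -> float:
    """Approximate relative standard error of a survey death count.

    Assumes deaths are Poisson distributed (variance = expected count), with the
    variance inflated by a design effect for clustered sampling.
    """
    return math.sqrt(design_effect / expected_deaths)


# Hypothetical survey expected to capture 150 deaths per year of recall,
# with a hypothetical design effect of 2.
for years in (1, 2, 5):
    deaths = 150 * years
    rse = relative_standard_error(deaths, design_effect=2.0)
    print(f"{years}-year recall: {deaths} expected deaths, RSE {rse:.1%}")
```

Doubling the recall period only reduces the relative standard error by a factor of about 1.4 (√2), a modest gain that recall errors and telescoping can easily outweigh.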

Funding

The authors did not receive support from any organization for the submitted work.

Ethical approval

The study used publicly available aggregated data, so no ethics approval was required.

Consent

Not applicable, because we did not collect data directly from individuals.

Data and code availability

A detailed dataset comparing reported household deaths with UN WPP and GBD estimated deaths is available at https://doi.org/10.26188/22191496, a summary dataset for use in analysis is available at https://doi.org/10.26188/22191499, and code for replicating the results is available at https://doi.org/10.26188/22191505.

Author contributions

TA conceived the study, collated the data used in the study, designed and conducted the data analysis, and wrote the drafts and final version of the manuscript. HL collated the data used in the study, conducted the data analysis and reviewed the drafts of the manuscript. SPP collated the data used in the study and reviewed the drafts of the manuscript. All authors approved the final manuscript.

Supplementary data

The supplementary files are available to download from http://dx.doi.org/10.3233/SJI-240041.

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

References

[1] AbouZahr C, de Savigny D, Mikkelsen L, Setel PW, Lozano R, Lopez AD. Towards universal civil registration and vital statistics systems: the time is now. Lancet. 2015;386(10001):1407-18.

[2] World Health Organization. Methods for estimating the excess mortality associated with the COVID-19 pandemic. Geneva; 2022.

[3] World Bank, World Health Organization. Global civil registration and vital statistics scaling up investment plan 2015–2024. Washington, DC: World Bank; 2014.

[4] Hill K, Johnson P, Singh K, Amuzu-Pharin A, Kharki Y. Using census data to measure maternal mortality: A review of recent experience. Demogr Res. 2018;39:337-63.

[5] Technical Advisory Group on COVID-19 Mortality Assessment Working Group 2. The Potential of Surveys and Censuses to Fill Adult Mortality Data Gaps in the Context of COVID-19: a Stocktaking Paper. New York: United Nations Statistical Commission; 2022.

[6] United Nations Statistics Division. Principles and Recommendations for Population and Housing Censuses: Revision 2. New York: United Nations; 2008.

[7] United Nations Statistics Division. Principles and Recommendations for Population and Housing Censuses: Revision 3. New York: United Nations; 2017.

[8] Lankoande YB, Masquelier B, Zabre P, Bangre H, Duthe G, Soura AB, et al. Estimating mortality from census data: A record-linkage study of the Nouna Health and Demographic Surveillance System in Burkina Faso. Demogr Res. 2022;46.

[9] United Nations Statistics Division. Principles and Recommendations for a Vital Statistics System: Revision 3. New York: United Nations; 2014.

[10] Wak G, Bangha M, Azongo D, Oduro A, Kwankye S. Data Reliability: Comparison between Census and Health and Demographic Surveillance System (HDSS) Outputs for Kassena-Nankana East and West Districts, Ghana. Population Review. 2017;56(1):31-45.

[11] Randall S, Coast E, Antoine P, Compaore N, Dial FB, Fanghanel A, et al. UN Census “Households” and Local Interpretations in Africa Since Independence. Sage Open. 2015;5(2).

[12] Working Group for Mortality Estimation in E. Wanted: studies on mortality estimation methods for humanitarian emergencies, suggestions for future research. Emerg Themes Epidemiol. 2007;4:9.

[13] Masquelier B, Ndiaye CT, Pison G, Dieme NB, Diouf I, Helleringer S, et al. Evaluation des estimations indirectes de mortalité dans trois observatoires de population au Sénégal. African Population Studies. 2016;30(1):2227-41.

[14] Pandey SP, Adair T. Assessment of the national and subnational completeness of death registration in Nepal. BMC Public Health. 2022;22(1):429.

[15] Ngo AD, Rao C, Hoa NP, Adair T, Chuc NT. Mortality patterns in Vietnam, 2006: Findings from a national verbal autopsy survey. BMC Res Notes. 2010;3:78.

[16] GBD Demographics Collaborators. Global age-sex-specific fertility, mortality, healthy life expectancy (HALE), and population estimates in 204 countries and territories, 1950–2019: a comprehensive demographic analysis for the Global Burden of Disease Study 2019. Lancet. 2020;396(10258):1160-203.

[17] United Nations Population Division. World Population Prospects: The 2019 Revision. New York: United Nations; 2019.

[18] Department of Population. The 2014 Myanmar Population and Housing Census, The Union Report, Census Report Volume 2. Nay Pyi Taw: Department of Population, Ministry of Immigration and Population; 2015.

[19] Adair T, Lopez AD. Estimating the completeness of death registration: An empirical method. PLoS One. 2018;13(5):e0197047.

[20] Bennett NG, Horiuchi S. Mortality estimation from registered deaths in less developed countries. Demography. 1984;21(2):217-33.

[21] Hill K. Estimating census and death registration completeness. Asian Pac Popul Forum. 1987;1(3):8-13, 23-4.

[22] Murray CJ, Rajaratnam JK, Marcus J, Laakso T, Lopez AD. What can we conclude from death registration? Improved methods for evaluating completeness. PLoS Med. 2010;7(4):e1000262.

[23] Jha P, Brown PE, Ansumana R. Counting the global COVID-19 dead. Lancet. 2022a;399(10339):1937-8.

[24] Jha P, Deshmukh Y, Tumbe C, Suraweera W, Bhowmick A, Sharma S, et al. COVID mortality in India: National survey data and health facility deaths. Science. 2022b;375(6581):667-71.

[25] Global Burden of Disease Study. Global Burden of Disease Deaths Database. Seattle: Global Burden of Disease; 2020.

[26] ICF. The Demographic and Health Surveys (DHS) Program. Rockville, MD; 2020. Available from: https://dhsprogram.com/.

[27] IPUMS International. IPUMS International. In: Minnesota Population Centre UoM, editor. Minneapolis; 2022.

[28] United Nations Statistics Division. UNSD Demographic Statistics. New York: United Nations; 2022. Available from: http://data.un.org/.

[29] Global Burden of Disease Study 2019 (GBD 2019) Results [Internet]. 2020a [cited 3 February 2021]. Available from: http://ghdx.healthdata.org/gbd-results-tool.

[30] Wilcoxon F. Individual Comparisons by Ranking Methods. Biometrics Bull. 1945;1(6):80-3.

[31] Global Burden of Disease Collaborative Network. Global Burden of Disease Study 2019 (GBD 2019) Socio-Demographic Index (SDI) 1950–2019. Seattle: Institute for Health Metrics and Evaluation (IHME); 2020b.

[32] StataCorp LP. Stata/SE 16.0. College Station, TX: StataCorp LP; 2019.

[33] Adair T, Lopez AD. Generating age-specific mortality statistics from incomplete death registration data: two applications of the empirical completeness method. Popul Health Metr. 2021;19(1):29.

[34] Helleringer S, Pison G, Masquelier B, Kante AM, Douillot L, Duthe G, et al. Improving the quality of adult mortality data collected in demographic surveys: validation study of a new siblings’ survival questionnaire in Niakhar, Senegal. PLoS Med. 2014;11(5):e1001652.

[35] Chasukwa M, Choko AT, Muthema F, Nkhalamba MM, Saikolo J, Tlhajoane M, et al. Collecting mortality data via mobile phone surveys: A non-inferiority randomized trial in Malawi. PLOS Global Public Health. 2022;2(8):e0000852.

[36] Kuehne A, Lynch E, Marshall E, Tiffany A, Alley I, Bawo L, et al. Mortality, Morbidity and Health-Seeking Behaviour during the Ebola Epidemic 2014–2015 in Monrovia Results from a Mobile Phone Survey. PLoS Negl Trop Dis. 2016;10(8):e0004899.