
Positioning household surveys for the next decade


Household surveys are a vital component of national statistical systems. They are the basis for official statistics on social and economic phenomena and are key to tracking progress towards the Sustainable Development Goals (SDGs). However, despite their importance, household surveys face various challenges, including problems with data quality, timeliness, and policy relevance, among others. Prepared by the United Nations Inter-Secretariat Working Group on Household Surveys (ISWGHS), this paper identifies eight technical priority areas for innovations in household survey design, implementation, and analysis. With these in mind, the paper also presents a set of recommendations for fostering enabling environments at the national and international levels to support the production of more and higher-quality household survey data that are affordable and responsive to policy needs. The paper aims to inform both the considerations of national statistical offices as they weigh priorities and pursue innovations to transform their household survey systems, as well as the work of ISWGHS in executing its mandate to support countries in achieving the SDGs.


As a key source of social and economic statistics, household surveys are a vital component of national statistical systems. Not only do they provide data that inform the design and evaluation of development policies, they are also a unique source of attitudinal and behavioral insights difficult to obtain elsewhere. Household surveys are critical for tracking progress towards national and international development goals, providing the requisite information for more than a third of all 232 indicators for the Sustainable Development Goals (SDGs), cutting across 13 out of 17 SDGs [1]. They can be used to improve and complement administrative data sources, as well as to validate and calibrate remote-sensing models and machine learning applications that combine household surveys with non-traditional data sources, providing insights with accuracy and precision that cannot otherwise be achieved by using these data sources alone. Today, the need for household surveys is greater than ever, given the widespread socioeconomic and health impacts of the COVID-19 pandemic that have resulted in an increase in global poverty for the first time in two decades [2]. Survey data are key to understanding the distributional impacts on households and individuals of global shocks and crises such as COVID-19, as well as climate change, natural disasters, and extreme weather events.

Despite the substantial progress that has been achieved in the availability and quality of household surveys over the past decade, weaknesses persist in their availability, coverage, accuracy, timeliness, affordability, policy relevance, and usability, particularly in the low-income countries that stand to benefit most from better survey data. Urbanization and higher income levels tend to reduce survey response rates, lengthy questionnaires bring about respondent fatigue with negative consequences for data quality, and coordination failures are common within overburdened statistical systems. During the initial phase of the COVID-19 pandemic, almost all countries had either fully or partially stopped face-to-face surveys as of May 2020, and of more than 180 countries that implemented phone surveys to measure COVID-19 impacts, only 43 percent were able to use an updated sampling frame from a recent household survey [3, 4]. At the same time, the data landscape has transformed in recent years, with an emergence of new data sources providing more granular and timely data (including geospatial data, mobile phone data, sensor data, among others), new technologies for data collection, new approaches for engaging with respondents and data users, and new business architecture models for running national statistical systems.

With this context in mind, this paper presents eight technical priority areas for household surveys to overcome existing challenges, adapt to the changing data ecosystem, meet the ever-increasing demand for data, and increase development policy and research impact in the remaining years under the 2030 Agenda for Sustainable Development. While the financial, technological, and human resources required to adopt recommendations in each priority area vary across countries, the priority areas are intended to guide countries in weighing priorities as they pursue innovations for improving and transforming their household survey systems. They are also intended to guide the Inter-Secretariat Working Group on Household Surveys (ISWGHS) in executing the tasks requested by the United Nations Statistical Commission, the highest body of the global statistical system bringing together the Chief Statisticians from member states from around the world, to provide guidance and support countries in producing the necessary data to fully implement the 2030 Agenda and achieve the SDGs. To this end, the paper presents the key elements of enabling environments at the national and international levels that can best support household survey systems to produce more and higher-quality survey data that are affordable and responsive to policy needs. The primary audience for the paper is the national statistical offices (NSOs) and the institutions and organizations that provide technical and financial support to NSOs and/or that benefit directly from the data products produced by NSOs. Since the paper highlights innovations and future directions for research and development in survey methodology, survey practitioners beyond NSOs may find the discussion relevant for their work.

2. Technical priority areas for household surveys for the next decade

This section outlines eight technical priority areas for household surveys in the next decade, namely (1) enhancing the interoperability and integration of household surveys; (2) designing and implementing more inclusive, respondent-centric surveys; (3) improving sampling efficiency and coverage; (4) scaling up the use of objective measurement technologies; (5) building capacity for CAPI, phone, web, and mixed-mode surveys; (6) systematizing the collection, storage, and use of paradata and metadata; (7) incorporating machine learning and artificial intelligence for data quality control and analysis; and (8) improving data access, discoverability, and dissemination. For each priority area, a short summary of recent developments and advances is provided, based on a review of academic literature and country experiences. Suggestions are offered for next steps, ranging from improving basic survey data infrastructure for phone, web, and mixed-mode surveys, to conducting experiments to develop and validate improved and scalable survey methods.

The priority areas were chosen based on three primary criteria: (1) areas that have been proven to be successful or have a great potential to make a medium-term impact; (2) areas that both build a strong data foundation and expand the frontier for research and development; and (3) areas that are more likely to benefit low- and middle-income countries as the key users of the document. However, these priorities are neither comprehensive nor “one-size-fits-all”, and thus countries are encouraged to set their own priorities based on the capacities and needs of their national statistical systems. While many of the examples provided in the paper have been successful in some countries, further piloting and experiments under national circumstances may be necessary.

2.1 Enhancing the interoperability and integration of household surveys

Enhancing the interoperability and integration of household surveys can increase the cost-effectiveness and relevance of survey data production, while increasing accuracy and granularity (in terms of both spatial and temporal resolution) to a level only possible through data integration. Interoperability refers to the ability to link different data sources through common identifiers for individuals, households, facilities, firms or administrative areas; geospatial coordinates; time stamps; and common classification standards. Enhancing data interoperability and integration is a priority area for improving household survey data, both by facilitating linkages to other household surveys as well as to censuses, geospatial data, administrative records, and different types of non-traditional sources, such as earth observation, call detail records and social media platforms. Prioritizing data integration and interoperability can also mitigate common criticisms of household surveys in terms of respondent burden, coverage, and timeliness, and enable their downstream reuse for ground-truthing and calibration purposes, inter alia.

An abundance of research has been produced on methods for integrating household surveys with other data sources. A review paper by Lohr and Raghunathan [5] highlights a number of broad objectives that can be achieved through this kind of integration, including: (1) improving the efficiency of sampling by using multi-source frames or auxiliary information from other data sources (e.g. stratification), (2) bridging household survey data gaps through direct record linkages [6, 7] or with model-based imputation [8], and (3) improving the precision, timeliness, and granularity of survey-based estimates by integrating surveys with censuses, other surveys, and other data sources, as discussed below.

In practice, data integration by design should always start with a thorough understanding of data needs, through consultations with key stakeholders as described in 3.1.1. The consultations should be conducted in parallel with assessments of the availability, accessibility, quality, and interoperability of data sources to be integrated with surveys, including gathering information about the associated metadata. Reconciliation of the concepts, definitions, and inferences produced across different data sources is also a critical component of the design process [10].

Existing frameworks describe in detail the common issues and errors in data integration efforts, and this paper does not attempt a comprehensive treatment of those challenges. Instead, the following subsections highlight key considerations for creating successful data integration programs in countries. Moving towards an integration by design approach for household surveys also aligns with the ongoing transformation of national statistical offices (NSOs) to incorporate new data sources (more details in 3.1.2) and continuing discussions about the role of household surveys in the larger data ecosystem. The Trusted Smart Statistics model [9], currently under development in the European Statistics System, calls for a multi-source paradigm system where each type of data source serves multiple purposes and each statistical domain benefits from different types of data sources.

2.1.1 Improving accessibility of other data sources for integration

Access to data held outside of NSOs for integration has been challenging for many NSOs. One of the biggest challenges faced by countries experimenting with the small area estimation of development outcomes such as consumption-based poverty is the lack of administrative data sources that can be considered for integration with household surveys [11]. NSOs in low- and middle-income countries often rely on census-survey integration in small area estimation efforts, as neither administrative data nor non-traditional data are available to them. Strong legislative backing would be helpful to allow NSOs access to other data sources for official statistical purposes, while also addressing privacy risks that are heightened due to data integration [12]. In countries where administrative or non-traditional data sources can be acquired, a cost-sharing scheme among multiple surveys within the national statistical office could help reduce the cost burden for individual surveys. Another option for consideration is shifting from sharing data to sharing computation, given the sensitive nature of personal “nano-data” [13]. This shift could potentially limit the risk of data privacy breaches while also reducing the burden on IT infrastructure.

2.1.2 Fostering data interoperability by design

Integration across different data sources demands data interoperability by design. Interoperability is a key requirement for data to be valuable for development and relates to the ease with which data sources can be linked and integrated through geospatial coordinates, common questions, time stamps, common classification standards, or common identifiers for persons, facilities, or firms, among others [14]. Given governments’ budget constraints and the need for more granular data, many countries are pursuing greater integration across different data sources – between censuses and surveys, exclusively among surveys, and between surveys and other non-census/survey data sources.

Examples of survey data integration include: (1) small area estimation methods, which typically integrate survey data with censuses and administrative data sources and have been used for measuring consumption-based poverty, mortality, labor force participation, disability, and other areas of the SDGs [15, 16, 17]; (2) survey-to-survey imputation to improve the availability, quality, timeliness, and cost-effectiveness of official statistics, which has been used routinely in applications of proxy-means testing and evaluation of subnational project impacts, typically focused on poverty measurement [18, 19, 20]; and (3) linking individual records of household surveys and administrative records to reduce response burden [6].
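
To make the idea of survey-to-survey imputation concrete, the following is a minimal sketch rather than a method taken from the cited studies. It assumes a hypothetical source survey that observes both an outcome (for example, log consumption) and one covariate shared with a second survey, fits a simple linear model on the source survey, and imputes the outcome in the second survey by adding a randomly drawn residual to each prediction. The function names, the single covariate, and the residual-resampling step are all illustrative assumptions; applied work typically uses many covariates and multiple imputation.

```python
import random


def fit_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def impute(x_new, intercept, slope, residuals, rng):
    """Predict the outcome for each new covariate value and add a residual
    drawn at random from the source survey, so that the imputed values keep
    some of the spread seen in the observed data."""
    return [intercept + slope * xi + rng.choice(residuals) for xi in x_new]


# Hypothetical source survey: covariate (e.g. an asset index) and outcome.
x_source = [0.0, 1.0, 2.0, 3.0, 4.0]
y_source = [2.1, 4.9, 8.2, 10.8, 14.0]

intercept, slope = fit_line(x_source, y_source)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x_source, y_source)]

# Hypothetical target survey that collected only the shared covariate.
x_target = [0.5, 2.5, 3.5]
y_imputed = impute(x_target, intercept, slope, residuals, random.Random(0))
```

Adding a residual to each prediction matters when the imputed values feed a distributional statistic such as a poverty rate, because predictions without noise are less dispersed than real outcomes.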

There have also been known examples of integration of household surveys with satellite imagery and processed geospatial data for high spatial and temporal resolution estimates of agricultural outcomes, such as crop-specific measures of area under cultivation, production, and yield, which emphasize the collection of precise GPS-based plot outlines [21] and a range of complementary data collected on the ground through objective measurement approaches, such as crop cutting, implemented at scale (i.e., across the entire survey sample) or on a sub-sample basis [22, 23].

Successful data integration by design depends on a number of important elements. Concepts and definitions should be harmonized between the household survey and the other source for integration [24], which could be a population census, a different survey, administrative data, satellite imagery, processed geospatial data, or citizen-generated data [25]. This harmonization should also extend to the list of administrative units used in the data sources. Moreover, common auxiliary variables that can explain a large share of the variations of the outcome indicator should be included in both data sources to improve the efficiency of data integration.

Similarly, if feasible, administrative identifiers for individuals, firms, and facilities should be elicited as part of household surveys to enable unit-record linkages of household surveys with administrative data sources. This can serve multiple purposes, including (1) assessing the quality of survey data by leveraging identical information contained in administrative data, (2) providing a basis for calibration or weighting, or (3) populating certain survey variables (for example, linkages to taxation databases obviate the need to ask extensive questions on income in countries with well-developed and comprehensive administrative sources).
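
A unit-record linkage of this kind can be sketched as a deterministic join on a shared identifier. The example below is illustrative only: the field names (`national_id`, `taxable_income`) and the `admin_` prefix are hypothetical, and real linkages would also need to handle duplicate or mistyped identifiers and operate under the consent and confidentiality safeguards discussed in this paper.

```python
def link_records(survey_rows, admin_rows, key="national_id"):
    """Link survey records to administrative records on a common identifier.

    Returns (linked, unlinked): linked rows carry the administrative fields
    with an 'admin_' prefix; unlinked rows are survey records with a missing
    identifier or no match in the administrative source.
    """
    # Index the administrative source by identifier, skipping empty keys.
    # If an identifier appears twice, the later record wins in this sketch.
    admin_by_id = {row[key]: row for row in admin_rows if row.get(key)}

    linked, unlinked = [], []
    for row in survey_rows:
        match = admin_by_id.get(row.get(key))
        if match is None:
            unlinked.append(row)
            continue
        merged = dict(row)
        merged.update({f"admin_{k}": v for k, v in match.items() if k != key})
        linked.append(merged)
    return linked, unlinked


# Hypothetical data: two respondents supplied an identifier, one did not.
survey = [
    {"national_id": "A1", "hh_size": 4},
    {"national_id": "B2", "hh_size": 2},
    {"national_id": None, "hh_size": 3},
]
admin = [{"national_id": "A1", "taxable_income": 1200}]

linked, unlinked = link_records(survey, admin)
match_rate = len(linked) / len(survey)
```

Reporting the match rate alongside any linked estimate is a simple way to make linkage coverage visible to data users.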

Relatedly, georeferencing should be adopted in household surveys for facilitating the validation and calibration of machine learning models that combine georeferenced survey data with publicly available satellite imagery and processed geospatial data to derive precise estimates of poverty, asset wealth, and agricultural outcomes at high spatial resolution [26, 27, 28].

Privacy risks need to be considered for data integration for two main reasons. First, linking datasets could increase the risk of disclosure. Proper measures should be taken to protect the confidentiality of individuals when disseminating microdata. Second, requesting administrative identifiers may discourage respondents from participating in the survey – this potential risk must be taken into account.

Sampling design for the household survey should also facilitate data integration. When survey-to-survey imputation is being carried out, the samples for the two surveys should strive to have similar designs [29]. Well-designed and high-quality surveys have been successfully used to correct biases from non-probabilistic survey data in several countries [30, 31]. Relevant paradata should be collected and curated for calibration to correct the selection biases and measurement errors generated from citizen-generated data and other types of non-probabilistic data sources [32, 33].
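
One simple form of calibration is post-stratification, in which the weights of a non-probability sample are rescaled so that weighted totals match benchmark totals for a set of cells. The sketch below is an illustration under stated assumptions, not the procedure used in the cited studies: the cell labels and benchmark totals are hypothetical, and the benchmarks are assumed to come from a high-quality probability survey or census.

```python
from collections import defaultdict


def poststratify(records, benchmark_totals, cell="cell", weight="weight"):
    """Rescale weights so that each cell's weighted total matches its benchmark.

    records: list of dicts, each with a cell label and a starting weight.
    benchmark_totals: dict mapping each cell label to the known population total.
    """
    weighted_total = defaultdict(float)
    for r in records:
        weighted_total[r[cell]] += r[weight]

    adjusted = []
    for r in records:
        factor = benchmark_totals[r[cell]] / weighted_total[r[cell]]
        adjusted.append({**r, weight: r[weight] * factor})
    return adjusted


# Hypothetical online sample that over-represents urban respondents.
online_sample = [
    {"cell": "urban", "weight": 1.0},
    {"cell": "urban", "weight": 1.0},
    {"cell": "urban", "weight": 1.0},
    {"cell": "rural", "weight": 1.0},
]
benchmarks = {"urban": 400.0, "rural": 600.0}  # assumed reference totals

calibrated = poststratify(online_sample, benchmarks)
```

Post-stratification only corrects selection bias to the extent that the cells capture the reasons why some groups are over- or under-represented, which is why the choice of auxiliary variables matters.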

From an institutional standpoint, moving from domain-oriented to process-oriented household survey operations should be considered if the national statistical office is in the process of modernization. Under the domain-oriented approach, an independent team is in charge of the entire process of survey operations for a specific survey (e.g., labor force surveys, living standards surveys, health surveys, etc.), from planning and collection to processing and dissemination. On the other hand, the process-oriented approach establishes units that are in charge of different steps of survey operations, regardless of the type of surveys. For example, one unit would be in charge of methodology and sampling, another unit would be in charge of data collection, and so on [34]. The advantage of this approach is that different sub-processes of household survey operations are coordinated and standardized. For example, the questionnaire design unit ensures that the same definitions and classifications are used for the same variable across surveys. Such an approach improves coordination and standardization across surveys, while also making the surveys more interoperable and efficient. However, for countries running limited numbers of surveys, the resulting gains in efficiency are unclear. More information about this approach is covered in 3.1.2.

2.1.3 Establishing a total quality framework for data integration

Although the “total survey error” (TSE) framework identifies each source of error for household surveys [35], other data sources come with their own quality issues, including potential biases and measurement errors for which a similar total error framework is usually absent, except perhaps in the case of administrative data used for statistical purposes [36]. More work is needed to quantify the errors associated with non-traditional data sources such as earth observation data and citizen-generated data, as well as the errors produced during the integration process [10, 37, 38]. When planning any type of data integration, the quality of all input data (surveys and other data sources) should be assessed, in terms of coverage (e.g., sufficient representation of key population groups), timeliness (e.g., regularity of data availability), and measurement errors (e.g., accuracy of data).

Data integration methods, including record linkage and model-based estimates, also involve errors including data linkage error or, for model-based estimates, errors due to the violation of assumptions or poor input data. For model-based estimates, validating model assumptions is an unavoidable step that requires additional time and resources. Estimating the mean square error of estimators is another important aspect that should be further researched and developed. A further aspect of concern is the extent to which the target variable matches the phenomenon captured by the administrative data source – this should be adequately assessed and transparently reported as a key element of data quality. Therefore, a data quality framework for integrating surveys with other data sources needs to be systematically developed.
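
As a reminder of what a mean square error estimate has to capture, the standard decomposition into variance and squared bias is shown below. The notation ($\hat{\theta}$ for the estimator, $\theta$ for the target quantity) is introduced here for illustration and is not taken from the paper.

```latex
\operatorname{MSE}(\hat{\theta})
  = \mathbb{E}\!\left[(\hat{\theta} - \theta)^{2}\right]
  = \operatorname{Var}(\hat{\theta})
  + \left(\mathbb{E}[\hat{\theta}] - \theta\right)^{2}
```

For integrated estimates, the bias term can absorb the effects of linkage errors, violated model assumptions, and mismatches between the target variable and the administrative concept, so reporting the sampling variance alone would tend to understate the total error.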

2.1.4 Maintaining high ethical standards and data confidentiality

Data integration increases the risk of data breaches and misuse. Conversely, limited access to certain data sources can hamper data integration. In the first instance, there can be important barriers to access by NSOs, often related to data protection concerns, and strong legal frameworks and institutional arrangements are required to enable access under appropriate conditions. Furthermore, in generating public use datasets, personal identifiers (that allow linkages with administrative records) or precise GPS coordinates (that enable integration with satellite imagery and other georeferenced and processed geospatial data) are considered confidential and are excluded (in the case of the former) or anonymized (in the case of the latter).
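
One way coordinates can be anonymized is random displacement, in which each published point is moved by a random distance and direction within a fixed radius. The sketch below illustrates that idea only; the 5 km radius, the example location, and the function names are hypothetical choices, and the paper does not prescribe a specific displacement rule. The calculation uses a flat-earth approximation that is adequate for displacements of a few kilometres away from the poles.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0


def displace(lat, lon, max_km, rng):
    """Move a point by a random offset of at most max_km kilometres."""
    # The square root makes points uniform over the disc's area rather than
    # clustered near the centre.
    distance = max_km * math.sqrt(rng.random())
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    dlat = distance * math.cos(bearing) / EARTH_RADIUS_KM
    dlon = distance * math.sin(bearing) / (
        EARTH_RADIUS_KM * math.cos(math.radians(lat))
    )
    return lat + math.degrees(dlat), lon + math.degrees(dlon)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, used here to check the offset."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical cluster centroid and displacement radius.
rng = random.Random(2024)
true_lat, true_lon = -15.40, 28.30
public_lat, public_lon = displace(true_lat, true_lon, max_km=5.0, rng=rng)
offset_km = haversine_km(true_lat, true_lon, public_lat, public_lon)
```

A larger radius lowers disclosure risk but also adds noise to any satellite or geospatial variables later attached to the displaced points, so the radius is a trade-off rather than a free parameter.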

While international standards and analytical tools are available for the deidentification of household survey data [39], the risk of disclosure is increasing with enhancements in data interoperability, requiring continued improvement of deidentification techniques and strengthening of NSO capacity to successfully adopt these standards and analytical tools [40]. Recommendations on providing increased access to microdata are provided in Section 2.8.

Another important aspect to consider is related to ethics and respondents’ right to be informed about, and fully aware of, the use of their own data, including data integration. For example, when Statistics Canada replaced income survey questions with data from tax records, respondents were informed about this practice during the interview. Addressing respondents’ concerns over the use and protection of their data and the potential benefit of collecting those data is an important issue for consideration [41]. In Zambia, for instance, an ethical clearance is required for data collection, which involves a consent form signed or thumbprinted by respondents to demonstrate that they understand the purpose of the survey, how the data will be used, and their rights during the interview.

2.2 Designing and implementing more inclusive, respondent-centric surveys

A major challenge for face-to-face household surveys today is declining unit and item response rates, correlated with increasing urbanization and income levels [42, 43, 44, 45] and most recently with social distancing measures brought on by the COVID-19 pandemic. As is well known, nonresponse rates have been traditionally higher in high-income contexts – certainly in the context of face-to-face surveys and even more so in the case of phone surveys [46].

The reliance on proxy respondents, which has been adopted widely in large-scale survey operations that collect data on individuals, is a related area of concern and a convenient design feature that hedges against the risk of otherwise missing information that should ideally be reported by household members themselves [47, 48]. Recent research has highlighted the biases associated with proxy reporting, minimization of which would undoubtedly enable data producers to more accurately capture the livelihoods, experiences, and behaviors of individuals [49, 50, 51]. Below, various approaches are highlighted for mitigating nonresponse, minimizing the use of proxy respondents, and improving the availability and quality of individual-disaggregated survey data, including on marginalized populations.

2.2.1 Transforming respondents into collaborators and co-producers

Respondent participation in household surveys largely depends on three elements: trust and trustworthiness, willingness to participate, and accessibility of data collection [52]. Building trust requires a rethinking of the relationship between NSOs and survey respondents, shifting from viewing participants as respondents, to viewing participants as collaborators and co-producers. This runs the gamut from designing surveys that are clear and simple, to being responsive to problems, concerns, and questions, to making surveys more inclusive by bringing respondents with diverse needs and abilities into the survey research and development process [53, 54, 55].

Constraints to willingness to participate in the data collection could include respondents’ inability to relate to survey topics or questions, exhaustion from over-research (respondent fatigue), and competing pressures from other daily activities, among others. While relevant cross-country empirical evidence is not immediately available, on the whole, the extent and drivers of nonresponse are expected to vary across and within countries (both across topics and geographic areas). As such, the issue needs to be studied in each context and various solutions should be adapted and piloted accordingly. The large consultative exercise of the United Kingdom Inclusive Data Task Force [52] is one example of how to make collected data more inclusive, by soliciting input from a wide range of stakeholders. Examples within national statistical offices also include tailoring survey design to meet the needs of respondents, collaborating with behavioral economists and communication specialists, and asking direct questions about response burden to help survey implementers better understand and reduce nonresponse [57].

2.2.2 Minimizing the reliance on proxy respondents to improve quality of data

High-quality, individual-disaggregated data that accurately reflect the economic and social roles and choices of men and women are critical for a variety of purposes, including: (1) the targeting and evaluation of policies to provide social protection for raising living standards and mitigating shocks; (2) promoting access to and ownership of physical and financial assets; and (3) removing barriers to technology adoption, to name a few. Similarly, a clearer picture of the intra-household distribution of labor – across sectors, wage- or self-employment activities, and unpaid care and domestic work – can better inform the targeting of employment and training programs. Furthermore, monitoring progress towards several targets of the SDGs across poverty reduction, agriculture, gender, employment, and inequality requires individual-disaggregated data on asset ownership, labor, time use, and roles in family enterprises.

Household surveys are one of the most promising sources of individual-disaggregated data to analyze these issues and their interactions. However, their reliability and usability are mediated by questionnaire design choices and respondent selection protocols. Regarding the latter, it is common for individual-level household survey modules to allow for proxy respondents to report on behalf of adult household members – a measure that cuts costs and avoids missing information. On other topics, such as asset ownership, it has been common for household surveys to either collect information at the household-level (even when assets are owned by individuals) or to identify intra-household asset owners but elicit information from a single, “most knowledgeable” household member [48, 49].

Momentum has been increasing to improve the availability, scope and quality of individual-disaggregated survey data collected in household surveys on a range of topics including asset ownership, work and employment, time use, and violence against women. Through the formulation of international guidelines on these topics, with a focus on improved approaches to questionnaire design and respondent selection, efforts to promote their adoption in large-scale surveys, and research that has demonstrated the utility of intra-household, self-reported survey data vis-à-vis data that are collected based on sub-optimal respondent selection protocols, countries now have an expanding base of knowledge and experiences to draw from in minimizing the reliance on proxy respondents to improve the quality of data on men and women.

Moving forward, there is a need for NSOs to (1) be more systematic in tracking the reliance on proxy respondents in their survey operations; (2) be critical about their fieldwork implementation protocols vis-à-vis existing international guidance for maximizing the rate of self-reporting among adults; (3) draw on documented experiences in improving their approach to interview scheduling with adult household members to minimize the reliance on proxy respondents, particularly for sensitive topics; and (4) be supported, particularly in lower-income contexts, in the adoption of best practices. That said, proxy response may be unavoidable if adult household members cannot be interviewed due to advanced age, poor health, and/or temporary migration away from the household residence. As such, it can be allowed as a last resort as NSOs put a greater emphasis on the collection of self-reported information from adult household members. Reliance on proxy response will also likely remain the dominant approach to data collection from non-adult household members.

Finally, getting better individual-disaggregated survey data may require additional financial resources, mainly to allow additional time for interviewers to schedule and conduct private interviews. As such, the approach to costing household surveys and securing the required financial resources may also need to be revisited.

2.2.3 Improving the correction of nonresponse bias

While calibration and imputation have been widely used by survey organizers to reduce nonresponse bias, two additional approaches could result in its further reduction. The first is to invest in high-quality benchmarking data sources, that is, high-quality data sources (in terms of measurement and representation errors) with auxiliary variables that have predictive power for the outcome indicators [58]. The second is to enhance the collection and use of paradata during electronic data collection, as discussed in Section 2.6. Survey Solutions, for instance, is a CAPI/CATI/CAWI platform that automatically collects the paradata associated with each survey and allows users to download them. While the type of paradata available for nonresponse bias correction varies by mode of data collection, in general, three types of paradata can be used for this purpose: (1) call history data, containing information on interview attempts and the outcome of each attempt, (2) interviewer observations of the sample units, and (3) measures of the interviewer-householder interaction [59]. More research and experimentation are needed to better understand how paradata, which may also come with their own quality issues such as missing, incomplete, and inaccurate information, can be better used to reduce nonresponse bias.
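
To illustrate how interviewer observations might feed a nonresponse correction, the sketch below applies a weighting-class adjustment: sampled units are grouped by an observation recorded for respondents and nonrespondents alike, and respondent weights in each class are inflated by the inverse of the weighted response rate. The class labels and field names are hypothetical, and this is one textbook adjustment rather than a procedure specified in the paper.

```python
from collections import defaultdict


def nonresponse_adjust(sample, cls="obs_class", weight="weight", responded="responded"):
    """Return respondents only, with weights inflated within each class so that
    respondents also represent the nonrespondents in the same class."""
    total = defaultdict(float)       # weighted count of all sampled units
    respondents = defaultdict(float)  # weighted count of responding units
    for unit in sample:
        total[unit[cls]] += unit[weight]
        if unit[responded]:
            respondents[unit[cls]] += unit[weight]

    adjusted = []
    for unit in sample:
        if unit[responded]:
            factor = total[unit[cls]] / respondents[unit[cls]]
            adjusted.append({**unit, weight: unit[weight] * factor})
    return adjusted


# Hypothetical sample: the class comes from an interviewer observation
# (e.g. whether the dwelling sits behind a locked gate).
sample = [
    {"obs_class": "gated", "weight": 10.0, "responded": True},
    {"obs_class": "gated", "weight": 10.0, "responded": False},
    {"obs_class": "open", "weight": 5.0, "responded": True},
    {"obs_class": "open", "weight": 5.0, "responded": True},
]
adjusted = nonresponse_adjust(sample)
```

The adjustment reduces bias only if the observation is related both to the chance of responding and to the survey outcomes, and a class with no respondents at all would need to be merged with another class before applying it.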

2.3 Improving sampling efficiency and coverage

Continuous improvements to sampling frames and adoption of innovative sampling techniques are required to improve sampling efficiency and coverage in household surveys. A proper sampling frame covers all target populations in the country, is accurate and up-to-date, and provides adequate contact information for survey organizers to approach respondents through different survey modes as needed. This is particularly relevant for COVID-19 fieldwork protocols and the need to reduce overall travel footprint in the post-pandemic era. The importance of necessary auxiliary variables to facilitate efficient sampling should also be emphasized. Proper sampling approaches reach the target population with the required precision while also meeting budget requirements.

2.3.1 Improving sampling frames for household surveys

The most common sampling frame comes from population censuses. It combines an area frame, which contains hierarchical geographical areas from the largest area (at the national level) to the smallest geographic division, usually called enumeration areas (EAs), with a list frame that contains the list of households located within each EA. Address-based sampling frames have also started to gain ground in high-income countries given their efficiency and quality. The addresses are usually obtained from a commercial vendor and updated regularly, and important auxiliary variables are available to help improve sampling efficiency [60]. For countries that do not have the resources or capacity to maintain a comprehensive list of addresses, a master sample frame is often used. With a master sample frame, the address list is updated only for selected enumeration areas. In Brazil, for example, the master sample frame has been used by all household surveys in the country. Master sample frames allow for the cost-sharing of listings, better knowledge of selected areas, and opportunities for richer data analysis [61].
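
As an illustration of how an area frame is commonly used, the sketch below selects enumeration areas with probability proportional to size (PPS) using systematic selection, with the census household count as the size measure. This is a generic textbook technique rather than a procedure described in the paper, and the frame contents and function name are hypothetical.

```python
import random


def pps_systematic(frame, n, rng):
    """Select n enumeration areas with probability proportional to size.

    frame: list of (ea_id, household_count) pairs in frame order.
    An EA whose size exceeds the sampling interval can be selected more than once.
    """
    total = sum(size for _, size in frame)
    step = total / n                  # sampling interval
    start = rng.random() * step       # random start within the first interval
    targets = [start + k * step for k in range(n)]

    selected = []
    cumulative = 0.0
    next_target = 0
    for ea_id, size in frame:
        cumulative += size
        # Pick this EA for every target that falls inside its cumulative range.
        while next_target < n and targets[next_target] < cumulative:
            selected.append(ea_id)
            next_target += 1
    return selected


# Hypothetical area frame: EA identifiers with census household counts.
area_frame = [
    ("EA-001", 120), ("EA-002", 80), ("EA-003", 200), ("EA-004", 150),
    ("EA-005", 90), ("EA-006", 110), ("EA-007", 60), ("EA-008", 190),
]
sampled_eas = pps_systematic(area_frame, n=3, rng=random.Random(11))
```

In a typical two-stage design, households would then be listed within the selected EAs only, which is the step where a master sample frame limits the listing work to the selected areas.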

For the 2020 round of censuses, many countries plan to use telephones to follow up with respondents to address missing values and nonresponse [62]. For countries without a good telephone frame for phone surveys, phone numbers collected during censuses (or surveys, as discussed in Section 2.5) can be used for subsequent surveys. However, this requires following strict protocols and obtaining consent from respondents. More information about obtaining consent is available in Section 2.5.

When a survey aims to sample hard-to-reach populations, the use of multi-frame sampling can improve the cost efficiency of the overall survey and improve the inclusivity of survey data. For example, an epidemiology survey could use an area frame for a general population health survey alongside another list frame of clinics specializing in a certain disease. This method allows for capturing data from a larger number of people with this specific disease, while reducing the cost of screening to be carried out in the area frame [63]. Multi-frame sampling can also be helpful in post-disaster settings when many respondents are displaced. For example, sampling school districts was found to be an efficient method for reaching families who relocated in the aftermath of Hurricane Katrina [64]. A new project has recently started in the United States that will integrate the current master address frame with the business register, job lists, and the demographic frame. This integration will help reduce response burden, improve collaboration across different agencies, and improve coverage, especially for vulnerable and difficult-to-reach populations [65]. It is important to note that the use of multiple frames and the associated multiplicity weighting are complicated and challenging for survey organizations to carry out. Guidance and training would be required for multi-frame sampling to be fully adopted by countries.
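To make the idea of multiplicity weighting concrete, the following is a minimal sketch of one simple multiplicity adjustment, in which each sampled unit's base weight is divided by the number of frames that list the unit, so that people reachable through several frames are not over-counted. The function name, the data layout, and the example figures are hypothetical, and operational multi-frame estimators are considerably more involved.

```python
def multiplicity_total(samples, frames_listing_unit):
    """Estimate a population total from samples drawn from several frames.

    samples: dict mapping a frame name to a list of
             (unit_id, base_weight, y) tuples for the units sampled from it.
    frames_listing_unit: dict mapping unit_id to the number of frames on
             which the unit appears (its multiplicity).
    """
    total = 0.0
    for units in samples.values():
        for unit_id, base_weight, y in units:
            # Down-weight units that could have been reached via several frames.
            total += base_weight * y / frames_listing_unit[unit_id]
    return total


# Hypothetical example: y = 1 if the person has the disease of interest.
samples = {
    "area_frame": [("p1", 100, 1), ("p2", 100, 0)],
    "clinic_list": [("p3", 10, 1)],
}
# p1 and p3 are clinic patients, so they appear on both frames.
multiplicity = {"p1": 2, "p2": 1, "p3": 2}
estimated_cases = multiplicity_total(samples, multiplicity)
```

The sketch assumes that the multiplicity of every sampled unit is known, which in practice requires asking respondents about their frame membership or linking the frames.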

Moving forward, efforts should be made to ensure census records are geospatially enabled, i.e., geocoded to a specific location [66] – this can facilitate selecting samples for household surveys as well as data integration (as discussed in 2.1). In a 2019 survey carried out by the UN Statistics Division of 158 countries, 86 percent of NSOs indicated that as part of the 2020 round of censuses, they either have collected or will collect GPS coordinates for enumeration areas, while 70 percent indicated that they will collect GPS coordinates for buildings and housing units [62].

For countries that lack census records for EA selection or have an outdated census frame, high-resolution satellite data can be used to generate estimated population densities and demarcate EA boundaries [67]. For example, the last population census in Somalia was carried out in 1975 with a population count of 3.9 million. Given significant increases in population size (up to an estimated 15 million in 2019) and high levels of displacement within the country, a gridded population approach was developed to create a frame for the first selection stage of the 2017 Somalia Rapid Emergency Response Survey. Geographical areas of Somalia were divided into 100 by 100-meter grid cells and neighboring cells were combined to form primary sampling units (PSUs) [68, 69, 70]. A sample of PSUs was selected using probability proportional to estimated size, with the population figure obtained from WorldPop. Relatedly, geospatial data can also help build equal size EAs to improve field work management and sampling efficiency [72].
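The first-stage selection described above can be illustrated with a short sketch of systematic selection with probability proportional to estimated size. This is an illustration only and not the procedure used in the Somalia survey; the function name and the population figures are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def pps_systematic(sizes, n, seed=None):
    """Systematic selection of n units with probability proportional to size.

    sizes: estimated population of each PSU (for example, from gridded data).
    Returns the indices of the selected PSUs; a PSU larger than the sampling
    interval can be selected more than once.
    """
    rng = random.Random(seed)
    cumulative = list(accumulate(sizes))   # running population totals
    interval = cumulative[-1] / n          # sampling interval
    start = rng.random() * interval        # random start in [0, interval)
    selected = []
    for k in range(n):
        target = start + k * interval
        # First PSU whose cumulative total exceeds the target point.
        index = bisect_right(cumulative, target)
        selected.append(min(index, len(sizes) - 1))  # guard against rounding
    return selected


# Hypothetical estimated populations for five PSUs built from grid cells.
estimated_pop = [120, 80, 300, 50, 450]
sample = pps_systematic(estimated_pop, n=2, seed=1)
```

Under this scheme, a PSU's chance of selection is n times its share of the total estimated population, so the fifth PSU in the example (450 of 1,000 people) is included in roughly 90 percent of samples of size two.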

2.3.2Adopting innovative sampling methods for difficult-to-sample population groups

Under the SDG pledge to leave no one behind, there is an expectation that surveys should cover various vulnerable population groups, of which many are difficult to survey due to various challenges. These include difficulties in identifying certain populations due to stigma and sensitivity, in locating and accessing certain population groups that are small in percentage or dispersed, and in persuading populations to be interviewed. The need for self-reported (as opposed to proxy-response) data at the individual level, crucial for study from a gender perspective, is also a challenge in household surveys.

There are many ways to improve the precision of estimates for specific population groups. For rare populations such as ethnic minorities, over-sampling areas that have more concentrated minority groups can improve sample coverage and reduce costs for expensive screenings. Another option is network sampling, which uses an expanded sample screening process so that information is also obtained from others outside of the household, such as neighbors, relatives, and other connected individuals. This can maximize coverage when collecting data on rare events or sensitive topics. The method has been used to produce estimates on adult mortality and on population groups that are marginalized and face social stigma, such as those using drugs [74]. Another interesting example is using machine learning models to predict ethnicity and assist with more targeted sampling, as covered in Section 2.7.

For countries that are interested in covering particular population groups, various methods to improve sample coverage should be attempted, in connection with an updated and comprehensive sampling frame. Guidance on sampling vulnerable population groups is currently being compiled by the ISWGHS through a Wiki site [75].

2.3.3Applying responsive and adaptive sampling design

Responsive and adaptive sampling design is an evidence-based approach for guiding real-time design decisions during survey data collection, which takes advantage of advances in electronic data collection such as the availability of geospatial information and paradata. One experiment conducted as part of the 2009 Swedish Living Conditions Survey explored various adaptive survey design strategies, such as terminating data collection as soon as the response rate reaches a certain threshold. During the process, the R-indicator was used to assess the balance of the set of respondents defined by key characteristics and the distance between respondents and nonrespondents. The results showed that the design reduced data collection costs through significantly fewer call attempts [76]. Another example comes from Nigeria, where a population-based HIV survey adopted a responsive survey design that used paradata to expedite the survey data collection and release [77]. Data collection was completed one week ahead of the original plan, saving about $4.4 million in costs.
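As an illustration of how such a balance measure can be monitored during fieldwork, the sketch below computes an R-indicator under its commonly used definition, R = 1 - 2S, where S is the standard deviation of estimated response propensities. Response propensities are approximated here by response rates within groups of sample units, which is a simplifying assumption; applications typically estimate propensities with a regression model and apply design weights. The function name and the example data are hypothetical.

```python
from collections import defaultdict
from statistics import pstdev


def r_indicator(cases):
    """Compute R = 1 - 2 * S(propensities) for a list of sample cases.

    cases: list of (group, responded) pairs, where group is a category known
           for respondents and nonrespondents alike (for example, region)
           and responded is True or False.
    """
    counts = defaultdict(lambda: [0, 0])   # group -> [respondents, cases]
    for group, responded in cases:
        counts[group][0] += int(responded)
        counts[group][1] += 1
    # Each case receives the response rate of its group as its propensity.
    propensities = [counts[g][0] / counts[g][1] for g, _ in cases]
    return 1 - 2 * pstdev(propensities)


# Hypothetical sample: 5 of 10 urban and 9 of 10 rural cases responded.
cases = (
    [("urban", True)] * 5 + [("urban", False)] * 5
    + [("rural", True)] * 9 + [("rural", False)] * 1
)
balance = r_indicator(cases)   # close to 0.6; 1.0 would mean equal propensities
```

Values closer to 1 indicate a more balanced set of respondents, so a fieldwork rule could, for instance, redirect effort to under-represented groups when the indicator falls.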

2.4Scaling up the use of objective measurement methods

Household survey data may include measurement errors driven by a range of factors, including recall bias, strategic misreporting, confirmation bias, social desirability bias, and self-esteem bias, among others [78]. To the extent that measurement errors are systematic and non-classical in nature, the findings and policy recommendations of household survey data analyses will be biased.

Methodological survey research to develop and validate improved methods for survey data collection has surged over the last decade, particularly in low- and middle-income contexts. The expansion in research has been motivated not only by long-standing concerns around measurement errors in self-reported survey data but also by the increasing availability of scalable technologies and methods that allow for addressing these measurement errors through direct measurement.

Research has demonstrated the extent and econometric implications of non-classical measurement errors in self-reported survey data on a range of topics, while also documenting the accuracy, feasibility, and cost implications of adopting direct measurement tools, such as GPS technology for plot area measurement and outline capture [79, 80, 81, 82], crop cutting for crop yield estimation [83, 84, 85, 86], high-frequency phone survey data collection for measuring household agricultural labor inputs [87, 88], DNA fingerprinting for crop variety identification [89, 90], physical activity trackers (i.e. accelerometers) for informing the measurement and analysis of labor productivity, effort, and poverty [91, 92, 93, 94], smartphone applications for time use measurement, recording social interactions between respondents and interviewers, or real-time travel patterns [95, 96, 97, 98, 99], low-cost testing kits for the rapid measurement of water quality [100], and “web scraping” for automating the collection of prices for selected internet retailers, as opposed to relying exclusively on survey operations for the Consumer Price Index (CPI) [101].

On the whole, direct measurement has been documented to increase the accuracy and scope of survey data collection while also reducing respondent burden, depending on the application. Before scaling up the use of objective measures in household surveys, experiments need to be carried out to enable the investigation of different types of bias, measurement errors, and privacy concerns that may be inherent in direct measurement tools [102, 103] (more discussion on experimental statistics is available in Section 3.1.6). It should also be noted that direct measurement, as presented in this section, will not apply to all topics that are covered in household surveys.

Finally, direct measurement may increase data collection costs in terms of procuring handheld GPS devices, accelerometers, smartphones, or testing kits, or in terms of additional time spent in the field by interviewers. However, the marginal cost will vary according to the direct measurement method in question (for example, procuring a handheld GPS device for each interviewer, or scheduling an additional visit to each household). If the cost of adopting direct measurement is prohibitive at full scale (that is, across all enumeration areas and households), it can be limited to a subsample. A within-survey imputation approach can then be pursued, depending on the objective, to derive imputed direct estimates for the portion of the sample not subject to direct measurement [86].
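A minimal sketch of the subsample idea follows: a relationship between self-reported and directly measured values is estimated on the households that received direct measurement and is then used to predict values for the remaining households. A single-predictor least-squares line is used purely for illustration and is not the method of [86]; applied work would normally use richer models and account for imputation uncertainty. All names and figures are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute_direct(records):
    """Fill in missing direct measurements from self-reports.

    records: list of dicts with keys 'self_report' and 'direct', where
             'direct' is None for households outside the measured subsample.
    """
    measured = [r for r in records if r["direct"] is not None]
    slope, intercept = fit_line(
        [r["self_report"] for r in measured],
        [r["direct"] for r in measured],
    )
    completed = []
    for r in records:
        if r["direct"] is None:
            predicted = intercept + slope * r["self_report"]
            completed.append({**r, "direct": predicted, "imputed": True})
        else:
            completed.append({**r, "imputed": False})
    return completed


# Hypothetical plot areas in hectares: farmer estimate versus GPS measurement.
plots = [
    {"self_report": 1.0, "direct": 0.9},
    {"self_report": 2.0, "direct": 1.7},
    {"self_report": 3.0, "direct": 2.5},
    {"self_report": 4.0, "direct": None},   # not in the measured subsample
]
completed_plots = impute_direct(plots)
```

Keeping an explicit flag for imputed values, as in the sketch, allows analysts to distinguish measured from predicted observations in later analysis.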

2.5Improving capacity for CAPI, phone, web, and mixed-mode surveys

In the past decade, many countries have moved from paper-and-pencil interviewing (PAPI) to computer-assisted personal interviewing (CAPI) for their household survey data collection. While it represents a significant technological advancement to move from PAPI to CAPI, the halting of face-to-face surveys during the COVID-19 pandemic has revealed the need to build technical capacity and technological infrastructure for implementing phone, web, and mixed-mode surveys in many lower-income countries.

Rates of mobile phone penetration and internet use are still quite low in lower-income contexts [104, 105]. Furthermore, both phone and web surveys are more likely to result in significantly higher rates of nonresponse than surveys administered through face-to-face interviewing. Both of these issues would contribute to potential biases in the survey results. There are also some surveys, particularly those involving sensitive content, where behavioral cues are helpful and cannot be captured remotely. Given these facts, face-to-face interviewing will not be completely replaced by remote data collection in the near future. However, strengthening NSO capacity in remote data collection, specifically in low- and middle-income countries, is a key strategic step to ensure that phone and web surveys can be used together with their face-to-face counterparts, both to rapidly respond to data needs in the aftermath of shocks and to increase the frequency and timeliness of survey data collection during emergencies.

2.5.1Building sample frames for phone surveys

One of the biggest challenges for telephone interviewing faced by NSOs during the pandemic was the lack of contact information in the sampling frame [106]. In a compilation of national COVID-19 impact surveys maintained by the ISWGHS, only 43 percent of approximately 180 countries used a recent household survey as a sampling frame for telephone interviews; the remaining countries lacked an updated sample frame with telephone numbers [107]. Countries that did not have contact details to reach survey respondents during COVID-19 adopted various methods to obtain the phone numbers, such as through mobile phone service operators, random digit dialing, or using administrative data sources like population and electoral registries. For example, Mongolia used its newly updated household-based registry, which contains one or more phone numbers, to reach respondents sampled in the 2020 MICS Plus [108].

Countries that have long relied on computer-assisted telephone interviewing (CATI) and computer-assisted web interviewing (CAWI) for their official surveys can serve as models for maintaining an enabling survey infrastructure for remote data collection. A recent EU workshop on multi-mode data collection for labor force surveys [109] showed that the most common data collection mode for labor force surveys within the European Union was a combination of CAPI (mostly for the first round) and CATI (for following rounds). Contact information is obtained during the first round. The labor force survey in Canada uses either in-person interviews or CATI (if the phone number is available from administrative files) for the first round, and CATI and CAWI for subsequent survey rounds. Only a few countries, unsurprisingly those with comprehensive registration systems, rely exclusively on CATI and/or CAWI for data collection.

The use of face-to-face household surveys as sampling frames for phone surveys during the COVID-19 pandemic revealed the advantages of this approach in minimizing household-level coverage and nonresponse bias, albeit with limits [110, 111]. Going forward, contact information for household members may be elicited in all future face-to-face surveys, which can in turn be used as sampling frames as well as to correct bias [46]. While longitudinal face-to-face surveys routinely collect this type of information to facilitate tracking efforts, cross-sectional surveys should also collect phone numbers more systematically. These efforts should be coupled with revisions to privacy and consent agreements with face-to-face household survey respondents, given the potential for the individuals to be contacted for other surveys at a later time. Countries should also consider collaborating with private data providers such as telecommunication service providers or research institutes with CATI experience to obtain access to data needed for building relevant sampling frames.

2.5.2Developing phone and web survey tools and protocols

Strong phone and web survey infrastructure must be coupled with the required survey tools and protocols. For example, phone questionnaires need to be significantly shorter and simpler, given the challenges in keeping respondents engaged during remote data collection, issues with mobile network connectivity, and concerns about respondent fatigue. The flow of questions and the visual cues in phone and web questionnaires also differ significantly from those in face-to-face questionnaires. Meanwhile, protocols should be established for respondent selection, incentive provision, phone and e-mail contact attempts to recruit respondents, proper consent and ethical requirements, and the formulation of scripted introductions and transitions during the interviews. These will be critical for successful survey implementation, as well as to ensure the general representativeness of the data.

When adopting a mixed-mode survey design that includes CATI and CAWI as possible options, NSOs need to decide whether the choice of CATI versus CAWI would be offered initially or at a later time. If the latter, NSOs will need to determine the initial survey mode and the number of days to be allowed before the second mode can be offered. Decisions must also be made on the number of reminders to the respondents and their frequency [112]. When moving towards a mixed-mode system, every decision should be tested, and investments made to develop a survey management system that can support the complexity of mixed-mode data collection. A strong IT system to support the mixed-mode data system is critical [113].

2.5.3Conducting more systematic analysis of mode effects

Phone surveys carried out during COVID-19 raised many concerns about quality, selectivity, and coverage, inter alia. Given increased reliance on phone and web surveys, users must understand the relative accuracy, reliability, and affordability of these surveys vis-à-vis their face-to-face counterparts, ideally through survey experiments that randomize the mode of interview for the same types of questions included in different surveys and questionnaire instruments. These experiments to discern potential survey mode effects can be conducted under the suggested program on survey methods discussed in Section 3.1.6. However, it is understood that even with identical questions, face-to-face, phone, and web survey questionnaires will generally exhibit differences in terms of length and design choices, again as part of a respondent-centric implementation approach. More information on assessing the quality of surveys carried out during the pandemic will be available in the forthcoming ISWGHS publication [114].

2.6Systematizing the collection, storage, and use of paradata and metadata

As CAPI, CATI, and CAWI become common in survey data collection, increasing amounts of paradata are being collected as a byproduct of the data collection process, including keystroke records, eye-tracking, mouse-tracking, and GPS-tracking of interviewer location [115]. These paradata can be used for a variety of purposes, including (1) computing granular interview duration statistics for specific questions, modules, interviewers, and/or subpopulations, (2) investigating question modification patterns and interviewer compliance with the intended flow of the questionnaire as part of broader survey quality control operations, (3) tracking interviewer compliance with fieldwork plans and intended visits to sample enumeration area and household locations [115], and (4) studying respondent behavior and predicting participation in the next survey wave [116]. As paradata are a byproduct of a given data collection operation, the format, layout, and content of paradata are a function of the system that generated the data and may vary greatly from one form of data collection to another [59].
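As a simple illustration of the first use listed above, the sketch below derives question-level interview durations from a log of answer events. Because paradata formats differ across platforms, the log layout used here (interview identifier, question identifier, and seconds elapsed since the interview started) is a hypothetical one, as are the function name and the example values.

```python
from collections import defaultdict
from statistics import median


def question_durations(events):
    """Median seconds spent per question across interviews.

    events: iterable of (interview_id, question_id, seconds_since_start),
            one row each time an answer is recorded.
    """
    by_interview = defaultdict(list)
    for interview_id, question_id, elapsed in events:
        by_interview[interview_id].append((elapsed, question_id))

    durations = defaultdict(list)
    for rows in by_interview.values():
        rows.sort()                        # order events by elapsed time
        previous = 0.0                     # the interview starts at time zero
        for elapsed, question_id in rows:
            # Time since the previous answer is attributed to this question.
            durations[question_id].append(elapsed - previous)
            previous = elapsed
    return {q: median(values) for q, values in durations.items()}


# Hypothetical log for two interviews and two questions.
log = [
    ("A", "q1", 10), ("A", "q2", 40),
    ("B", "q1", 20), ("B", "q2", 35),
]
medians = question_durations(log)
```

The same grouping logic can be applied by interviewer or by module to flag interviews that were completed implausibly quickly.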

The US Census Bureau uses the Performance and Data Analysis (PANDA) system, based on CAPI trace files, data files, and other case information, to assess data quality and capture falsified data [117]. Additional systems are also used more broadly to assess data quality, and paradata are collected for responses from the web infrastructure. These paradata provide insight on questions that might be confusing or cognitively difficult as well as the effectiveness of instrument design and screen layout. At Statistics Canada, various analytical studies have been carried out around data collection and processes using paradata. For example, paradata showed that more attention should be paid to the time spent between the first contact with a household and completing the interview (or confirming a nonresponse), rather than focusing on the number of calls [118]. In Statistics Austria’s recent experiment of including CAWI as one of the data collection modes for their labor force survey, paradata were used as a component for monitoring quality [119].

Available evidence on what paradata can do to improve data quality is generally scarce for lower-income countries. Moving forward, it would be informative to have more case studies on the use of paradata collected as part of CAPI and CATI systems implemented in lower-income contexts. Doing so will require strengthening NSO capacity to store, analyze, and report on paradata. This could be pursued under the proposed NSO business line on experimental statistics discussed in Section 3.1.6.

In addition, metadata are essential to collect and use for various purposes. Metadata include but are not limited to the date of interview, complementary time stamps for the start and end of interviews and survey modules (although these could also be retrieved from paradata), numerical codes for interviewers and field supervisors, identifiers for replacement households and reasons for replacement, and information on the presence of other household members during interviews with household members. Potential metadata applications include monitoring progress and incoming data quality as well as ex-post research on interviewer effects, correlates of data quality, and seasonality in measurements, to name a few. These efforts can also inform the design of implementation plans and near-real-time data quality checks for subsequent household surveys.

2.7Expanding capacity for machine learning and artificial intelligence

Artificial intelligence, machine learning, and predictive analytics can improve the efficiency of every step of survey operations, from sampling, questionnaire design, data collection, and processing to data analysis and dissemination [120]. For example, sampling rare populations has always been a challenge for large-scale household surveys. While a common screening exercise on large samples could be prohibitively costly, classification trees and machine learning can build a prediction model with existing information from surveys and administrative data, making the sampling of rare population groups more efficient for future surveys [121]. Paradata analysis can predict the ideal time window for enumerators to contact each household, helping to improve contact rates for telephone and face-to-face interviews [122].

Machine learning can also be useful for data processing that tends to be resource-intensive and error-prone. For example, for data collected through open-ended questions (for example, on occupation, industry, and time use activities, among others), great efficiency can be gained by using machine learning to automatically code open-ended responses. The US Bureau of Labor Statistics used machine learning to automatically code responses to its open-ended work injury question, reducing the coding workload while improving the overall coding quality [123].

The use of artificial intelligence and machine learning has been central to applications discussed in Section 2.1. This includes the use of georeferenced household survey data for calibrating and validating models that combine survey data with high-resolution satellite imagery and processed geospatial data to obtain precise, high spatial resolution estimates of poverty, asset wealth, cultivated crop areas, and crop yields, among others. Other applications of machine learning in household surveys include predicting attrition rates in panel surveys [124], “fast-tracking” survey estimation and imputation procedures to speed up data dissemination efforts [125], and imputing consumer expenditures for areas that are not sampled [126] or deriving imputed direct/objective measures of outcomes when direct measurement is restricted to a subsample [86].

Applications of artificial intelligence and machine learning in household survey design, implementation, and analysis are still scattered and concentrated mostly in countries with more advanced statistical systems. Building and strengthening NSO capacity in the use of these methods should be a priority for lower-income contexts, including as part of the suggested efforts to strengthen the NSO focus on experimental statistics and survey methods. Capacity building is discussed further in Section 3.1.5.

2.8Improving data access, discoverability, and dissemination

Any improvements to household surveys must include effective strategies for documentation and dissemination to leverage the full analytical potential of collected data and maximize the return on investing in household surveys. Looking forward, data producers should aim for timely data dissemination and accelerate gains in the public availability of deidentified household survey datasets. In doing so, NSOs should communicate to the general public the importance of data for evidence-based decision making, as well as how the collection of these data will respect data protection laws and the confidentiality of personal information. NSOs should also accelerate the deposits of anonymized unit-record public use survey datasets (inclusive of spatially anonymized GPS locations of enumeration areas, as discussed in Section 2.1) into national portals and international platforms for household survey dissemination, including the International Household Survey Network [127], the World Bank Microdata Library [128], the FAO Microdata Library for Agricultural Surveys and Censuses [129], the ILO Central Data Catalogue [130], and UNHCR’s Microdata Library [131]. Key survey design information should accompany the disseminated survey microdata, such as anonymized primary sampling units, strata, and final weights.

NSOs should consider providing secure access to confidential survey microdata to promote further use and research by creating offline data enclaves that allow access to the complete set of georeferenced unit-record survey data [132]. These datasets can include confidential household GPS coordinates or georeferenced plot outlines, but they must be divorced from the community, household, and plot identifiers included in the microdata and can only include a very limited set of processed variables that would be used for model training and validation purposes (such as total household consumption expenditures or cultivated crop identifiers and yields). While access to microdata through physical data enclaves has been limited during COVID-19, some national statistical offices have continued this service to researchers through securely managed remote access to data [133].

The importance of disseminating real-time data or prioritizing the publication of time-sensitive data has been recognized widely. As shown during the pandemic, many countries were able to release near real-time data based on high-frequency or pulse surveys. With the help of machine learning, countries can “fast-track” survey estimation and imputation procedures to speed up data dissemination efforts (see Section 2.7).

Moreover, various data outputs should be provided by NSOs, not only in tabular format but also through analytical reports. The most useful forms of data outputs focus on specific topics and population groups (such as migrants, labor force, poverty) with integrated information from multiple sources, as opposed to data disseminated based on sources (such as a report based on a single survey, without linkages to other relevant data outputs that could be combined to provide a more comprehensive picture on a specific issue).

Disseminating information based on topics and population groups of interest requires a strong metadata system that describes the content of all microdata files, the content of aggregated tabular output, the content of analytical or descriptive reports, and the nature of specialized services provided by the agency. This kind of system enables better coordination and integration of household surveys with other data sources.

It also facilitates the production of integrated data products that are easy for users to understand, hence improving the use and usability of the data. Training NSO staff on the requisite skills for producing user-centric analytical reports and communicating with the public and journalists is also key to improving the usage of household survey data. All efforts to improve data availability must be accompanied by the development of strict data privacy and anonymization protocols.

Any dissemination programme should incorporate an appropriate system of data quality reporting built on robust quality assessment approaches. This should be part of any regular dissemination to inform the interpretation of published data and is crucial at times when new methods are introduced or important changes take place, such as changes in the mode of data collection, integration of new data sources, or changes in survey design, inter alia. Given the desired innovation and modernization of household surveys over the coming years, NSOs will need well-developed quality assessment systems leveraging existing statistical quality assurance frameworks such as the UN National Quality Assurance Framework [134].

3.Fostering a stronger enabling environment for household surveys

This section identifies the critical elements of an enabling environment at both the national and international levels to accelerate the realization of the vision described in this paper.

3.1Role of countries in creating a stronger enabling environment at the national level

3.1.1Strengthening engagement with policymakers and data users

Official statistics are collected to inform policy, promote policy discussions, and increase knowledge. As an integral part of the national data ecosystem, household surveys offer a unique opportunity to respond to the data needs of policymakers and the general public. To ensure that data collected from household surveys are relevant, policymakers and all relevant stakeholders (including marginalized population groups) should be key partners at all stages of survey planning, data collection, analysis, and dissemination. These engagements build co-ownership of data and the entire household survey process with policymakers, which in turn helps to secure financial support for household survey operations in the country.

Various ways of engaging with policymakers and key stakeholders have been adopted in countries. For example, the Canadian Statistics Advisory Council serves as a body for Statistics Canada to engage with the ministries and key experts on matters related to overall quality, including issues related to data collection, data access, privacy issues, and data dissemination. To better understand the needs from regional and local governments, the Federal-Provincial-Territorial Consultative Council on Statistical Policy collaborates with Statistics Canada to determine data requirements, consult on current statistical activities, and coordinate data dissemination [135].

Another way to connect with the public directly is through open consultation. For example, the Australian Bureau of Statistics (ABS) carried out an open consultation in 2020–21 on the Aboriginal and Torres Strait Islander Health Survey [136]. The consultation helped ABS design a culturally appropriate approach to collect information on health with three components: survey content, biomedical tests, and data integration. The public was asked to complete an online survey and/or submit inputs through email or mail. A report of the consultation is published online. Topics for open consultations are broad, ranging from data collection, as in the above example for Australia, to data dissemination and the use of data. Surveys supported by the World Bank Living Standards Measurement Study (LSMS) team establish a Technical Working Group within each country, including NSO officials, line ministries that use the survey data, think tanks, research centers, and academia. The Data Users Group advises on questionnaire design to align with policy needs and increase the use of survey data.

On the whole, strong user engagement can identify not only the data needs on the part of stakeholders but also the ways in which household surveys are used. The latter is critical for identifying the types of household survey data that are needed for ongoing country policy design and evaluation processes.

3.1.2Modernizing national statistical systems

In recent years, many NSOs have taken on the task of modernizing their statistical systems. Drivers for this transformation are both internal and external. Internal drivers may include organizational silos that prevent the reuse of knowledge and data produced under different streams of work, duplication, lack of consistency of solutions, limited interoperability across data sources, and limited capacity for research and innovation. External drivers for modernization include increasing demand for statistical information, the availability of new data sources and methods, and increasing challenges with the traditional data sources that are typically under the auspices of NSOs [137]. Attempts to introduce innovations into household surveys would benefit greatly from a modernized national statistical system. For example, a process-oriented data collection system (discussed in more detail in Section 2.1), which is a key element of a modernized statistical system, can help ensure harmonization across surveys and facilitate data integration.

Another key aspect of a modernized statistical system is an expansive approach to new sources of data, relying on both primary data collection such as population censuses and household surveys as well as on other data sources such as administrative sources and non-traditional data sources. This transformation serves as a catalyst for innovative survey methodologies and better data integration. The shift to multi-purpose sources and multi-source statistics also ensures that the collected survey data are re-used for multiple purposes, hence increasing the value of existing surveys [13].

3.1.3Quantifying the benefits and communicating the value of surveys

National statistical offices must invest in data visualization and data journalism to better communicate the value of household surveys, both in and of themselves and through integration. This is critical for boosting the understanding of the importance of household survey data, and in turn, securing political commitment to and predictable financing for household survey programs. It is also important for NSOs to document how survey data have been used for policymaking, to further demonstrate the value of household surveys.

In an age of abundance of information and misinformation, it is important to develop and maintain a brand for NSOs that is associated with trust, relevance, independence, and quality. Consumer consultations, staff engagement, and communication and marketing strategies are key elements for building such a brand [138].

3.1.4Sustaining financing for household surveys

During the COVID-19 pandemic, 40 percent of NSOs saw data collection costs rise, while 48 percent of NSOs globally experienced decreased government funding [139]. In sub-Saharan Africa, 61 percent of countries experienced increases in data collection costs, with 71 percent seeing a decrease in government funding and 59 percent a decrease in donor funding. These challenges are being experienced by systems that were already insufficiently funded yet are required to produce accurate and timely data. An analysis carried out by the World Bank puts the average cost of conducting a face-to-face household survey, based on a sample of 18 living conditions surveys, at USD 170 per household [140]. Significant variations exist in the cost among these 18 surveys, ranging from USD 64 in Bangladesh to above USD 400 per household in Nigeria. The cost of a national MICS6 survey ranges from USD 29 to USD 370 per household [141]. As we move forward, the cost of household surveys is likely to increase, and many low-income countries will continue to lack the resources to consistently fund their national surveys.
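The per-household figures above translate into total fieldwork budgets by simple multiplication. The sketch below is illustrative only: the unit costs are those quoted in the paragraph, while the sample size of 10,000 households and the function name `fieldwork_budget` are assumptions introduced for the example and do not come from the cited analyses.

```python
# Per-household costs in USD, as quoted in the paragraph above.
COST_PER_HOUSEHOLD_USD = {
    "average of 18 living conditions surveys": 170,
    "lowest of the 18 surveys (Bangladesh)": 64,
    "MICS6 lower bound": 29,
    "MICS6 upper bound": 370,
}

# Hypothetical sample size, chosen only for illustration.
HYPOTHETICAL_SAMPLE_SIZE = 10_000


def fieldwork_budget(sample_size: int, cost_per_household: float) -> float:
    """Total cost = number of sampled households x cost per household."""
    if sample_size < 0 or cost_per_household < 0:
        raise ValueError("sample size and unit cost must be non-negative")
    return sample_size * cost_per_household


# Budget implied by each quoted unit cost for the hypothetical sample.
budgets = {
    label: fieldwork_budget(HYPOTHETICAL_SAMPLE_SIZE, cost)
    for label, cost in COST_PER_HOUSEHOLD_USD.items()
}

for label, total in budgets.items():
    print(f"{label}: USD {total:,.0f}")
```

Under these assumptions, the same sample would cost roughly six times more at the upper end of the MICS6 range than at the Bangladesh figure, which illustrates why unit costs matter so much for budget planning.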

Domestic resource mobilization remains the most sustainable funding source for statistical advancement. At the national level, systematic funding mechanisms should be in place to support household survey operations as an integral part of the national statistical system. There must be strong statistical advocacy programs in place that can enhance the work done by NSOs for the benefit of politicians and policy makers to ensure that household surveys are priority activities in annual budgets. Demonstration of use cases may be helpful in resource mobilization. In view of ever-present budget constraints and diverse NSO operations and programs that are seemingly in competition for resources, household survey design should be tightly fit-for-purpose so that NSOs can maximize the cost-effectiveness and relevance of survey data production.

Furthermore, public-private partnerships should be encouraged by national governments as a means to drive funding toward statistical development. Special trust funds for the sole purpose of advancing statistics could be established on already existing tax revenue platforms.

The last two years dealt a major blow to the world economy and lower-income countries were hit particularly hard. Funding statistical development must continue being a priority for national governments and the international community if economies are to rely on evidence-based policy responses to emerge from the downturn.

3.1.5Strengthening the capacity of national statistical systems

Quality official statistics can only be offered by a national statistical system that has the requisite managerial and technical skills to deliver on its mandate. Training and capacity building programs can improve on existing skills and help develop new technical skills that can assist NSOs to leverage new types of data sources and new methods.

As seen in the discussion in Chapter 2, the skills mix of NSOs must diversify to increase the development impact of household surveys, expanding to areas such as the use of new data sources (e.g., earth observation data), new techniques (such as machine learning, artificial intelligence, and anonymizing survey microdata), and data integration. Research skills need to be developed to support methodological innovation, including for instance in the use of latent variable models that cater to measurement errors, and communication skills need to be improved to better engage with respondents and users. Such improvements can be undertaken together with establishing a new (or strengthening the existing) business line on experimental statistics, as discussed in Section 3.1.6 and as originally called for by the World Development Report 2021: Data for Better Lives [142].

Furthermore, a training needs assessment of 15 NSOs identified additional areas in high demand but often overlooked by statistical training programmes, with coordination of the national statistical system, user engagement, and management among the key priorities [143]. These issues are particularly relevant for countries as they begin modernizing their national statistical systems (see Section 3.1.2, Modernizing national statistical systems) and as NSOs take on new roles in an Integrated National Data System [142].

There is substantial variation in NSO capacity and in the extent to which national statistical systems provide training for their staff. National training programs range from having an established statistical training institute within or outside of NSOs (as in Brazil, Indonesia, and the Philippines), to relying on a small training unit within the NSO (as in Ireland and Ethiopia), to providing only ad-hoc training or no training at all [144]. There is also significant variation in the types of training provided to staff as well as in their supervision and oversight. Good practices from successful national statistical training programs should be shared and expanded to others.

The challenges of COVID-19 have prompted many statistical agencies, at both international and national levels, to rethink their training programs. At least 75 percent of all statistical capacity development events in 2020 were conducted online, compared with only about 5 percent in 2019, according to the United Nations Statistics Division Global Calendar of Statistical Events, which includes information from major international agencies. Given its efficacy, remote training is likely to continue, even if combined with in-person initiatives [145].

Many e-learning courses have been developed by international agencies on various topics such as phone surveys [146], small area estimation [147], and collecting data through household surveys for various SDG indicators [148]. Bringing these courses together to maximize access for NSOs and to avoid duplication of efforts is essential. The Global Network of Institutions for Statistical Training (GIST) [149] has established a hub that is intended for this purpose: the UN SDG:Learn Statistics Hub [150], which currently holds more than 70 e-learning courses on various topics from different providers, including FAO, UNICEF, UNSD, the World Bank, regional training institutes, NSOs, and others. More courses are in the process of being added, including from the DHS team. The hub also provides micro-learning materials such as brief learning videos, platforms, and blogs. Similar survey-related training materials are also being made available on the ISWGHS website [151].

3.1.6Fostering a program of experimental statistics

This position paper echoes one of the recommendations of the World Development Report 2021 [142] for NSOs to establish a business line on experimental statistics under which a more systematic approach to supporting and conducting methodological survey research can be pursued. In this context, a business line could be established for experimental statistics that use new data sources and new methods to improve methodologies and/or to provide more timely data to better meet user needs.

While experimental statistics are not disseminated in the same way to users, as they are still in the testing phase and not yet fully developed, they nonetheless serve as an avenue for organizing testing and experiments. Experiments can be carried out through a dedicated program on survey methods, under which new measurement tools can be tested and validated vis-à-vis their gold-standard counterparts, ideally through small-scale randomized survey experiments. Many of the priority areas covered in Chapter 2 above require experimentation within the national context, for example, on methods for data integration (Section 2.1), the use of objective measures (Section 2.4), the collection and use of paradata (Section 2.6), and machine learning and artificial intelligence (Section 2.7).
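As a rough illustration of what a small-scale randomized survey experiment involves, the sketch below randomly assigns households to a new measurement tool or to the gold-standard tool and compares the two arms. It is a minimal sketch under stated assumptions: the function names are hypothetical, the measurements are synthetic values drawn from a normal distribution (not real survey data), and the standard error assumes simple random assignment with independent arms, ignoring the clustering and weighting that a real survey design would require.

```python
import random
import statistics


def assign_arms(household_ids, seed=0):
    """Randomly split households into two arms of (near-)equal size."""
    rng = random.Random(seed)
    shuffled = list(household_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def difference_in_means(new_tool_values, benchmark_values):
    """Return (difference in means, standard error) for two independent arms."""
    diff = statistics.mean(new_tool_values) - statistics.mean(benchmark_values)
    se = (
        statistics.variance(new_tool_values) / len(new_tool_values)
        + statistics.variance(benchmark_values) / len(benchmark_values)
    ) ** 0.5
    return diff, se


# Hypothetical experiment with 200 households.
new_arm, benchmark_arm = assign_arms(range(200), seed=42)

# Synthetic measurements, generated only so the example runs end to end.
rng = random.Random(1)
new_values = [rng.gauss(100, 15) for _ in new_arm]
benchmark_values = [rng.gauss(100, 15) for _ in benchmark_arm]

diff, se = difference_in_means(new_values, benchmark_values)
print(f"difference = {diff:.2f}, standard error = {se:.2f}")
```

A difference that is small relative to its standard error would suggest that the new tool does not systematically depart from the benchmark, although a real validation exercise would also examine measurement variance, cost, and respondent burden.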

To build a successful experimental statistics business line, NSOs in lower income contexts will likely require technical assistance in building the internal capacity to carry out these activities. Fostering a culture of carrying out small experiments within national household survey programmes is also important. Eurostat maintains a website that links to experimental statistics published by NSOs in the European Union and offers a clear example for fostering survey experimentation in countries [152]. Similar exercises can be carried out through other regional statistical organizations or at the global level.

3.1.7Investing in ICT infrastructure

In the third quarter of 2020, 25 percent of NSOs were reported to lack adequate ICT infrastructure for staff to work away from the office effectively [153]. The lack of cloud computing services for data storage and exchange as well as suitable facilities for remote training were common challenges highlighted by the UN-WB survey. Significant disparities exist in the capacities of low- and high-income countries to manage a forced work-from-home situation. Only 57 percent of low- and lower-middle-income countries were able to provide their staff with the necessary tools (such as personal computers, tablets, and monitors) to continue their work after the onset of the COVID-19 pandemic, as compared to 72 percent of upper-middle-income countries and 88 percent of high-income countries. Stronger and smarter technological infrastructure should be available at the national level for implementing mixed-mode surveys. These measures should be coupled with enhancements in data storage, data protection, the creation or strengthening of data dissemination platforms, the establishment of offline data enclaves for the use of confidential survey data (as discussed in Section 2.8), and steps to address the hardware and software needs of NSOs.

Finally, the successful integration of household survey data with other data sources generated by NSOs as well as those from the private sector requires strong legislative backing. The requisite legislation must allow access to other data sources for official statistical purposes while guarding the confidentiality of personal information in both the integrated data and the original data sources.

3.2Role of the international development community to support building a stronger enabling environment for household surveys

3.2.1Pursuing a coordinated and systematic approach to supporting national statistical offices

To unleash their full potential, household surveys must be adequately funded and positioned strategically so as to emphasize the critical role of a functional household survey program within the national data ecosystem. For example, during COVID-19, countries whose existing household survey systems already held the necessary contact information were able to use them as sampling frames to rapidly launch and successfully implement phone surveys on COVID-19, taking advantage of investments in capacity building made over the decade prior to the pandemic. In the same way, investments are needed now to improve the responsiveness and resilience of future data collection systems.

Collaborative efforts at the regional and international level are key to a coordinated household survey program with a medium- to long-term plan at the country level, especially for countries that rely technically or financially on international agencies and the donor community. The ISWGHS should be the forum to foster such collaboration.

3.2.2Sustaining financing at the international level

Shifting the narrative from decrying the “funding gap” to discussing “investment opportunities” creates a better context for statistical development [154]. To build strong national statistical systems in low-income countries that can produce quality and timely data, donors need to think of data in terms of investments into tracking and achieving the Sustainable Development Goals (SDGs) and national development goals.

The Bern Network on Financing Data for Development is supporting a Clearinghouse – a platform that allows for assessing the state of data financing in the poorest and most fragile countries, highlighting ongoing and forthcoming projects, and providing guidance to donors on where to make investments. The function of the Clearinghouse is being augmented through the parallel establishment of a Global Data Facility (GDF) at the World Bank which will provide innovative financing mechanisms to address some of the most compelling funding data gaps identified through the Clearinghouse. By consolidating donor financing towards key priority areas, the GDF will enable greater coordination and create much needed synergies across donors so as to deliver more efficiently on the commitments of the Cape Town Global Action Plan for Sustainable Development, which called for “… [greater] participation of non-state actors in funding statistical activities through innovative funding mechanisms …”. The GDF could be a game-changer in supporting a renewed household survey agenda for the next decade by complementing government resources and leveraging other data investments, including project financing for large statistical capacity building operations.

3.2.3Fostering a stronger coalition of international agencies and countries

Implementing the priority areas identified in this paper requires a strong coalition of international agencies and countries to support such an ambitious agenda, which was also recognized by the United Nations Statistical Commission. During its 46th session, the Statistical Commission established the Inter-Secretariat Working Group on Household Surveys (ISWGHS) to foster improvement in the scope and quality of social and economic statistics as delivered through national, regional and international household survey programmes, with a focus on three areas of work: survey coordination, methodological development, and advocacy and communication [155].

Before exploring possible prescriptions for the ISWGHS, it is important to discuss its relative advantages and limitations, and how to support the ISWGHS in fulfilling its mandate. The ISWGHS consists of 11 international agencies and 10 member states. International agency members are responsible either for a survey programme (e.g., MICS, LSMS, 50 × 2030, etc.) or for providing regular training and technical support for household surveys in areas under their mandate. Member states either rely significantly on household survey programs for official statistics or offer technical and/or financial support to the work of the group. The ten country members are geographically representative to ensure that the needs of the countries in their respective regions are taken into consideration in setting priorities.

All agency members have a mandate to support countries on household surveys, with a focus on specific thematic areas. For example, the International Labour Organization (ILO) is responsible for labor market data, the United Nations High Commissioner for Refugees (UNHCR) for data on forcibly displaced and stateless people, the United Nations Children’s Fund (UNICEF) for data on the wellbeing of children, UN Women for data on the empowerment of women and girls, and the World Bank for data on poverty and other dimensions of wellbeing. A collaborative group like the ISWGHS is well-positioned to focus on the coordination of international survey efforts and cross-cutting methodologies. A good example of how ISWGHS members work together is the COVID-19 impact survey dashboard [156], created in May 2020 with information on surveys supported by members, which has played an important role in coordinating efforts within countries. The ISWGHS has also produced a number of cross-cutting methodologies that are of interest to all areas of work.

The other relative advantage of the ISWGHS, with its secretariat housed within the UN Statistics Division, is its close tie with the UN Statistical Commission [157], which serves as the highest body of the global statistical system and brings together Chief Statisticians from all member states. Working closely with NSOs improves support for national needs as well as country adoption of international standards and methods.

More importantly, the ISWGHS has been playing an important role in amplifying the impact of the tremendous amount of work undertaken by its members, including work on innovative approaches, through various channels such as its website, webinars, blogs, and regular newsletters.

As we look towards supporting household surveys during the next decade, we suggest a number of priorities and activities that the ISWGHS should carry out to support countries in the short term (with additional funding needs shown in parentheses). These include:

Survey coordination

  • Assessing national needs regularly and identifying capacity building needs

  • Providing a common platform for all training materials (additional funding for IT support)

  • Coordinating activities of members in initiating innovative approaches and experimentation and fostering exchange of experiences

Methodological development

  • Developing guidelines and training materials along priority areas outlined in this position paper (additional funding for consultancies)

  • Supporting experiments on new methodologies in countries (additional funding to support small experiments)

Advocacy and communication

  • Fostering the exchange of experiences and innovative methods through webinars, small group focused discussions, and blogs (additional funding for communication)

  • Collaborating with key partners including NSOs, CSOs, regional organizations, academia, key professional associations such as the ISI-IASS (International Association of Survey Statisticians), and other scientific associations, both to stay informed of latest developments and to seek collaboration opportunities (additional funding for research/literature review if scaling up)

  • Organizing meetings and workshops at the international and regional level (additional funding for communication/technical staff if scaling up and funding to support participation of countries)


The paper was written at a time when household survey programs around the world have been suffering due to reduced funding, increased concerns over their quality, and the disruption of traditional fieldwork operations by the pandemic. At the same time, these programs have been challenged by the rise of non-traditional data sources, alongside data science skills such as machine learning that are relatively unfamiliar to national statistical offices.

However, given the proven agility of NSOs during the COVID-19 pandemic, through experimenting with new modes of data collection and new data sources, we hope that this paper will help support further innovation in countries, turning the various crises faced by household survey programs into opportunities. For example, decreased response rates can motivate the establishment of strong relationships with data users and survey respondents, reduced resources can provoke the use of innovative approaches to increase survey efficiency and the integration of survey data with other sources, and the new “competing” data sources and skills can drive NSOs towards building partnerships and taking a more active role as data stewards.

Driving a cultural change in the production and dissemination of data, including through household surveys, is not an easy task and will take time. However, this should not prevent NSOs from taking every opportunity to pilot innovative approaches. While these experiments may not always succeed, they will nonetheless inform further work and help others in pushing the innovation forward. Those working at the international and regional levels must be committed to providing a platform to share knowledge and national experiences, supporting capacity building where it is most needed, and fostering a culture of experimentation and innovation.


The paper has benefited from extensive reviews by key stakeholders including national statistical offices (NSOs), line ministries within national statistical systems, researchers, civil society organizations, and regional and international agencies. An annotated outline of the paper [158] was initially presented to the UN Statistical Commission in 2021. Key aspects of the paper were also presented at various meetings and conferences including the UN Statistical Commission, the United Nations World Data Forum, the World Statistical Congress, and the International Association of Survey Statisticians (IASS) in 2021. A draft paper was then presented to the UN Statistical Commission again in 2022 for a wider country consultation.

Sincere appreciation goes to the following experts and ISWGHS members who reviewed and/or provided technical advice: Jack Gambino, Statistics Canada (former); Siti Asiah Ahmed, Department of Statistics in Malaysia; James Muwonge, Uganda Bureau of Statistics; Ian O’Sullivan, United Kingdom Office for National Statistics; Meagan M. Meuchel and Jennifer Tancreto, US Census Bureau; Sunita Kishor, DHS Programme – ICF International; Yacob Mudesir Seid, FAO; Rafael Diez de Medina and Kieran Walsh, International Labour Organization (ILO); Babatunde Abidoye, Teresa Martens, Agha Akram, Alefa Banda, Edvard Orlic and Andrey Krachkov, UNDP; Gady Saiovici, UNHCR; Mark Hereward, Attila Hancioglu, Turgay Unalan and Bo Pedersen, United Nations Children’s Fund (UNICEF); Yongyi Min, Vibeke Oestreich Nielsen and Nemi Okujagu, UN Statistics Division; Papa Seck, Jessamyn Encarnacion, Ghida Ismail, Rea Jean Tabaco, and Ramya Emandi, UN Women; Diego Zardetto, World Bank; Xavier Mancero and Andres Gutierrez Rojas, UN Economic Commission for Latin America and the Caribbean (UNECLAC); Afsaneh Yazdani, United Nations Economic and Social Commission for Asia and the Pacific (UNESCAP); and Krista Jacobs, LANDESA. The paper also benefited from inputs from Mick Couper, Alfredo L. Fort, Frauke Kreuter, Pedro Andres Montes Runau, and Monica Pratesi.



United Nations, (2019) . Mapping of SDG indicators. Report of the Inter-Secretariat Working Group on Household Surveys, background document. The exercise was carried out before the IAEG-SDGs 2020 Comprehensive Review of the global indicator framework. Revisions of the indicator framework were adopted by the UN Statistical Commission at its 51st session (


World Bank, (2021) . Updated estimates of the impact of COVID-19 on global poverty: Turning the corner on the pandemic in 2021? [Internet]. [cited 2022 May 21]. Available from:


United Nations Statistics Division and the World Bank, (2020) . Monitoring the State of Statistical Operations under the COVID-19 Pandemic, highlights from the fourth round of a global COVID-19 survey of National Statistical Offices (NSOs). Global COVID-19 survey of national statistical offices, May 2020. Available from:


Inter-Secretariat Working Group on Household Surveys, Dashboard on COVID-19 impact surveys. [Internet]. [cited 2022 May 21]. Available from:


Lohr SL, Raghunathan TE. Combining survey data with other data sources. Statist Sci [Internet]. 2017 May 1 [cited 2022 May 21]; 32(2). Available from:


Statistics Canada. Canadian income survey products [Internet]. 2014 [cited 2022 May 21]. Available from: The Canadian Income Survey asks a minimum number of questions related to income since the tax records of the respondents are retrieved. This exercise involves linkage of individual records, and the respondents are informed during the interview (the practice is called “informed replacement”).


Eurostat, (2013) . The use of registers in the context of EU-SILC: challenges and opportunities. Edited by Markus Jäntti, Veli-Matti Törmälehto and Eric Marlier. Available at (August 2021). A related example is the Netherlands’ Labor Force Survey where the information on economic activity for employees is derived from the Jobs and Social Security Register.


Kim JK, Rao JNK. Combining data from two independent surveys: A model-assisted approach. Biometrika [Internet]. (2012) Mar 1; 99: (1): 85–100. Available from:


Ricciato F, Wirthmann A, Hahn M. Trusted Smart Statistics: How new data will change official statistics. Data & Policy [Internet]. (2020) ; 2: : e7. Available from:


Rancourt E. Admin-first as a statistical paradigm for Canadian official statistics: meaning, challenges and opportunities. (2019) . Proceedings of Statistics Canada Symposium 2018. Available from:


United Nations Statistics Division. Toolkit on small area estimation for SDGs. (2022) . Available from:

[12] “H.R.4174 – 115th Congress (2017–2018): Foundations for Evidence-Based Policymaking Act of 2018.” January 14, (2019) .


Ricciato F, Wirthmann A, Hahn M. Trusted Smart Statistics: How new data will change official statistics. Data & Policy [Internet]. (2020) ; 2: : e7. Available from: Nano data refers to data records at sub-individual level such as mobile phone position data at below-individual level.


Jolliffe DM, Mahler DG, Veerappan M, Kilic T, Wollburg PR. Under what conditions are data valuable for development? (2021) . World Bank Policy Research Working Paper No. 9811.


Elbers C, Lanjouw JO, Lanjouw P. Micro-level estimation of poverty and inequality. Econometrica. (2003) ; 71: (1): 355–364.


Tarozzi A. Calculating comparable statistics from incomparable surveys, with an application to poverty in India. Journal of Business and Economic Statistics. (2007) ; 25: (3): 314–336.


United Nations Statistics Division. Toolkit on small area estimation for SDGs. (2022) . Available from: Chile integrates administrative data from the Ministerio de Desarrollo Social y Familia with the National Socioeconomic Characterization Survey (CASEN) to produce poverty estimates for 345 comunas.


Dang H, Jolliffe D, Carletto C. Data gaps, data incomparability, and data imputation: A review of poverty measurement methods for data-scarce environments. Journal of Economic Surveys [Internet]. (2019) Jul; 33: (3): 757–97. Available from: doi: 10.1111/joes.12307.


Dang HAH, Verme P. Estimating poverty for refugee populations: can cross-survey imputation methods substitute for data scarcity? SSRN Journal [Internet]. (2019) . Available from:


Dang H, Kilic T, Carletto C, Abanokova K. Poverty imputation in contexts without consumption data: A revisit with further refinements. (2021) . World Bank Policy Research Working Paper No. 9838.


Azzari G, Jain S, Jeffries G, Kilic T, Murray S. Understanding the requirements for surveys to support satellite-based crop type mapping: Evidence from sub-Saharan Africa. Remote Sensing [Internet]. (2021) Nov 23; 13: (23): 4749. Available from:


Lobell DB, Azzari G, Marshall B, Gourlay S, Jin Z, Kilic T, et al. Eyes in the sky, boots on the ground: Assessing satellite- and ground-based approaches to crop yield measurement and analysis in Uganda. Amer. J. Agr. Econ. (2020) ; 102: 202–219. doi: 10.1093/ajae/aaz051.


Lobell DB, Di Tommaso S, You C, Yacoubou Djima I, Burke M, Kilic T. Sight for sorghums: Comparisons of satellite- and ground-based sorghum yield estimates in Mali. Remote Sensing [Internet]. (2019) Dec 27; 12: (1): 100.


Administrative data are often not collected for statistics purposes. As such, their concepts and definitions can be different vis-à-vis household surveys. For instance, to integrate livestock registry data from the Ministry of Health Decree with its sample survey on livestock, the Italian National Statistical Institute (ISTAT) has carried out extensive assessment and testing on reconciling concepts – units of data and classifications are different between the two sources, and the coverage and updating frequency of the register were also issues that were considered carefully.


There is no internationally agreed definition of citizen-generated data (CGD). CGD has been referred to as “a problem-focused type of data that can take many forms, often framed around people collaborating to collect data they need to understand and tackle a problem that affects them” ( or as “data produced by non-state actors under the active consent of citizens to tackle social issues explicitly” ( Common types of CGD may include crowdsourcing data (e.g., from a non-probabilistic web survey), community-driven data, data collected by civil society organizations, and sometimes even social media data.


Yeh C, Perez A, Driscoll A, Azzari G, Tang Z, Lobell D, et al. Using publicly available satellite imagery and deep learning to understand economic well-being in Africa. Nat Commun [Internet]. (2020) Dec; 11: (1): 2583. Available from:


Burke M, Driscoll A, Lobell D, Ermon S. Using satellite imagery to understand and promote sustainable development. Science. (2021) ; 371: (6536): eabe8628.


Burke M, Lobell DB. Satellite-based assessment of yield variation and its determinants in smallholder African systems. Proc Natl Acad Sci USA [Internet]. (2017) Feb 28; 114: (9): 2189–94. Available from: doi: 10.1073/pnas.1616919114.


Newhouse D, Shivakumaran S, Takamatsu S, Yoshida N. How survey-to-survey imputation can fail [Internet]. The World Bank; 2014 [cited 2022 May 21]. (Policy Research Working Papers). Available from: In this particular example, imputation between household income and expenditure survey and the labour force survey in Sri Lanka failed, because of the violation of two pre-conditions: (a) that the questions in the two surveys are asked in a consistent way; and (b) that common variables of the two surveys explain a large share of the variations of the outcome indicator.


Pew Research Center, January (2018) . For weighting online opt-in samples, what matters most?


Statistics Canada. Are probability surveys bound to disappear for the production of official statistics? [Internet]. (2020) . Available from:


Pew Research Center, May (2016) . Evaluating Online Nonprobability Surveys.


Meng XL. Statistical paradises and paradoxes in big data (I): Law of large populations, big data paradox, and the 2016 US presidential election. Ann Appl Stat [Internet]. (2018) Jun 1; 12: (2). Available from:–Law/10.1214/18-AOAS1161SF.full.


The process-oriented approach has been adopted by a number of countries currently working with the ISWGHS on survey coordination, including Canada, Costa Rica, and Ireland; as well as in the United States. Work carried out under the ISWGHS task force on Developing Recommendations on a Comprehensive National Household Survey Programme. Terms of reference for the task force are available at More information will be made available once the country report is available.


Groves RM, Fowler FJ, Couper MP, Lepkowski JM, Singer E, Tourangeau R. Survey methodology. [Internet]. Hoboken: Wiley; (2013) .


Wallgren A, Wallgren B. Register-based statistics: administrative data for statistical purposes. Chichester, England; Hoboken, NJ: John Wiley & Sons Ltd; (2007) . 247. (Wiley series in survey methodology).


Beaumont JF. Are probability surveys bound to disappear for the production of official statistics? (2020) . Survey Methodology, Statistics Canada, Catalogue No. 12-001-X, Vol. 46, No. 1. Available from:


Hill CA, Biemer PP, Buskirk TD, Japec L, Kirchner A, Kolenikov S, et al., editors. Big data meets survey science: a collection of innovative methods. Hoboken, NJ: Wiley; (2021) . 753.


Microdata anonymization | IHSN [Internet]. Available from:


Inter-Secretariat Working Group on Household Surveys. Spatial anonymization, (2021) . Available from:


Hurst B. Big data and Agriculture: Innovation and Implications. Statement of the American Farm Bureau Federation to the House Committee on Agriculture. (2015) .


Tourangeau R, Plewes TJ, National Research Council (U.S.), editors. Nonresponse in social science surveys: a research agenda. Washington, D.C: National Academies Press; (2013) . 150.


De Leeuw E, de Heer W. Trends in Household Survey Nonresponse: A Longitudinal and International Comparison. Groves RM, editor. Survey nonresponse. New York: Wiley; (2002) . 500. (Wiley series in survey methodology). Declining response rates were also observed in 16 countries and 10 different surveys.


Meyer BD, Mok WKC, Sullivan JX. Household surveys in crisis. Journal of Economic Perspectives [Internet]. (2015) Nov 1; 29: (4): 199–226. Available from: doi: 10.1257/jep.29.4.199. Item imputation rates gradually increased during the period 1990–2013 for survey questions on receipt of transfer income in the US Current Population Survey and the Survey of Income and Program Participation.


Meyer BD, Mok WKC, Sullivan JX. The under-reporting of transfers in household surveys: its nature and consequences [Internet]. National Bureau of Economic Research; (2009) Jul. Report No.: 15181. Available from: Similar downward trends were documented for the American Community Survey, the Consumer Expenditure Survey and the Panel Study of Income Dynamics.


Gourlay S, Kilic T, Martuscelli A, Wollburg P, Zezza A. Viewpoint: High-frequency phone surveys on COVID-19: Good practices, open questions. Food Policy [Internet]. (2021) Dec; 105: : 102153. Available from:


United Nations. Guidelines for producing statistics on asset ownership from a gender perspective. (2019) . Available from:


Hasanbasri A, Kilic T, Koolwal G, Moylan H. “LSMS+ program: overview and recommendations for improving individual-disaggregated data on asset ownership and labor.” (2021) . World Bank: Washington, DC. Available from:


Kilic T, Moylan H, Koolwal G. Getting the (Gender-disaggregated) lay of the land: Impact of survey respondent selection on measuring land ownership and rights. World Development [Internet]. (2021) Oct; 146: : 105545. Available from:


Deininger K, Xia F, Kilic T, Moylan H. Investment impacts of gendered land rights in customary tenure systems: Substantive and methodological insights from Malawi. World Development [Internet]. (2021) Nov; 147: : 105654. Available from:


Kilic T, Van den Broeck G, Koolwal G, Moylan H. Are you being asked? Impacts of respondent selection on measuring employment [Internet]. World Bank, Washington, DC; (2020) . Available from:


Inclusive Data Taskforce recommendations report: Leaving no one behind – How can we be more inclusive in our data? [Internet]. UK Statistics Authority. [cited 2022 May 20]. Available from:


Palmer S, Stathis N. Learning from the 2016 Australian Census and ensuring effective issues management during ABS’ most challenging sensitive and divisive data collection. (2019) . Paper presented at the Conference of European Statisticians Workshop on Statistical Dissemination and Communication 12–14 June 2019, Gdansk, Poland. Available from:


Groves RM, Couper MP. Nonresponse in household interview surveys [Internet]. (2012) . Available from:


Wilson L, Dickinson E. Respondent-centred surveys: stop, listen and then design. 1st ed. Thousand Oaks: SAGE Publications; (2021) .


Contact materials and survey materials were developed and tested to ensure that they were direct, communicated the survey purpose clearly, and provided straightforward instructions. Additionally, the survey was designed to be responsive to public concerns, particularly with regard to privacy. Privacy and legal specialists were embedded in the data collection project to provide ongoing support, and a prominent external privacy expert was engaged to conduct an independent review and issued a public statement of assurance. Other efforts by ABS to reduce nonresponse included targeted campaigns and crisis management plans, leading ultimately to a response rate of nearly 80%. A similar exercise has been carried out at the US Census Bureau to streamline text into plain language that communicates with respondents while addressing areas of potential concern.


Young DK. Assessing how a household survey is perceived by respondents. (2019) . U.S. Bureau of Labor Statistics Office of Survey Methods Research. Available from: The US Consumer Expenditure Survey introduced a set of response burden questions at the end of the last wave of interviews. Such data can be analyzed together with the survey data to allow a better understanding of respondent burden and its relationship to response bias.


Franco C, Bell WR. Corrigendum to: Using American Community Survey data to improve estimates from smaller U.S. surveys through bivariate small area estimation models. Journal of Survey Statistics and Methodology [Internet]. (2022) Jan 22; 10: (1): 248–248. Available from: The American Community Survey data have been used as an excellent benchmarking tool for many small surveys carried out in the United States. The survey is large, interviewing around 2.1–2.3 million housing units and covering a wide range of variables.


Kreuter F, editor. Improving surveys with paradata: analytic uses of process information [Internet]. Hoboken, New Jersey: John Wiley & Sons, Inc.; (2013) . Available from: doi: 10.1002/9781118596869.


American Association for Public Opinion Research. Address-based sampling. (2016) . Available from: In the United States, the address-based sampling frame is built on addresses provided by the United States Postal Service. Contact information for face-to-face, telephone or web surveys can be acquired from vendors.


Quintslr MMM, Hypólito EB. Development of an Integrated System of Household Surveys: The Brazilian Experience. (2009) . Available from:


United Nations Statistical Commission. 51st session, item 3(j), background document. Report on the Results of the UNSD Survey on 2020 round population and housing censuses. (2020) . Available from:


Kalton G, Anderson DW. Sampling rare populations. Journal of the Royal Statistical Society Series A (General) [Internet]. (1986) ; 149: (1): 65. Available from:


Barron Ausbrooks CY, Barrett EJ, Martinez-Cosio M. Ethical issues in disaster research: Lessons from Hurricane Katrina. Popul Res Policy Rev [Internet]. (2009) Feb; 28: (1): 93–106. Available from: doi: 10.1007/s11113-008-9112-7.


Michael R. Frames program overview. 2021. US Census Bureau. Census scientific advisory committee – 2021 spring virtual meeting: March 18–19, 2021 [Internet]. Available from:


United Nations. The Global Statistical Geospatial Framework. (2019) . Available from:


Eckman S, Himelein K. Methods of geo-spatial sampling. 2020. From Hoogeveen J, Pape U, editors. Data collection in fragile states: innovations from Africa and beyond [Internet]. Cham: Springer International Publishing; (2020) . Available from: doi: 10.1007/978-3-030-25120-8.


Samet H. The quadtree and related hierarchical data structures. ACM Comput Surv [Internet]. (1984) Jun; 16: (2): 187–260. Available from: doi: 10.1145/356924.356930.


Minasny B, McBratney AB, Walvoort DJJ. The variance quadtree algorithm: Use for spatial sampling design. Computers & Geosciences [Internet]. (2007) Mar; 33: (3): 383–92. Available from:


Qader SH, Lefebvre V, Tatem AJ, Pape U, Jochem W, Himelein K, et al. Using gridded population and quadtree sampling units to support survey sample design in low-income settings. Int J Health Geogr [Internet]. (2020) Dec; 19: (1): 10. Available from:



French National Institute of Statistics and Economic Studies. Handbook of spatial analysis, theory and application with R. (2018) . Available from:


Feehan DM, Mahy M, Salganik MJ. The network survival method for estimating adult mortality: Evidence from a survey experiment in Rwanda. Demography. (2017) Aug; 54: (4): 1503–1528. doi: 10.1007/s13524-017-0594-y. PMID: 28741073; PMCID: PMC5547188.


Me A. Collecting data on sensitive topics and on rare events through surveys. (2020) . Presentation during the United Nations Statistical Commission Friday Seminar: Household Surveys in a Changing Data Landscape. Available from:


Inter-Secretariat Working Group on Household Surveys. Sampling to leave no-one behind on Wiki. Forthcoming.


Lundquist P, Särndal CE. Aspects of responsive design with applications to the Swedish Living Conditions Survey. Journal of Official Statistics. (2013) ; 29: (4): 557–582.


Jahun I, Greby SM, Adesina T, Agbakwuru C, Dalhatu I, Yakubu A, et al. Lessons from rapid field implementation of an HIV population-based survey in Nigeria, 2018. JAIDS Journal of Acquired Immune Deficiency Syndromes [Internet]. (2021) Aug 1; 87: (1): S36–42. Available from:


Abay K, Barrett C, Kilic T, Moylan H, Ilukor J, Vundru Drazi W. Nonclassical measurement error and farmers’ response to information reveal behavioral anomalies. (2022) . World Bank Policy Research Working Paper.


Carletto C, Gourlay S, Murray S, Zezza A. Cheaper, faster, and more than good enough: Is GPS the new gold standard in land area measurement? Survey Research Methods. (2017) ; 11: (3): 235–265.


Carletto C, Gourlay S, Winters P. From guesstimates to GPStimates: land area measurement and implications for agricultural analysis. J Afr Econ [Internet]. (2015) Nov; 24: (5): 593–628. Available from: doi: 10.1093/jae/ejv011.


Carletto C, Savastano S, Zezza A. Fact or artifact: The impact of measurement errors on the farm size-productivity relationship. Journal of Development Economics [Internet]. (2013) Jul; 103: : 254–61. Available from:


Kilic T, Zezza A, Carletto C, Savastano S. Missing (ness) in action: Selectivity bias in GPS-based land area measurements. World Development. (2017) ; 92: : 143–157.


Desiere S, Jolliffe D. Land productivity and plot size: Is measurement error driving the inverse relationship? [Internet]. Journal of Development Economics. (2018) ; 130: (1): 84–98.


Gourlay S, Kilic T, Lobell D. A new spin on an old debate: Errors in farmer-reported production and their implications for inverse scale-productivity relationship in Uganda. Journal of Development Economics. (2019) ; 141: : 102376.


Abay KA, Abate GT, Barrett CB, Bernard T. Correlated non-classical measurement errors, ‘Second best’ policy inference, and the inverse size-productivity relationship in agriculture. Journal of Development Economics [Internet]. (2019) Jun; 139: : 171–84. Available from:


Yacoubou DI, Kilic T. Survey measurement errors and the assessment of the relationship between yields and inputs in smallholder farming systems: evidence from Mali. (2021) . World Bank Policy Research Working Paper. Available from:


Arthi V, Beegle K, De Weerdt J, Palacios-López A. Not your average job: Measuring farm labor in Tanzania. Journal of Development Economics [Internet]. (2018) Jan; 130: : 160–72. Available from:


Gaddis I, Oseni G, Palacios-Lopez A, Pieters J. Measuring farm labor: survey experimental evidence from Ghana [Internet]. World Bank, Washington, DC; (2019) . Available from:


Kosmowski F, Aragaw A, Kilian A, Ambel A, Ilukor J, Yigezu B, et al. Varietal identification in household surveys: Results from three household-based methods against the benchmark of DNA fingerprinting in southern Ethiopia. Ex Agric [Internet]. (2019) Jun; 55: (3): 371–85. Available from:


Hodson DP, Jaleta M, Tesfaye K, Yirga C, Beyene H, Kilian A, et al. Ethiopia’s transforming wheat landscape: Tracking variety use through DNA fingerprinting. Sci Rep [Internet]. (2020) Dec; 10: (1): 18532. Available from:


Akogun O, Dillon A, Friedman JA, Prasann A, Serneels PM. Productivity and health: physical activity as a measure of effort. The World Bank Economic Review, lhaa011. (2020) . Available from: doi: 10.1093/wber/lhaa011.


Friedman J, Gaddis I, Kilic T, Martuscelli A, Palacios-Lopez A, Zezza A. The distribution of effort: physical activity, gender roles, and bargaining power in an Agrarian setting. (2021) . World Bank Policy Research Working Paper No. 9634.


Picchioni F, Zanello G, Srinivasan CS, Wyatt AJ, Webb P. Gender, time-use, and energy expenditures in rural communities in India and Nepal. World Development [Internet]. (2020) Dec; 136: : 105137. Available from:


Srinivasan CS, Zanello G, Nkegbe P, Cherukuri R, Picchioni F, Gowdru N, et al. Drudgery reduction, physical activity and energy requirements in rural livelihoods. Economics & Human Biology [Internet]. (2020) May; 37: : 100846. Available from:


Zanello G, Srinivasan S, Nkegbe P. Piloting the use of accelerometry devices to capture energy expenditure in agricultural and rural livelihoods: protocols and findings from northern Ghana. Development Engineering. (2017) ; 2: : 114–31.


Daum T, Buchwald H, Gerlicher A, Birner R. Times have changed: Using a pictorial smartphone app to collect time-use data in rural Zambia. Field Methods [Internet]. (2019) Feb; 31: (1): 3–22. Available from: doi: 10.1177/1525822X18797303.


Sugie NF. Utilizing smartphones to study disadvantaged and hard-to-reach groups. Sociological Methods & Research [Internet]. (2018) Aug; 47: (3): 458–91. Available from: doi: 10.1177/0049124115626176.


Zegras PC, Li M, Kilic T, Lozano-Gracia N, Ghorpade A, Tiberti M, et al. Assessing the representativeness of a smartphone-based household travel survey in Dar es Salaam, Tanzania. Transportation [Internet]. (2018) Mar; 45: (2): 335–63. Available from:


Statistics Netherlands. CBS experimenting with sensors. (2018) . Available from:


Ambel AA, Mugera HK, Bain RES. Accounting for drinking water quality in measuring multidimensional poverty in Ethiopia. Aguilar FX, editor. PLoS ONE [Internet]. (2020) Dec 15; 15: (12): e0243921. Available from:


Statistics Canada. The integration of web-scraped data into the clothing and footwear component of the Consumer Price Index. (2020) . Available from


Teh HY, Kempa-Liehr AW, Wang KIK. Sensor data quality: A systematic review. J Big Data [Internet]. (2020) Dec; 7: (1): 11. Available from:


Kreuter F, Haas GC, Keusch F, Bähr S, Trappmann M. Collecting survey and smartphone sensor data with an app: Opportunities and challenges around privacy and informed consent. Social Science Computer Review [Internet]. (2020) Oct; 38: (5): 533–49. Available from: doi: 10.1177/0894439318816389.


United Nations Statistics Division. Global SDG indicators database. Available from: Out of 88 countries and areas with data on mobile phone ownership (SDG indicator 5.b.1) since 2014, only 48 (55%) indicate that 80% or more of their population own a mobile phone. Only 1 out of 17 countries in sub-Saharan Africa with data for the indicator have mobile phone ownership at 80% or above. Data extracted December 2021.


United Nations Statistics Division. Sustainable Development Report 2021 Statistical Annex. Available from:–Statistical-Annex.pdf. In 2019, 51% of the world population used the internet, compared to 18% in sub-Saharan Africa. Data extracted December 2021 for SDG indicator 17.8.1.


International Labour Organisation. Global review of impacts of the COVID-19 pandemic on labour force surveys and dissemination of labour market statistics. (2021) . Available from:—dgreports/—stat/documents/publication/wcms_821387.pdf.


Inter-Secretariat Working Group on Household Surveys. Dashboard of COVID-19 Impact Surveys. Extracted December 2021. Available from:


United Nations Children’s Fund. Mongolia MICS Plus survey methodology. (2021) . Available from:


Lucarelli C, Martino A. EU-LFS data collection state of play and perspectives. (2021) . Labour Market Statistics (LAMAS) Working Group Workshop on multi-mode data collection.


Ambel A, McGee K, Tsegay A. Reducing bias in phone survey samples: effectiveness of reweighting techniques using face-to-face surveys as frames in four African countries. (2021) . World Bank Policy Research Working Paper No. 9676.


Brubaker J, Kilic T, Wollburg P. Representativeness of individual-level data in COVID-19 phone surveys: Findings from Sub-Saharan Africa. Van Campenhout B, editor. PLoS ONE [Internet]. (2021) Nov 17; 16: (11): e0258877. Available from:


Eurostat. Labour Market Statistics (LAMAS) Working Group Workshop on multi-mode data collection. Country presentations. (2021) .


Schouten B, van den Brakel JA, Buelens B, Giesen D, Luiten A, Meertens V. Mixed-mode official surveys: design and analysis. (2021) . Boca Raton: Chapman and Hall/CRC.


Inter-secretariat Working Group on Household Surveys. Guidance note on assessing and minimizing the COVID impact on survey quality. (2022) . Forthcoming.


US Census Bureau. Paradata. Available from:


Instituto Brasileiro de Geografia e Estatística (IBGE). Paradata as data source for census data collection monitoring: Brazilian census of agriculture case. (2019) . Fiftieth session of the UN Statistical Commission. Available from:


Jans M, Sirkis R, Schultheis C, Gindi R, Dahlhamer J. Comparing CAPI Trace File Data and Quality Control Reinterview Data as Methods of Maintaining Data Quality. (2011) . Proceedings of the Survey Research Methods Section, American Statistical Association (2011). Available from:


Laflamme F. Data collection research using paradata at Statistics Canada. 2009. 2008 International Methodology Symposium, Statistics Canada: Data Collection: Challenges, Achievements and New Directions. Available from:


Hartleib S, Langer V, Moser C. Implementing CAWI in the Austrian Microcensus/LFS: experiences and challenges. (2021) . Eurostat Labour Market Statistics (LAMAS) Working Group Workshop on multi-mode data collection.


Yung W, Karkimaa J, Scannapieco M, Barcarolli G, Zardetto D, Sanchez JAR, Burger J, et al. The use of machine learning in official statistics. (2018) . UNECE Machine Learning Team report. Available from:


Dutwin D. Feedback loop: using surveys to build and assess registration-based sample religious flags for survey research. 2021. Big data meets survey science: a collection of innovative methods. Hoboken, NJ: Wiley; (2021) . 753.


Wagner J. Using paradata-driven models to improve contact rates in telephone and face-to-face surveys. 2013. Kreuter F, editor. Improving surveys with paradata: analytic uses of process information. Hoboken, New Jersey: John Wiley & Sons, Inc; (2013) . 1. (Wiley series in survey methodology).


Measure A. Deep neural networks for worker injury autocoding, US Bureau of Labor Statistics, (2017) . Available from:


Liu M. Using machine learning models to predict attrition in a survey panel. 2021. Hill CA, Biemer PP, Buskirk TD, Japec L, Kirchner A, Kolenikov S, et al., editors. Big data meets survey science: a collection of innovative methods. Hoboken, NJ: Wiley; (2021) . 753.


Cohen SB, Shorey JM. Artificial intelligence and machine learning derived efficiencies for large-scale survey estimation efforts. 2021. Big data meets survey science: a collection of innovative methods. Hoboken, NJ: Wiley; (2021) . 753.


Knappenberger C, Lee YA. Model-assisted state expenditure estimates. (2021) . US Bureau of Labor Statistics. United Kingdom Office for National Statistics – United Nations Economic Commission for Europe Machine Learning Group 2021 Webinar.


International household survey network. Available from:


World Bank, World Bank microdata library. Available from:


Food and Agriculture Organization of the United Nations. Food and agriculture microdata catalogue (FAM) [Internet]. Available from:


International Labour Organization. Central data catalog. Available from:


UNHCR. Microdata library. Available from:


International Household Survey Network. Dissemination of microdata files, principles, procedures and practices. (2010) . International Household Survey Network Working Paper No. 005. Available from


United Kingdom Office for National Statistics. Accessing secure research data as an accredited researcher. (2020) Oct. Available from


United Nations Statistics Division. United Nations National Quality Assurance Frameworks Manual for Official Statistics. (2019) . Available from:


Statistics Canada. Advisory groups. Available from:


Australian Bureau of Statistics. Aboriginal and Torres Strait Islander Health Survey – community and stakeholder engagement. Available from:


Falorsi PD. Modernization at Istat and the centralization of data collection. (2016) . United Nations Economic Commission for Europe. Workshop on Statistical Data Collection – Vision on Future Surveying, 2016, the Hague, Netherlands.


United Nations Economic Commission for Europe. Brand and reputation management guidelines. (2021) . Draft.


United Nations Statistics Division and the World Bank. One year into the pandemic: monitoring the state of statistical operations under COVID-19. (2021) Jun. Available from:


Kilic T, Serajuddin U, Uematsu H, Yoshida N. Costing household surveys for monitoring progress toward ending extreme poverty and boosting shared prosperity [Internet]. World Bank, Washington, DC; (2017) . Available from:


Information provided by UNICEF Multiple Indicator Cluster Surveys. February (2022) .


World Bank. World Development report: Data for Better Lives. Washington, D.C.: World Bank; (2021) . Available from:


Nielsen VO. How can we better Coordinate and Make Use of Statistical Training Resources? A Few Reflections Linked to the Work of the Global Network of Institutions on Statistical Training (GIST). Statistical Journal of the IAOS. (2021) ; 37: (3): 753–767. doi: 10.3233/SJI-210857.


United Nations. Sustainable Statistical Training Programs at National Statistical Offices. (2021) . Available from:


United Nations. Sustainable Development Report: investing in data to save lives and build back better. (2021) . Available from:


World Bank. Remote training on phone surveys. (2021) . World Bank Living Standard Measurement Study Programme. Available from:


United Nations Statistics Division, United Nations Economic Commission for Latin America and Caribbean and United Nations Population Fund. eLearning course on small area estimation. Forthcoming.


Food and Agriculture Organization of the United Nations. E-learning courses on SDG Indicators under FAO custodianship. Available from:


Global Network of Institutions for Statistical Training (GIST). Available from:


Global Network of Institutions for Statistical Training (GIST). UN SDG: Learn platform. Available from:


Inter-Secretariat Working Group on Household Surveys. Available from:


Eurostat. Experimental statistics hub of the European Statistical System. Ongoing. Available from:


United Nations Statistics Division and the World Bank. Monitoring the state of statistical operations under the COVID-19 pandemic. (2020) Aug. Available from:


United Nations Economic and Social Commission for Asia and the Pacific. Asia-Pacific Stats Café Series: Financing Statistical Development. (2020) Oct. Available from:


Inter-Secretariat Working Group on Household Surveys. Terms of Reference. (2020) . Available from:


Inter-Secretariat Working Group on Household Surveys. Dashboard of COVID-19 impact surveys supported by ISWGHS members. Ongoing. Available from:


United Nations Statistical Commission. Available from:


United Nations. Statistical Commission Fifty-second session, item 3(l) on Household Surveys. Background document: Positioning Household Surveys for the Next Decade Annotated Outline. (2021) . Available from: