Feasibility of nowcasting SDG indicators: A comprehensive survey
Abstract
The 2030 Agenda and accompanying Sustainable Development Goals (SDGs) are vital in guiding national and global policy. However, many of the SDG indicators used to measure progress toward those goals suffer from long publication lags. Nowcasting has the potential to address this problem and generate more timely estimates of those indicators. This paper provides resources for achieving that potential by 1) carrying out a comprehensive nowcasting feasibility survey of all SDG indicators to assess their potential to be nowcast, and 2) performing a case study of indicator 9.4.1 to illustrate and shed light on the process of performing a nowcasting exercise. There exist 231 SDG indicators, but due to only examining Tier 1 indicators and the fact that many indicators have multiple sub-indicators, 362 indicators and sub-indicators were eventually surveyed. Of those 362, 150 were found highly likely to be suitable candidates for nowcasting, 87 were found to be likely, and 125 were found to be unsuitable.
1. Introduction
The Sustainable Development Goals (SDGs) of the 2030 Agenda were adopted by the United Nations General Assembly in 2015 in recognition of the need for an organized international framework to help address the myriad challenges facing the world in the 21st century [1]. The goals changed considerably compared with the 2000 Millennium Development Goals (MDGs), reflecting an increasing pace of technological, economic, and social change and applying to all countries rather than to developing economies only. Some issues have remained timeless and appear in both the SDGs and MDGs, such as poverty, hunger, and education, while others were substantially expanded upon, such as those concerning the environment and climate change. Still others, such as clean energy, were newly added.
Despite the ambitious aims of the 2030 Agenda, the UN recognized that its impact would be limited without proper means of measuring and quantifying progress on its goals. Consequently, the General Assembly asked the UN Statistical Commission to coordinate the substantive and technical work to develop the SDG indicator framework to measure targets selected for each goal, currently with a total of 231 indicators spread across the 17 goals [2]. While some indicators are similar or identical to existing statistics compiled and published by national statistical offices or other national authorities, such as unemployment rate, others needed to be newly defined and collected specifically for the 2030 Agenda. Furthermore, countries’ data gaps vary greatly, and they have had to put in place special efforts to enable more comprehensive reporting on the indicator framework.
In addition to filling data gaps, national statistical authorities have been challenged by increasing pressure to provide more up-to-date information as evidence for policy makers so that they have enough time to influence progress towards achieving the goals of the Agenda by 2030. Poor timeliness is a common issue for many SDG indicators [3]. Indicators are of limited use to policy makers in terms of both planning and programming assessment if they are published with significant lags. As noted in the ‘A World That Counts’ report [4], data delayed is data denied.
Recently, the rise of new technological possibilities and emerging digital data sources has enabled the compilation of timelier statistics. Numerous statistical offices responded quickly to the demand for timely data during the COVID-19 pandemic, including through the use of non-traditional data sources and new statistical techniques. One tool that could help address issues of timeliness in SDG indicators is nowcasting. Nowcasting is the estimation of the current, or near-current, value of a target series using information from more timely series. In a world awash in data from both a plethora of new sources and from new ways of storing old data [5], nowcasting can help leverage that information to obtain advance estimates of lower velocity indicators. As noted by MacFeely [6], while nowcasting has generally been well received, many questions regarding the robustness of the methodologies employed need to be resolved, as do concerns over the validity of using a wide variety of data sources, including both hard and soft indicators. Concerns have also been flagged about the impact of revisions in the underlying data, dissemination strategies, potential confusion for users, division of work between international and national agencies, and relevance to some areas of sustainable development. To date, timeliness of SDG indicators has been the responsibility of the many separate SDG indicator custodian agencies, some managing better than others to improve timeliness in collaboration with national statistical authorities. Furthermore, as the indicators are mainly compiled by national statistical authorities, common approaches, methods and rules are needed. To increase collaboration on nowcasting among official statisticians, the United Nations Conference on Trade and Development (UNCTAD) held a nowcasting workshop with the United Nations Industrial Development Organization (UNIDO) in February 2020. The meeting discussed case studies on nowcasting exercises carried out in official statistics. In 2021, UNCTAD shared its experience on new nowcasting methodologies based on neural networks with the global statistical community at a UN Brown Bag seminar.
This paper aims to help turn the potential to nowcast SDG indicators into reality, firstly by providing a comprehensive survey of the nowcasting feasibility of all SDG indicators and secondly by fully documenting the process of nowcasting an SDG indicator via a case study. The nature of this content positions the paper mainly as a reference work, though it can be read from start to finish as interest dictates. This work was carried out in the context of an informal ‘nowcasting network’, chaired by UNIDO and set up by UN Chief Statisticians.
For this paper, 362 SDG indicators and sub-indicators were surveyed for nowcasting feasibility. This number differs from the 231 mentioned above due firstly to only examining the 130 Tier 1 SDG indicators, and secondly to the fact that some indicators have several sub-indicators. See Section 2.1 for more information on Tier 1 indicators. Of those 362, 150 were classified as “Highly likely” to be suitable for nowcasting, 87 as “Likely”, and 125 as “Unlikely”. See Sections 3.1 and 3.2 for more information on survey methodology and factors determining classification. Most indicators and sub-indicators were found to be recorded at an annual frequency, with publication lags ranging from one to three years, though these lags may differ by data source, which often varies by country or region of coverage. The existence of potential explanatory variables for use in modelling is unlikely to pose a problem for most indicators. Some indicators contain sets of sub-indicators surrounding a core subject matter. These sub-indicators typically share the same data availability and publication lags, with similar sets of likely explanatory variables.
The rest of the paper proceeds as follows: Section 2 will provide further background on the SDG indicators and nowcasting; Section 3 will describe the approach taken to complete the survey and report on general findings for each SDG; Section 4 will present the empirical case study illustrating one approach to nowcasting an SDG indicator; Section 5 will conclude, summarizing main results and recommendations going forward.
2. Background
2.1 SDG indicators
In 2012, as the target year for the MDGs approached, work began on the development of a post-2015 development agenda [7]. The result was the 2030 Development Agenda, adopted by the UN General Assembly on the 25th of September, 2015 [1]. In contrast to the eight goals of the MDGs, the 2030 Agenda outlined 17 goals, called the Sustainable Development Goals (SDGs), which are accompanied by a varying number of targets per goal, for a total of 169 targets [1]. Each target in turn has one or more indicators to aid in monitoring progress towards accomplishment of each target and goal, for a total of 231 indicators [2]. Each of these indicators in turn has a varying number of sub-indicators. The 2030 Agenda serves as a policy framework to help tackle such issues as the eradication of poverty, reducing inequality, and addressing climate change, among many others. For more information on specific targets and indicators, see UNSD [8] and Ritchie et al. [9].
Defining priorities and agreeing on a framework in 2015 was only part of the story, as a further two years were required to make the stated goals actionable by developing the targets and indicators for each goal [10]. Measurement is a vital aspect of the 2030 Agenda, both in terms of guiding and informing policy decisions at a national and international level, as well as in quantifying progress towards the goals. Developing indicators for such an expansive agenda with a diverse array of interconnections is no small task and highly dependent on the target and type of data available. As such, there are three tiers of SDG indicator:
• Tier 1: Indicator is conceptually clear, has an internationally established methodology and standards are available, and data are regularly produced by countries for at least 50 per cent of countries and of the population in every region where the indicator is relevant.
• Tier 2: Indicator is conceptually clear, has an internationally established methodology and standards are available, but data are not regularly produced by countries.
• Tier 3: No internationally established methodology or standards are yet available for the indicator, but methodology/standards are being (or will be) developed or tested [11].
In simpler terms, Tier 1 indicators are well defined and already being produced, Tier 2 indicators are defined, but not yet being produced, while Tier 3 indicators are still being defined. It should also be noted that even indicators classified as Tier 1 may still be plagued by issues of data coverage, availability, and/or timeliness. Assessment of progress towards the achievement of the 2030 Agenda relies heavily upon reliable, accurate, disaggregated, and timely indicators. Nowcasting, examined in the next section, has the potential to address the last of those characteristics.
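The tier definitions above can be read as a simple decision rule. The following Python sketch encodes that reading only; it is not an official classification procedure, and the function name `classify_tier` and the shape of its inputs (a per-region pair of country and population coverage shares) are assumptions made for illustration.

```python
def classify_tier(has_established_methodology, regional_coverage):
    """Return 1, 2 or 3 following the wording of the tier definitions.

    regional_coverage maps each relevant region to a pair
    (share_of_countries, share_of_population), both between 0 and 1,
    for which data are regularly produced. This input format is a
    hypothetical simplification.
    """
    # Tier 3: no internationally established methodology or standards yet.
    if not has_established_methodology:
        return 3
    # Tier 1: data regularly produced for at least 50 per cent of countries
    # and of the population in every region where the indicator is relevant.
    if regional_coverage and all(
        countries >= 0.5 and population >= 0.5
        for countries, population in regional_coverage.values()
    ):
        return 1
    # Tier 2: methodology exists, but data are not regularly produced.
    return 2


# Illustrative (invented) coverage figures.
example = classify_tier(True, {"Region A": (0.6, 0.7), "Region B": (0.8, 0.9)})
```

In this reading, a single region falling below either 50 per cent threshold is enough to move an otherwise well-defined indicator from Tier 1 to Tier 2.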
2.2 Nowcasting in the SDG context
The term “nowcast” itself is a portmanteau of “now” and “forecast”. Nowcasting as a term and discipline originated in meteorology in the 1980s [12] but began to appear in economic literature in the 2000s. Nowcasting in the economic sense, and in the sense relevant for SDG indicators, refers to the estimation of the current value of a target variable based on timelier data and information. The distinction with forecasting comes from the fact that estimations are produced for time periods that either have already concluded or that are currently running, as opposed to periods in the future.
The intuition or justification for nowcasting is best explained by way of example using gross domestic product (GDP). GDP is a frequently nowcast target variable due to three of its characteristics. First, GDP is often published with a significant lag due to the many data sources needed and the complex accounting and aggregation procedures necessary for its calculation. GDP figures for a given quarter or year are often published many months after the conclusion of the period, even though all economic activity measured in the eventual figure has already occurred. Second, GDP usually has a long publication history, meaning there exist sufficient observations to estimate a model on historical information. Finally, there exist numerous potential explanatory variables that are published on a much timelier basis and can be used as inputs for a nowcast. Series such as consumer price indices, industrial production indices, consumer and business confidence indices, and retail trade figures, among many others, are typically published with a significantly shorter time lag than GDP, and so can be used to obtain an estimate of GDP well before final figures are made available. These characteristics, together with the salience and relevance of GDP as an indicator, have made GDP the target variable for many nowcasting applications and papers. A GDP nowcasting model can be fit on historical data and fed the latest values of timelier indicators, both to obtain an estimate of GDP months before final figures are published and to monitor the GDP outlook during the period by rerunning the model continuously on the latest data. For examples of GDP nowcasting applications see Morgado et al. [13], Rossiter [14], or Bok et al. [15].
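The fit-then-feed workflow described above can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions, not the method of any of the cited papers: it uses a single timely indicator, an ordinary least squares fit, and invented numbers, and the names `fit_ols` and `nowcast` are hypothetical.

```python
from statistics import mean


def fit_ols(x, y):
    """Fit y = a + b * x by ordinary least squares and return (a, b)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def nowcast(a, b, latest_indicator_value):
    """Estimate the not-yet-published target from the timely indicator."""
    return a + b * latest_indicator_value


# Invented history: growth of a timely indicator (e.g. industrial
# production) and published GDP growth for the same past periods.
indicator_history = [1.0, 2.0, 3.0, 4.0, 5.0]
gdp_history = [0.7, 1.1, 1.6, 1.9, 2.5]

# Step 1: estimate the relationship on periods where both are known.
a, b = fit_ols(indicator_history, gdp_history)

# Step 2: the indicator for the current period is already available,
# while GDP is not, so the fitted model supplies an advance estimate.
estimate = nowcast(a, b, 3.5)
```

Rerunning step 2 each time the timely indicator is updated or revised gives the continuous monitoring of the outlook mentioned above. Applications in practice typically use many explanatory series and more elaborate models.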
Nowcasting is relevant for SDG indicators because many face issues with timeliness. In order to successfully implement the 2030 Agenda for Sustainable Development, it is essential that policy makers have access to timely information as it relates to SDG indicators, a primary means of monitoring and evaluating progress and guiding policy interventions. The United Nations Statistics Division (UNSD) and UNCTAD have identified nowcasting as a key means of meeting this timeliness challenge [3]. The existing literature on nowcasting specifically as it relates to SDGs is sparse, but two notable works include Bierbaumer-Polly et al. [16], where a comprehensive nowcasting exercise of SDG indicators using dynamic factor models is performed for Austria, and Hughes et al. [17], where the International Futures forecasting system is used to nowcast many SDG-related indicators for more than 180 countries.
Nowcasting is however no panacea. It is only applicable for obtaining timelier estimates of an already produced SDG indicator. That restricts its application to Tier 1 indicators, where data are produced. For a given indicator to be suitable for nowcasting, a further two conditions need to be satisfied: the indicator needs a sufficiently long time series to be able to train a nowcasting model, and there need to exist sufficient related and timely explanatory variables. In order to assess the nowcasting feasibility of SDG indicators, these conditions were applied in carrying out the survey explained in greater detail in the next section. Complete survey results are available online at [survey link], with a visualization of results available in Appendix 1. The complete survey results will hereafter be referred to as the “full results table”.
Care should be taken if nowcasts are eventually adopted as advance estimates of SDG indicators. Their status as data-based, quantitative estimates, liable to revision as the data outlook changes, should be made clear to users, as should the point at which a figure changes from a nowcast to its actual recorded value.
3. Nowcasting feasibility survey
3.1 Description of methodology
The first step in nowcasting an SDG indicator is determining whether nowcasting is applicable at all, that is, whether the characteristics of the indicator fulfill the data requirements outlined in the previous section. This was the goal of the feasibility survey: to provide a comprehensive overview of every Tier 1 SDG indicator and sub-indicator and its potential to be nowcast. The results of the survey could help statisticians and custodian agencies know at a glance whether their indicators have the potential to be nowcast and provide a springboard from which to launch their own investigations. There are no hard and fast rules for applying the three conditions mentioned earlier; how they apply depends on the indicator. A one-month lag may be considered short for an economic series but long for an epidemiological one. Each indicator therefore needed to be examined individually and evaluated for nowcasting suitability on a holistic basis. It is also worth mentioning that some indicators may be composed of a combination of two or more series, for instance indicator 8.4.2 (Domestic material consumption, domestic material consumption per capita, and domestic material consumption per GDP). In these cases, the indicator’s publication delay may be due to one of its constituent series, and better results may be obtained from nowcasting just this series rather than the entire indicator itself.
As Tier 2 and 3 indicators lack the publication of any historical data, they could immediately be classified as not suitable for nowcasting due to the second condition. As such, they were excluded from the survey and do not appear in the full results table. Restricted to Tier 1 indicators, the survey was conducted in the following manner: the main sources for information on Tier 1 indicators were the SDG Indicators Metadata Repository and the Global SDG Indicator Database [8, 18]. The SDG Indicators Metadata Repository ideally includes information on data characteristics relevant to nowcasting feasibility. However, the type of information included in each SDG indicator metadata file tends to vary, despite ongoing work to standardize the contents. For instance, some files do not include any information on when the data are collected or released. Some metadata files are unfinished, have missing parts, or require updating or reviewing. The SDG Indicator Database displays data for each indicator but does not always reflect the data availability described for each indicator in the respective metadata file.
Custodian agency databases also usually provide access to data for their SDG indicators. Information from these databases was used as a direct source of indicator characteristics or used to validate metadata. Data availability in the SDG Indicator Database is usually up to date with the custodian agency databases but may also be vastly different in content. The survey combines data available in the two sources if the years do not completely overlap. Otherwise, the source with the longest time series provides the information in the full results table. The SDG Indicator Database is also limited to displaying annual data. Indicators with monthly or quarterly data, for instance, are only displayed as annual. Other database sources must be used to get information about these indicators. Data availability is not easily describable for some indicators and sub-indicators. In many cases the length of time series varies greatly by country or region. This is indicated in the full results table.
For an indicator to have Tier 1 classification, data must be available for over 50% of relevant countries. There are a few indicators that can variably be classified as Tier 1 or Tier 2, depending on sub-indicator, and some sub-indicators included appear to not meet the Tier 1 requirement stated above. The number of countries or territories covered by each indicator across all possible sources is difficult to confirm due to inaccurate metadata information and inconsistencies between data sources. Many SDG indicators are also not classified at the sub-indicator level, or existing classifications have changed over time without metadata updates. Some are broadly considered Tier 1 indicators despite having sub-indicators that may not meet the Tier 1 requirements, while others are classified as either Tier 1 or Tier 2 by sub-indicator, but without specifying which sub-indicators belong to which classification.
The amount of data available for a particular indicator can vary by country, location, or aggregate grouping, so the data availability described in the full results table generally focused on the data availability of a world or global aggregate, if available. When the global aggregate did not have enough data for nowcasting purposes or a global aggregate was not available for a particular indicator, for the purposes of this survey, a general summary of the relevant countries and other aggregations was used. Information on publication lags for each indicator often had to be inferred from the data that appeared to be available and the existing metadata information.
A further consideration for nowcasting feasibility is the existence of explanatory variables for a particular indicator. Each indicator or sub-indicator was given a score of “Highly likely,” “Likely,” or “Unlikely” for this area. An indicator was labelled “Highly likely” if explanatory variables would likely be easy to find. Many SDG indicators are macroeconomic variables, such as GDP, or fall under poverty, health, education, environment or ecological topics. Variables like these are frequently modelled, and the existence and availability of explanatory variables is well documented in the literature. An indicator was labelled “Likely” if potential explanatory variables might not be closely related to the behavior of the indicator or had limited data available. For instance, variables related to Official Development Assistance (ODA) or certain government spending decisions were placed in this category, since values for these indicators are generally pre-determined by government decisions, although a variety of socioeconomic factors may still contribute to the initial spending decisions themselves. An indicator was labelled “Unlikely” if it might prove difficult to find explanatory variables. Appropriate explanatory variables may not exist for binary outcome variables that capture whether a country or territory has enacted a certain policy or legislation or joined a certain agreement or organization, as these decisions are generally not reversed or changed once made. A regional or global aggregate of such an indicator, e.g., the number of countries or economies in a region adhering to a particular policy, legislation or agreement, would be more suitable, if it exists. It should be noted that classifications for the existence of explanatory variables in the survey should serve only as a starting point. Definitively determining whether explanatory variables exist and selecting them for an SDG indicator can only be done with extensive research and potentially modelling, which was not feasible for this survey due to the quantity of SDG indicators and time and resource constraints.
Finally, scores for overall nowcasting feasibility were determined from all gathered information on data availability, publication lags, and explanatory variables. Each SDG indicator and sub-indicator was given a score of “Highly likely,” “Likely,” or “Unlikely” for overall feasibility. Some indicators were determined to be less feasible for nowcasting because of the nature of their subject matter, such as the previously mentioned binary outcome indicators and indicators related to spending and budgetary decisions, as well as election result indicators or upper parliament appointment results. Indicators were considered unsuitable for nowcasting if they did not have enough data, generally around 10 data points at a minimum. Finally, if an indicator is published without significant lags, nowcasting may not be relevant. As mentioned previously, what constitutes a significant lag depends on the indicator.
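The criteria above can be approximated as a first-pass screening rule. The Python sketch below is a simplification for illustration only: the survey scored indicators holistically rather than mechanically, and the function name `screen_feasibility`, its parameters, and the choice to let the explanatory-variable score cap the overall score are all assumptions. Whether a lag is significant is left to the caller, since that depends on the indicator.

```python
LEVELS = ("Unlikely", "Likely", "Highly likely")


def screen_feasibility(n_observations, lag_is_significant, explanatory_score,
                       subject_suitable=True, min_observations=10):
    """Return a rough overall feasibility label for one (sub-)indicator."""
    if explanatory_score not in LEVELS:
        raise ValueError(f"unknown explanatory score: {explanatory_score!r}")
    # Too short a history to train a model (around 10 points at a minimum).
    if n_observations < min_observations:
        return "Unlikely"
    # Published without a significant lag: nowcasting adds little.
    if not lag_is_significant:
        return "Unlikely"
    # Subject matter unsuited to nowcasting, e.g. binary policy-adoption
    # outcomes or election results.
    if not subject_suitable:
        return "Unlikely"
    # Assumption: otherwise the explanatory-variable score sets the label.
    return explanatory_score


# An annual series with 25 observations, a long lag, and explanatory
# variables that may be only loosely related to the indicator.
label = screen_feasibility(25, True, "Likely")
```

A rule of this kind is useful only for triage; borderline cases, such as budget-type indicators, still call for the individual examination described above.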
3.2 Survey results
Out of a total of 362 Tier 1 indicators and sub-indicators considered in the nowcasting feasibility survey, 150 received a classification of “Highly likely” for nowcasting feasibility, 87 received a classification of “Likely”, and 125 received a classification of “Unlikely.” The release of data for nearly all Tier 1 indicators and sub-indicators is accompanied by a lag. Most indicators are published at an annual frequency, with a publication lag of around one to three years. Publication lags may also differ by the organization, country, or region that provides the data. In these cases, the compilation of aggregate figures may depend on the “last” country or region to provide indicator estimates, so data publication lags are usually sufficient to warrant a nowcasting approach for both individual and aggregate figures alike. The existence of explanatory variables is unlikely to pose a problem for most indicators.
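As a quick arithmetic check of the headline figures, the short Python snippet below confirms that the three classes sum to 362 and expresses each as a share of the total. The counts are taken from the text; the variable names are illustrative.

```python
# Overall survey classifications as reported in the text.
counts = {"Highly likely": 150, "Likely": 87, "Unlikely": 125}

total = sum(counts.values())

# Share of each class, in per cent, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

# Indicators with at least some nowcasting potential.
feasible = counts["Highly likely"] + counts["Likely"]
```

On these figures, roughly 41.4 per cent of surveyed indicators and sub-indicators fall in the “Highly likely” class, 24.0 per cent in “Likely”, and 34.5 per cent in “Unlikely”, so 237 of the 362 have at least some nowcasting potential.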
Some sub-indicators under the same indicator code measure identical concepts with different units, such as “Number of deaths and missing persons attributed to disasters” and “Number of deaths and missing persons attributed to disasters per 100,000 population” under Indicator 1.5.1. Usually, sub-indicators under the same indicator code share the same nowcasting feasibility as they share the same data availability, data release schedule, and generally surround related topics with similar sets of likely explanatory variables.
As a reference, the following sections will summarize the survey results by goal in a standardized manner, considering only Tier 1 indicators. They need only be read as interest dictates; overall survey summaries and conclusions are available in Section 5. For more detailed information on the results for a particular goal, indicator, or sub-indicator, see the full results table available online at [survey link].
Goal 1: No poverty
Goal 1 contains 15 Tier 1 indicators and sub-indicators. Feasibility information for this goal was sourced from the SDG indicator database and the SDG indicator metadata repository. Of the 15 indicators, five were found to be “Highly likely” and ten were found to be “Likely” suitable for nowcasting. Importantly, all indicators were found to have at least ten years of annual data available, with publication lags sufficiently long to warrant consideration for nowcasting. In terms of explanatory variables, much work already exists on nowcasting poverty and poverty-related indicators at the national and regional level [19, 20, 21, 22]. As such, suitable, timely explanatory variables should exist. For instance, the World Bank’s World Development Indicators (WDI) database contains numerous series which could be used in nowcasting Goal 1 indicators.
Specific observations include the fact that indicator 1.4.1’s (proportion of population living in households with access to basic services) release schedule appears inconsistent from year to year, with data released every three to five years. Additionally for indicator 1.5.1 (number of deaths, missing persons and directly affected persons attributed to disasters per 100,000 population), while natural disasters themselves may be difficult to predict, their economic and human impacts, the focus of indicator 1.5.1, remain feasible for nowcasting.
Goal 2: Zero hunger
Goal 2 contains 25 Tier 1 indicators and sub-indicators covering topics including incidence of health-related diagnoses, agriculture, and food prices. Feasibility information for this goal was sourced from the SDG indicator database, the SDG indicator metadata repository, and the Food and Agriculture Organization of the United Nations (FAO). Of the 25 indicators, 14 were found to be “Highly likely” suitable for nowcasting, three were found to be “Likely”, and eight were found to be “Unlikely”. Most indicators were found to have over ten years of observations while indicators labelled “Unlikely” all had insufficient observations for nowcasting purposes. Regarding potential explanatory variables, generally there is much timely and quality health data available for training a nowcasting model, as these topics are frequently modelled to assess policy impacts and determine strategies for management of health phenomena [22, 23, 24].
Similar to indicator 1.4.1, data for sub-indicators of 2.2.3 (prevalence of anaemia in women aged 15 to 49 years, by pregnancy status) are not released on a consistent basis from year to year, being released every three to five years. Nowcasting may still be beneficial for these sub-indicators as there will always be some publication lag. While most indicators have over one decade of annual data, the 2.c.1 sub-indicator (consumer food price index) has annual and monthly data available depending on the source used to access the data.
Goal 3: Good health and well-being
Goal 3 contains 45 Tier 1 indicators and sub-indicators, covering topics including birth and death rates, incidence of health-related diagnoses, and access to health facilities. Feasibility information for this goal was sourced from the SDG indicator database, the SDG indicator metadata repository, and the World Health Organization (WHO). Of the 45 indicators, 18 were found to be “Highly likely” suitable for nowcasting, seven were found to be “Likely”, and 20 were found to be “Unlikely”. Explanatory variables for a nowcasting model around health indicators are likely to exist, given the extent to which they are already modelled and forecasted [25, 26, 27, 28, 29]. The primary reason for a Goal 3 indicator to get a label of “Unlikely” suitable for nowcasting was data availability. Many indicators did not have a suitably long time series, for instance with either only a single data point or data only every five years. Some indicators were classified as “Likely” if there was sparse data availability at the country or aggregate level. Nowcasting for specific countries or regions remains feasible for these cases.
Indicators 3.8.1 (coverage of essential health services), 3.b.2 (total net official development assistance to medical research and basic health sectors), and sub-indicator “coverage of treatment interventions for substance use disorders” of indicator 3.5.1 were classified as “Likely” as they could be considered accounting or budget-type indicators.
Goal 4: Quality education
Goal 4 contains 26 Tier 1 indicators and sub-indicators, covering topics including education completion rates, parity indices, and school resource access. Feasibility information for this goal was sourced from the Open SDG Data Hub, the United Nations Educational, Scientific and Cultural Organization (UNESCO) Institute for Statistics (UIS), SDG indicator database, and SDG indicator metadata. Of the 26 indicators, one was found to be “Highly likely” suitable for nowcasting, 15 were found to be “Likely”, and ten were found to be “Unlikely”. Most Goal 4 indicators are sparsely reported at the country or regional level, but data for some individual countries and regions may have sufficiently long publication histories for nowcasting purposes. Generally, data publication for the Goal 4 indicators varies by region. Explanatory variables for education-related indicators are likely to be widely available, as there are a variety of socioeconomic factors that impact students and school systems. Modelling is a common approach for analyzing various metrics of education [30].
Goal 5: Gender equality
Goal 5 contains eight Tier 1 indicators and sub-indicators, covering topics including child marriage, government, and employment. Feasibility information for this goal was sourced from the Inter-Parliamentary Union (IPU), the UN Economic Commission for Europe (UNECE), the International Labour Organization (ILO), the SDG indicator database, and SDG indicator metadata. Of the eight indicators, two were found to be “Highly likely” suitable for nowcasting, four were found to be “Likely”, and two “Unlikely”. Explanatory variables for these indicators are generally likely to exist, as models for measures of gender equality are commonly used to track and forecast equality progress [31, 32, 33, 34, 35], as well as measure the impacts of economic shocks, policies, and other events.
The 5.5.1 sub-indicators “number of seats held by women in national parliaments”, “proportion of elected seats held by women in deliberative bodies of local government”, and “proportion of seats held by women in national parliaments” are given the feasibility classification of “Likely”, as the makeups of local and national parliaments generally only change after elections take place, and results of individual elections are usually published without significant lags. It may be more suitable to look at specific upcoming elections individually. The 5.5.1 sub-indicator “current number of seats in national parliaments” is likely unsuitable for nowcasting as it does not generally change over time.
Goal 6: Clean Water and Sanitation
Goal 6 contains 41 Tier 1 indicators and sub-indicators covering topics including water law and policy, water use, and water area. Feasibility information for this goal was sourced from the WHO, the UN Environment Programme (UNEP), the SDG indicator database, and SDG indicator metadata. Of the 41 indicators, 27 were found to be “Highly likely” suitable for nowcasting, one was found to be “Likely”, and 13 were found to be “Unlikely”. Over half of the Goal 6 indicators come from 6.6.1’s sub-indicators. In general, Goal 6 includes indicators that fall under ecological or environmental topics, for which modelling is frequently used [36, 37, 38, 39, 40, 41]. Explanatory variables should be widely available for these indicators. Most indicators have around a two-year publication lag, with either several decades or only a few years of annual data available.
Similar to Indicator 1.4.1, data for Indicator 6.5.1 (degree of integrated water resources management) is not released on a consistent basis year to year, being released every 3 to 5 years. Indicator 6.a.1 (amount of water- and sanitation-related official development assistance that is part of a government-coordinated spending plan) was classified as “Likely” as it could be considered an accounting or budget-type indicator. Indicator 6.b.1 (proportion of local administrative units with established and operational policies and procedures for participation of local communities in water and sanitation management) covers laws/policies relating to water, which are likely not suitable for nowcasting due to both insufficient data availability and lack of aggregate reporting.
Goal 7: Affordable and clean energy
Goal 7 contains six Tier 1 indicators and sub-indicators, covering topics including renewable energy, electricity access, and energy intensity. Feasibility information for this goal was sourced from the SDG indicator database and SDG indicator metadata. Of the six indicators, five were found to be “Highly likely” suitable for nowcasting, one was found to be “Likely”, and none were found to be “Unlikely”. All indicators and sub-indicators have a one- to two-year publication lag and over a decade of annual data available. In general, Goal 7 includes indicators that fall under ecological or environmental topics, for which modelling is frequently used [42, 43]. Explanatory variables should be widely available for these indicators.
Goal 8: Decent work and economic growth
Goal 8 contains 13 Tier 1 indicators and sub-indicators, covering topics including macroeconomic variables, commercial banks, and Aid for Trade. Feasibility information for this goal was sourced from the ILO, the SDG indicator database and SDG indicator metadata. Of the 13 indicators, seven were found to be “Highly likely” suitable for nowcasting, five were found to be “Likely”, and one was found to be “Unlikely”. Nowcasting is often used to analyze macroeconomic variables like those included in the Goal 8 indicators [44, 45, 46, 47]. Explanatory variables should be widely available for these indicators.
Data for the 8.4.2 (domestic material consumption, domestic material consumption per capita, and domestic material consumption per GDP) sub-indicators have not been released since 2017, and publication lags are unknown. Nowcasting may be especially suitable for these sub-indicators given the lack of more recent data. Data for Indicators 8.5.2 (unemployment rate, by sex, age and persons with disabilities), 8.6.1 (proportion of youth (aged 15–24 years) not in education, employment or training), and 8.10.1 ((a) number of commercial bank branches per 100,000 adults and (b) number of automated teller machines (ATMs) per 100,000 adults) are collected by individual financial regulators or statistical organizations. Data release for these indicators varies by individual data source. The 8.a.1 (Aid for Trade commitments and disbursements) sub-indicators were classified as “Likely” as they may be considered accounting or budget-type indicators. Indicator 8.10.2 (proportion of adults (15 years and older) with an account at a bank or other financial institution or with a mobile-money-service provider) is the only Goal 8 indicator labelled as “Unlikely” for nowcasting feasibility, as it did not have a suitably long time series, with only three data points available.
Goal 9: Industry, innovation and infrastructure
Goal 9 contains 19 Tier 1 indicators and sub-indicators, covering topics including macroeconomic variables, telecommunications, and carbon dioxide emissions. Feasibility information for this goal was sourced from UNCTAD, UNESCO, the Organisation for Economic Co-operation and Development (OECD), the SDG indicator database and SDG indicator metadata. Of the 19 indicators, 16 were found to be “Highly likely” suitable for nowcasting, one was found to be “Likely”, and two were found to be “Unlikely”. Nowcasting is often used to analyze macroeconomic variables like those included in the Goal 9 indicators [48, 49, 50, 51, 52]. Explanatory variables should be widely available for these indicators. Indicator 9.4.1 (CO2 emission per unit of value added) is considered in detail in Section 4 to demonstrate the process of selecting and performing a modelling exercise on a feasible SDG indicator.
The sub-indicators under Indicator 9.1.2 (passenger and freight volumes, by mode of transport) are the only Goal 9 indicators given a score of “Unlikely”, as they do not have suitably long publication histories. Data publication lags for Indicators 9.2.2 (manufacturing employment as a proportion of total employment) and 9.3.2 (proportion of small-scale industries with a loan or line of credit) vary by data source, but both aggregate values and values for select countries and regions are likely good candidates for nowcasting. Indicator 9.a.1 (total official international support (official development assistance plus other official flows) to infrastructure) is classified as “Likely” as it may be considered an accounting or budget-type indicator.
Goal 10: Reduced inequalities
Goal 10 contains 19 Tier 1 indicators and sub-indicators, covering topics including financial markets, resource flows, developing countries, and refugees. Feasibility information for this goal was sourced from the Missing Migrants Project, the World Bank, the SDG indicator database and SDG indicator metadata. Of the 19 indicators, 12 were found to be “Highly likely” suitable for nowcasting, four were found to be “Likely”, and three were found to be “Unlikely”. Publication lags for 11 indicators and sub-indicators vary by data source.
There are multiple possible sources from which to access data on Indicator 10.7.3 (number of people who died or disappeared in the process of migration towards an international destination) that differ in periodicity and publication lag, with annual data released on the SDG indicator database and data at the incident level from the Missing Migrants Project. Although annual data have around a one-year lag, nowcasting may not be useful if more recent data are available as incidents happen. The 10.b.1 (total resource flows for development, by recipient and donor countries and type of flow) sub-indicators are classified as “Likely” as they may be considered accounting or budget-type indicators. Sub-indicators under 10.c.1 (remittance costs as a proportion of the amount remitted) are reported quarterly, with over four years of quarterly data. All other indicators and sub-indicators aside from 10.7.3 and 10.c.1 are annual, with 12 having over ten years of annual data.
Goal 11: Sustainable cities and communities
Goal 11 contains 13 Tier 1 indicators and sub-indicators, covering topics including natural disasters and living conditions. Feasibility information for this goal was sourced from the SDG indicator database and SDG indicator metadata. Of the 13 indicators, none were found to be “Highly likely” suitable for nowcasting, ten were found to be “Likely”, and three were found to be “Unlikely”. In general, Goal 11 includes indicators that fall under ecological or environmental topics, for which modelling is frequently used [53, 54]. Explanatory variables should be widely available for these indicators.
Data publication lags for Indicator 11.1.1 (proportion of urban population living in slums, informal settlements or inadequate housing) vary by individual data source. The 11.5.1 (number of deaths, missing persons and directly affected persons attributed to disasters per 100,000 population) natural disaster sub-indicators are identical to the 1.5.1 and 13.1.1 natural disaster sub-indicators. These are considered “Likely” for nowcasting feasibility. As noted for the 1.5.1 sub-indicators, while natural disasters themselves may be difficult to predict, their economic and human impacts remain feasible for nowcasting. Aside from the natural disaster sub-indicators, the only unique Tier 1 indicators under Goal 11 are Indicators 11.1.1, 11.6.2 (annual mean levels of fine particulate matter (e.g. PM2.5 and PM10) in cities (population weighted)), and 11.a.1 (number of countries that adopt and implement national disaster risk reduction strategies in line with the Sendai Framework for Disaster Risk Reduction 2015–2030), which do not have suitably long time series for nowcasting purposes.
Goal 12: Responsible consumption and production
Goal 12 contains 15 Tier 1 indicators and sub-indicators, covering topics including environmental sustainability and domestic material consumption. Feasibility information for this goal was sourced from the SDG indicator database and SDG indicator metadata. Of the 15 indicators, four were found to be “Highly likely” suitable for nowcasting, three were found to be “Likely”, and eight were found to be “Unlikely”. In general, Goal 12 includes indicators that fall under ecological or environmental topics, for which modelling is frequently used [55, 56]. Explanatory variables should be widely available for these indicators.
The 12.2.2 (domestic material consumption, domestic material consumption per capita, and domestic material consumption per GDP) sub-indicators are identical to the 8.4.2 domestic material consumption sub-indicators and are also considered “Highly likely” for nowcasting feasibility. As with the 8.4.2 sub-indicators, data has not been released since 2017, and publication lags are unknown. Nowcasting may be especially suitable for these sub-indicators given the lack of more recent data. All 12.4.1 (number of parties to international multilateral environmental agreements on hazardous waste, and other chemicals that meet their commitments and obligations in transmitting information as required by each relevant agreement) sub-indicators have only two years of data available, so are not feasible for nowcasting.
Goal 13: Climate action
Goal 13 contains 12 Tier 1 indicators and sub-indicators, covering topics including natural disasters and greenhouse gas emissions. Feasibility information for this goal was sourced from the SDG indicator database and SDG indicator metadata. Of the 12 indicators, all were found to be “Highly likely” suitable for nowcasting, making Goal 13 the only goal for which all Tier 1 indicators have this classification. All indicators have a publication lag of one to three years and over ten years of annual data. In general, Goal 13 includes indicators that fall under ecological or environmental topics, for which modelling is frequently used [51, 55, 57]. Explanatory variables should be widely available for these indicators.
The 13.1.1 (number of deaths, missing persons and directly affected persons attributed to disasters per 100,000 population) natural disaster sub-indicators are identical to the 1.5.1 and 11.5.1 natural disaster sub-indicators. These are considered “Likely” for nowcasting feasibility. As noted for the 1.5.1 sub-indicators, while natural disasters themselves may be difficult to predict, their economic and human impacts remain feasible for nowcasting. Data availability for the 13.2.2 sub-indicator “Total greenhouse gas emissions without LULUCF for non-Annex I Parties” varies widely by region and no aggregations are produced. However, there are many countries with sufficient data for nowcasting purposes.
Goal 14: Life below water
Goal 14 contains seven Tier 1 indicators and sub-indicators, covering topics including fishing, marine area use, and fishing law and policy. Feasibility information for this goal was sourced from the SDG indicator database and SDG indicator metadata. Of the seven indicators, four were found to be “Highly likely” suitable for nowcasting, none were found to be “Likely”, and three were found to be “Unlikely”. In general, Goal 14 includes indicators that fall under ecological or environmental topics, for which modelling is frequently used [58, 59, 60]. Explanatory variables should be widely available for these indicators.
Indicator 14.6.1 (degree of implementation of international instruments aiming to combat illegal, unreported and unregulated fishing) is unsuitable for nowcasting, with only two years of available data. Additionally, data for Indicators 14.7.1 (sustainable fisheries as a proportion of GDP in small island developing States, least developed countries and all countries) and 14.b.1 (degree of application of a legal/regulatory/policy/institutional framework which recognizes and protects access rights for small-scale fisheries) have biennial periodicity and biennial data release. They do not have sufficient observations for nowcasting purposes and are thus labelled “Unlikely” for nowcasting feasibility.
Goal 15: Life on land
Goal 15 contains 32 Tier 1 indicators and sub-indicators, covering topics including forests, endangered species, environmental law and policy, and ODA. Feasibility information for this goal was sourced from the SDG indicator database and SDG indicator metadata. Of the 32 indicators, five were found to be “Highly likely” suitable for nowcasting, four were found to be “Likely”, and 23 were found to be “Unlikely”. In general, Goal 15 includes indicators that fall under ecological or environmental topics, for which modelling is frequently used [61, 62]. Explanatory variables should be widely available for these indicators. Notably, over half of the Goal 15 indicators and sub-indicators are likely unsuitable for nowcasting due to insufficient observations.
For the 15.1.1 sub-indicators “Forest area” and “Forest area as a proportion of total land area”, it was unclear whether there is a usual pattern to data release and whether there is a consistent publication lag. The 15.a.1 and 15.b.1 ((a) official development assistance on conservation and sustainable use of biodiversity; and (b) revenue generated and finance mobilized from biodiversity-relevant economic instruments) sub-indicators are classified as “Likely” as they may be considered accounting or budget-type indicators.
Goal 16: Peace, justice and strong institutions
Goal 16 contains 25 indicators and sub-indicators, covering topics including crime, governments, elections, and the Paris Principles. Feasibility information for this goal was sourced from the Inter-Parliamentary Union (IPU), the SDG indicator database and SDG indicator metadata. Of the 25 indicators, one was found to be “Highly likely” suitable for nowcasting, eight were found to be “Likely”, and 16 were found to be “Unlikely”. Lack of publication history is a primary factor in the unsuitability of many Goal 16 indicators.
Data availability for Indicator 16.5.2 (proportion of businesses that had at least one contact with a public official and that paid a bribe to a public official, or were asked for a bribe by those public officials during the previous 12 months) varies by individual region with generally sporadic data publishing, but sufficient data exist for select countries/regions or aggregates. Because election data are released and information on parliaments is updated relatively quickly, nowcasting may be less applicable to the 16.7.1 (proportions of positions in national and local institutions, including (a) the legislatures; (b) the public service; and (c) the judiciary, compared to national distributions, by sex, age, persons with disabilities and population groups) sub-indicators. However, modelling election results for specific countries or regions is common. Upper chamber parliaments generally do not have elected positions, but there still may be broad social or economic factors that impact appointments to such positions as well as election results as a whole. Data for these sub-indicators are updated annually for the current year, so publication lags will depend on whether election dates fall before or after data release. More recent data can be found by following country elections individually. Data publication for the 16.8.1 (proportion of members and voting rights of developing countries in international organizations) sub-indicators and Indicator 16.9.1 (proportion of children under 5 years of age whose births have been registered with a civil authority, by age) depends on the region or organization supplying the data. Nowcasting may be less applicable to Indicator 16.10.1 (number of verified cases of killing, kidnapping, enforced disappearance, arbitrary detention and torture of journalists, associated media personnel, trade unionists and human rights advocates in the previous 12 months) as it is published without an annual lag. It also has only one year of available data, so it would be unsuitable for nowcasting given current time series availability.
Goal 17: Partnerships for the goals
Goal 17 contains 41 indicators and sub-indicators, covering topics including financial assistance to developing countries and macroeconomic variables. Feasibility information for this goal was sourced from the Instituto Nacional de Estadística (INE) of Spain, the OECD, the SDG indicator database and SDG indicator metadata. Of the 41 indicators, 13 were found to be “Highly likely” suitable for nowcasting, 15 were found to be “Likely”, and 13 were found to be “Unlikely”. Nowcasting is often used to analyze macroeconomic variables like those included in the Goal 17 indicators [63, 64, 65, 66, 67, 68]. Explanatory variables should be widely available for these indicators.
The 17.2.1 (net official development assistance, total and to least developed countries, as a proportion of the Organization for Economic Cooperation and Development (OECD) Development Assistance Committee donors’ gross national income (GNI)) ODA sub-indicators were classified as “Likely” as they may be considered accounting or budget-type indicators. The two 17.6.1 (fixed Internet broadband subscriptions per 100 inhabitants, by speed) sub-indicators and Indicator 17.8.1 (proportion of individuals using the Internet) are related to global internet access. Data release for these indicators varies at the country or regional level. Indicator 17.18.2 (number of countries that have national statistical legislation that complies with the Fundamental Principles of Official Statistics), the 17.18.3 (number of countries with a national statistical plan that is fully funded and under implementation, by source of funding) sub-indicators, and the sub-indicators “countries that have conducted at least one population and housing census in the last 10 years,” “countries with birth registration data that are at least 90 percent complete,” and “countries with death registration data that are at least 75 percent complete” under 17.19.2 are all binary variables that have insufficient data publication histories for nowcasting purposes. The most recent data available for the 17.18.3 sub-indicators are for 2020, but there is insufficient information to determine data release schedules and usual publication lags, if they exist.
4.Empirical pilot
4.1Indicator and data
The case study presented here will demonstrate how the information from the feasibility survey can be used to select promising SDG indicators for nowcasting and provide guidance on how the modelling exercise can then be performed. The selected indicator for the exercise was indicator 9.4.1, CO2 emissions per unit of value added [69], on the global level. As with most indicators, there exist many different regional and country-level aggregations, each with their own publication schedules and data availability characteristics. The global level was chosen as the aggregation of broadest interest. Examining the full survey results, we can see that 9.4.1 has data from 2000 onwards, thus satisfying the adequate series history requirement of nowcasting, is published on the SDG database with a multi-year lag, satisfying the requirement of an extended publication lag, and that there exist ample potential explanatory variables. These characteristics together make it a good candidate for nowcasting and for the case study.
Within the SDG indicator, the series EN_ATM_CO2GDP, “carbon dioxide emissions per unit of GDP (kilogrammes of CO2 per constant 2017 United States dollars)” was used as the nowcasting target variable [18]. Data for this series were obtained from the UN SDG database [18], where, at the time of writing in Autumn 2021, the latest global figures were available for 2000–2018 at an annual frequency. Data for the database in turn come from the International Energy Agency (IEA) [70]. It should be noted that while data available on the SDG database would imply a publication lag of more than two and a half years, timelier data may be available from the IEA directly. However, at the time of writing, timelier figures for the indicator were not publicly available. Any custodian agency, institution, or individual interested in nowcasting an SDG indicator should first make sure that timelier data are not available directly from the original data provider. For the purposes of this case study, where the goal is not to generate a nowcast of the indicator per se, but rather to illustrate and outline the modelling process, we can take the publication lag implied by the publicly available data as given.
With data for the target indicator in hand, the next step in the nowcasting process is identifying and gathering data for potential explanatory variables. The actual variables identified depend highly on which indicator is being nowcast. It is recommended to gather as many potentially related variables as possible. A process for selecting which variables go into the model in the end will be outlined in the next section. In the case of 9.4.1, there were two main components of the indicator: carbon emissions and economic activity, i.e., GDP. The data gathering process could then be guided by these two components.
Data on emissions were mainly gathered from two sources, Statista and the U.S. Energy Information Administration [71, 72]. Data from the former had a publication lag of between four and eight months on annual-frequency data, while data from the latter had a publication lag of three months on monthly-frequency data. Data relating to GDP and economic activity were drawn from many sources, but mainly from the OECD [73] and Eurostat [74]. Economic data had a monthly, quarterly, or yearly frequency. In the end, almost 30 emissions-related and more than 150 economy-related variables were gathered. Not all of these variables were used in the final model; rather, they served as a pool from which to train and test different models in the model selection process outlined in the next section. Finally, all series were transformed to period-over-period growth rates, seasonally adjusted where applicable using the US Census Bureau’s X13-ARIMA-SEATS methodology [75].
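The growth-rate transformation can be sketched as follows. This is a minimal illustration only: it assumes any seasonal adjustment has already been applied (the X13-ARIMA-SEATS step is not reproduced), and the function name `growth_rates` and the sample values are hypothetical rather than taken from the case study.

```python
from typing import List, Optional


def growth_rates(levels: List[Optional[float]]) -> List[Optional[float]]:
    """Period-over-period growth: (x_t - x_{t-1}) / x_{t-1}.

    Missing observations (None) propagate as None so that gaps in the
    source data remain visible after the transformation.
    """
    rates: List[Optional[float]] = [None]  # the first value has no previous period
    for prev, curr in zip(levels, levels[1:]):
        if prev is None or curr is None or prev == 0:
            rates.append(None)
        else:
            rates.append((curr - prev) / prev)
    return rates


# Hypothetical annual levels of an emissions-related series with one gap.
levels = [100.0, 102.0, None, 99.0, 96.03]
rates = growth_rates(levels)
```

Working in growth rates rather than levels puts series measured in different units on a comparable scale, which is convenient when many heterogeneous inputs feed one model.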
4.2Methodology
Once the data have been gathered, three steps remain in the modelling process: selecting a modelling methodology, selecting which variables will go into the model, and selecting which hyperparameters to use for the chosen methodology. The last step depends on the type of model chosen, as some approaches do not have hyperparameters.
Nowcasting comes with its own set of challenges that any modelling approach must be able to handle. First, the model should be able to handle time series. Second, it should have some mechanism for dealing with mixed-frequency data, meaning that the variables in the model, whether target or input, are not necessarily recorded at the same frequency; an example is estimating a yearly variable using monthly and quarterly variables. Third, it should be robust to the differing publication schedules of its input variables, often called “ragged edges”. The last challenge is the “curse of dimensionality”, where there may be few observations relative to the number of input variables, complicating the estimation of many classical econometric and statistical models.
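To make the mixed-frequency and ragged-edge challenges concrete, the sketch below places a monthly input, a quarterly input, and an annual target on one monthly grid, leaving unobserved months empty. This is one common layout rather than a description of the data structure used in the case study; all names and values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Month = Tuple[int, int]  # (year, month)


def monthly_grid(start_year: int, end_year: int) -> List[Month]:
    """All months from January of start_year to December of end_year."""
    return [(y, m) for y in range(start_year, end_year + 1) for m in range(1, 13)]


def align(series: Dict[Month, float], grid: List[Month]) -> List[Optional[float]]:
    """Place observations on the monthly grid; unobserved months become None."""
    return [series.get(month) for month in grid]


def trailing_missing(values: List[Optional[float]]) -> int:
    """Number of missing values at the end of a series (its ragged edge)."""
    count = 0
    for value in reversed(values):
        if value is not None:
            break
        count += 1
    return count


grid = monthly_grid(2020, 2020)

# Hypothetical inputs: a monthly series published through October, a quarterly
# series (dated at quarter-end months) published through Q2, and an annual
# target (dated December) that has not been published yet.
monthly = {(2020, m): 1.0 + 0.1 * m for m in range(1, 11)}
quarterly = {(2020, 3): 0.5, (2020, 6): -0.2}
annual_target: Dict[Month, float] = {}

panel = {
    "monthly_input": align(monthly, grid),
    "quarterly_input": align(quarterly, grid),
    "annual_target": align(annual_target, grid),
}
ragged_edge = {name: trailing_missing(values) for name, values in panel.items()}
```

Each series ends at a different point on the grid, so a model has to produce an estimate from an incomplete final stretch of data rather than wait for every input to be published.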
Several different methodologies address these challenges and have been used successfully for nowcasting applications. Some of the most common include the dynamic factor model (DFM) [76, 77], mixed data sampling (MIDAS) [45, 78], mixed data sampling vector autoregression (VAR) [78], and Bayesian vector autoregression [79]. Hopp [80] examined an approach using long short-term memory artificial neural networks (LSTM).
No one approach is better than the others in all cases. Ideally, several would be tried in order to validate performance and increase the chances of obtaining a high-performing model. In practice, the methodology chosen will be influenced by other factors, such as which implementations are available in which programming languages, and whether any open-source options are available at all. It is primarily for this latter reason that the LSTM was chosen for this case study. In selecting a methodology, the nowcasting_benchmark open-source repository is a good resource outlining the performance of all common nowcasting methodologies in nowcasting US GDP growth [81]. It additionally contains boilerplate code for each methodology which can be followed to perform one’s own nowcasting exercise.
Having settled on the LSTM for the nowcasting model, the next steps are selecting which variables will go into the model and which hyperparameters will be used. If the number of input variables gathered in the data collection phase is small, variable selection may not be necessary. To compare candidate models with one another, a performance metric must first be chosen. In a regression application such as this, mean absolute error (MAE) and/or root-mean-square error (RMSE) are suitable. Each candidate model, defined by its input variables and hyperparameters, is then scored on this metric. To ensure that a model generalizes and is not overfit, it should be assessed on data it was not trained on. A general rule of thumb is to train on 80 per cent of the data and test on 20 per cent. In this case, models were trained on data from 2001 to 2011 and validated on data from 2012 to 2014 (the validation period). They were then trained on data from 2001 to 2014 and tested on data from 2015 to 2018 (the test period).
The validation set exists to guide the selection of variables and hyperparameters. Selecting the best-performing models on validation performance, and only then assessing them on the test set, ensures that the variables and hyperparameters are not overfit to the test set.
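The two-stage split described above can be sketched as follows. The year boundaries come from the text; the helper `split_by_year` and the placeholder target values are hypothetical.

```python
from typing import Dict, Tuple


def split_by_year(
    data: Dict[int, float], train_end: int, eval_end: int
) -> Tuple[Dict[int, float], Dict[int, float]]:
    """Train on years <= train_end; evaluate on years in (train_end, eval_end]."""
    train = {y: v for y, v in data.items() if y <= train_end}
    evaluate = {y: v for y, v in data.items() if train_end < y <= eval_end}
    return train, evaluate


# Placeholder yearly growth rates of the target for 2001-2018.
target = {year: -0.01 for year in range(2001, 2019)}

# Stage 1: choose variables and hyperparameters using the validation period.
train_for_validation, validation = split_by_year(target, train_end=2011, eval_end=2014)

# Stage 2: refit on training plus validation years, then score on the test period.
train_for_test, test = split_by_year(target, train_end=2014, eval_end=2018)
```

Because the data are a time series, the split is chronological rather than random: every evaluation year lies strictly after the years the model was trained on.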
A final factor to keep in mind is model performance before all data are available. Incomplete data are the norm in nowcasting, especially if the nowcast is to be monitored over a period of time. To account for this, model performance was recorded on synthetic data vintages, that is, datasets with missing values artificially introduced to simulate the data as they would have appeared at different points in time. Three vintages were used: the final month of the target period, six months after the target period, and ten months after the target period, when the last-published input variable would be released. For a target period of 2020, for example, the first vintage is the data as they would have appeared in December 2020. Publication lags for generating the synthetic data vintages were gathered from empirical observation from April 2020 to October 2021.
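A synthetic vintage can be built by masking every observation that would not yet have been published on a simulated date. The sketch below is illustrative only and assumes that a lag of k months means an observation becomes available k months after the period it refers to; the function and variable names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Month = Tuple[int, int]  # (year, month)


def months_between(earlier: Month, later: Month) -> int:
    """Whole months from `earlier` to `later` (negative if later comes first)."""
    return (later[0] - earlier[0]) * 12 + (later[1] - earlier[1])


def make_vintage(
    data: Dict[str, Dict[Month, float]],
    lags: Dict[str, int],
    as_of: Month,
) -> Dict[str, Dict[Month, Optional[float]]]:
    """Mask observations that would not yet have been published at `as_of`.

    Assumption: an observation for period p of a variable with a lag of
    k months is available once at least k months have passed since p.
    """
    vintage: Dict[str, Dict[Month, Optional[float]]] = {}
    for name, series in data.items():
        vintage[name] = {
            period: (value if months_between(period, as_of) >= lags[name] else None)
            for period, value in series.items()
        }
    return vintage


# Hypothetical monthly inputs for 2020 with lags of one and three months.
data = {
    "fast_indicator": {(2020, m): float(m) for m in range(1, 13)},
    "slow_indicator": {(2020, m): float(m) for m in range(1, 13)},
}
lags = {"fast_indicator": 1, "slow_indicator": 3}

# The data as they would have appeared in December 2020 and six months later.
december_vintage = make_vintage(data, lags, as_of=(2020, 12))
june_vintage = make_vintage(data, lags, as_of=(2021, 6))
```

Scoring the same model on each vintage shows how its accuracy changes as more inputs are published.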
With a specified process for training and testing a particular model, many models could be tested to determine the best-performing one. Variable selection and a small degree of hyperparameter tuning were carried out in the same step for the case study. Because there were far too many potential input variables to test all combinations, input variables were randomly sampled, run with a small selection of hyperparameters, and their performance recorded. This process was repeated for hundreds of random input variable samples. The three best-performing models of these runs were then assessed with a more expansive set of hyperparameters. Finally, the best performing of these was selected for the final model. There are other approaches to variable selection than the one described here, and this one will not necessarily find the absolute best-performing combination of input variables and hyperparameters possible from the data; finding that combination is impractical given computational and time constraints. However, the approach can be expected to find a relatively well-performing model from the space of all possible input variable and hyperparameter combinations. The LSTM library used for this analysis additionally enables the automatic selection of variables for a given model via its variable_selection function [82].
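The random sampling procedure can be sketched as below. This is not the code used in the case study: `random_search` is a hypothetical helper, the hyperparameter name `hidden_units` is invented for illustration, and `toy_validation_error` is a stand-in for fitting the chosen model on the training years and returning its error on the validation years.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Result = Tuple[float, Tuple[str, ...], Dict[str, int]]


def random_search(
    variables: Sequence[str],
    hyperparameter_grid: List[Dict[str, int]],
    validation_error: Callable[[Tuple[str, ...], Dict[str, int]], float],
    n_samples: int,
    subset_size: int,
    top_k: int = 3,
    seed: int = 0,
) -> List[Result]:
    """Score random variable subsets and return the top_k lowest-error runs."""
    rng = random.Random(seed)
    results: List[Result] = []
    for _ in range(n_samples):
        subset = tuple(sorted(rng.sample(list(variables), subset_size)))
        for params in hyperparameter_grid:
            results.append((validation_error(subset, params), subset, params))
    results.sort(key=lambda result: result[0])  # lowest validation error first
    return results[:top_k]


# Stand-in scoring function. A real run would train the nowcasting model on
# the training years here and return its MAE or RMSE on the validation years.
USEFUL = {"var_1", "var_4", "var_7"}


def toy_validation_error(subset: Tuple[str, ...], params: Dict[str, int]) -> float:
    missed = len(USEFUL - set(subset))
    penalty = 0.0 if params["hidden_units"] == 32 else 0.002
    return 0.01 * missed + penalty


variables = [f"var_{i}" for i in range(1, 11)]
small_grid = [{"hidden_units": 16}, {"hidden_units": 32}]
best = random_search(
    variables, small_grid, toy_validation_error, n_samples=200, subset_size=4
)
```

In a second stage, the variable subsets in `best` would be re-scored against a larger hyperparameter grid, and the winner assessed once on the test period.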
Results from the final selected model are presented in the next section.
4.3Results
The variables selected for the final model are listed in Table 1.
Table 1
Variable | Geography | Frequency | Source |
---|---|---|---|
Construction index | France | Monthly | OECD |
Consumer confidence index | Japan | Monthly | OECD |
Goods volume transported by main ports | Netherlands | Quarterly | Eurostat |
Manufacturing order books | Germany | Monthly | OECD |
Merchandise exports | Singapore | Monthly | Singapore DOS |
Merchandise exports | South Africa | Monthly | OECD |
Real GDP forecast | OECD | Quarterly | OECD |
Total energy consumed by transportation | USA | Monthly | EIA |
Tourist arrivals | France | Monthly | Eurostat |
Figure 1 shows the model’s predictions for both the validation and test sets with full data compared with observed actuals. With these variables, full data is equivalent to about seven months after the period has ended, or July of the following year, when the variable with the longest lag is published. The blue line represents predictions on the validation set, that is, from a model trained only on data from 2001 to 2011. The red line represents predictions on the test set, from a model trained on data from 2001 to 2014. The model is remarkably accurate in the first two years of the validation set, 2012 and 2013, but struggles with the abnormally low observed 2014 level. The large drop in that year may have been due to abnormally warm winter weather in certain regions, reducing energy consumption, coupled with a reduction of coal use in China [83, 84]. This suggests that inclusion of variables related to weather and fossil fuel use or prices could improve the performance of the model. Overestimation of 2014’s value was common to all models, perhaps because it was the first year in which the carbon intensity of GDP began to decline at a faster rate than observed in the preceding ten years. However, the model was able to pick up on this faster declining trend between 2015 and 2017, as well as a relatively milder decline in 2018.
MAE and RMSE for the validation period were 0.011 and 0.024, respectively. In other words, the model’s predicted year-over-year growth in the target variable differed from the actual by 1.1 percentage points on average over the validation period. The higher RMSE value shows how this particular performance metric punishes the larger error in 2014. MAE and RMSE for the test period were 0.006 and 0.007, respectively.
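The different behaviour of the two metrics can be illustrated with a short sketch. The growth rates below are hypothetical and are not the model’s actual predictions; they are chosen only to show that a single large miss raises RMSE more than MAE, while evenly spread errors leave the two equal.

```python
import math
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical yearly growth rates of a target variable.
actual = [-0.010, -0.012, -0.040]

# Case 1: two exact predictions and one large miss in the final year.
one_large_miss = [-0.010, -0.012, -0.010]

# Case 2: the same total absolute error spread evenly over the three years.
evenly_spread = [a + 0.010 for a in actual]

mae_miss, rmse_miss = mae(actual, one_large_miss), rmse(actual, one_large_miss)
mae_even, rmse_even = mae(actual, evenly_spread), rmse(actual, evenly_spread)
```

Both cases have the same MAE, but RMSE is higher in the first because squaring the errors gives more weight to the single large miss.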
Figure 1.
Figure 2.
Figure 2 shows the development of the 2015 and 2020 nowcasts over time, beginning in January of the target year and ending in July of the following year. Predictions over time were obtained by running the model on synthetic data vintages based on the publication lags of the variables, that is, on the data as they would have looked at different points over the year. The y axis shows the nowcasted yearly growth rate, while the x axis shows the simulated date. Each point on a line represents the nowcast for the yearly growth rate given the data that would have been available at that time. For 2015, we can observe a relatively monotonic development throughout the year, as the forecast was revised downwards as time went on. Indeed, 2015 registered the largest year-over-year decline in the target variable in the data, at -3.3 per cent. For 2020, we can observe a development that changed direction over time, owing to the volatile signals in the data caused by the COVID-19 pandemic. In the end, the model predicted reductions in line with the average for the period from 2014 to 2018, because the pandemic affected both carbon emissions and economic activity. Figure 2 illustrates how nowcasting can be used to monitor the development of SDG indicators in real time and to gain insight into how various factors influence them.
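The construction of synthetic data vintages can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the case study: the variable names, values, and publication lags are hypothetical, and a lag of n months is assumed to mean that a monthly observation becomes available n months after its reference month. The helper names `month_index` and `make_vintage` are also illustrative.

```python
def month_index(year, month):
    """Map (year, month) to a single integer so months can be compared."""
    return year * 12 + (month - 1)


def make_vintage(data, lags, as_of):
    """Return the data as it would have looked at the simulated date `as_of`.

    data:  {variable: {(year, month): value}}
    lags:  {variable: publication lag in months} (assumed definition)
    as_of: (year, month) of the simulated date
    Observations not yet published at `as_of` are replaced with None.
    """
    cutoff = month_index(*as_of)
    return {
        name: {
            period: (value if month_index(*period) + lags[name] <= cutoff else None)
            for period, value in series.items()
        }
        for name, series in data.items()
    }


# Hypothetical monthly inputs for target year 2015 (illustrative values only).
data = {
    "exports": {(2015, m): 100.0 + m for m in range(1, 13)},
    "energy": {(2015, m): 50.0 - m for m in range(1, 13)},
}
lags = {"exports": 2, "energy": 3}  # hypothetical lags, in months

# Simulated dates: January of the target year to July of the following year.
start, end = month_index(2015, 1), month_index(2016, 7)
simulated_dates = [(i // 12, i % 12 + 1) for i in range(start, end + 1)]

# Count how many observations of each variable are visible in each vintage.
available = {}
for date in simulated_dates:
    vintage = make_vintage(data, lags, date)
    available[date] = {
        name: sum(v is not None for v in series.values())
        for name, series in vintage.items()
    }

print(available[(2015, 6)])
print(available[(2016, 7)])
```

In an actual exercise, each vintage would be passed to the trained model to produce one point on a line such as those in Figure 2; here only the number of visible observations is tracked, to show how the information set grows as the simulated date advances.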
5.Conclusion
Out of 362 Tier 1 SDG indicators and sub-indicators, we found the majority suitable for nowcasting purposes based on information available as of autumn 2021. More specifically, 150 indicators were classified as “Highly likely” to be suitable for nowcasting, 87 were classified as “Likely”, and 125 were classified as “Unlikely”. See Appendix 1 for a visual overview of the nowcasting feasibility of sub-indicators by goal.
The case study conducted on indicator 9.4.1 illustrates the full process of nowcasting an SDG indicator. While the particular approach taken can vary considerably from that presented here, especially as it relates to the methodology employed, it can serve as a basis or starting point for those new to the practice.
While a large number of SDG indicators were considered highly feasible for nowcasting, about half were not. This paper provides only a first indication of nowcasting feasibility, which merits further investigation. Nowcasting by itself will not solve timeliness and data availability issues.
National efforts guided by the global statistical community and indicator custodian agencies are crucial to improving the availability of statistical data of sufficient quality, including time series length. For instance, work invested in backcasting time series and other efforts to increase the quality of SDG indicator data would also benefit nowcasting. Ultimately, it is the quality and availability of national statistical data that determine the possibilities for nowcasting. One of the findings of this survey is that, if and as policy makers require timelier data, more investment in official statistics, their quality, and their comprehensiveness is needed.
As 2030 draws nearer and the world looks to the challenges ahead, the SDG indicators will continue to be called upon for guidance. Nowcasting has the potential to increase the indicators’ timeliness, and thus their usefulness. The survey and case study conducted in this paper will hopefully contribute to realizing that potential by serving as resources for custodian agencies, national governments, or interested individuals in carrying out their own nowcasting exercises.
Acknowledgments
The authors would like to thank Bojan Nastav, Katalin Bokor, and Nour Barnat for their valuable comments and feedback.
References
[1] | UN. Transforming our world: The 2030 agenda for sustainable development [Internet]. Transforming our world: The 2030 Agenda for Sustainable Development. (2015) [cited 2021 Feb 20]. Available from: https://sdgs.un.org/2030agenda. |
[2] | UNDESA. SDG indicators [Internet]. SDG Indicators. (2021) [cited 2021 Nov 23]. Available from: https://unstats.un.org/sdgs/indicators/indicators-list/. |
[3] | UNSD. Nowcasting and forecasting for SDG monitoring [Internet]. Nowcasting and Forecasting for SDG Monitoring; (2020) Feb 3 [cited 2021 Feb 25]; Geneva. Available from: https://unstats.un.org/unsd/statcom/51st-session/side-events/20200302-2L-Nowcasting-and-Forecasting-for-SDG-Monitoring/. |
[4] | United Nations. A World that Counts: Mobilising the Data Revolution for Sustainable Development [Internet]. New York; (2014) Jan [cited 2022 Jan 11]. Available from: https://www.undatarevolution.org/wp-content/uploads/2014/11/A-World-That-Counts.pdf. |
[5] | Einav L, Levin J. The data revolution and economic analysis. Innovation Policy and the Economy. (2014) ; 14: : 1-24. |
[6] | MacFeely S. Nowcasting: Data delayed is data denied. Statistical Journal of the IAOS. (2021) Jan; 37: : 257-258. |
[7] | UNEP. UN secretary-general appoints high-level panel on post-2015 development agenda [Internet]. UN Secretary-General Appoints High-Level Panel on Post-2015 Development Agenda. (2012) [cited 2021 Sep 20]. Available from: https://www.unep.org/news-and-stories/press-release/un-secretary-general-appoints-high-level-panel-post-2015-development. |
[8] | UNSD. SDG indicators: Metadata repository [Internet]. SDG Indicators: Metadata repository. (2021) [cited 2021 Sep 20]. Available from: https://unstats.un.org/sdgs/metadata/. |
[9] | Ritchie H, Roser M, Mispy J, Ortiz-Ospina E. Measuring progress towards the Sustainable Development Goals [Internet]. Measuring progress towards the Sustainable Development Goals. (2018) [cited 2021 Sep 20]. Available from: https://sdg-tracker.org/. |
[10] | UN. General assembly resolution 71/313, work of the statistical commission pertaining to the 2030 Agenda for Sustainable Development [Internet]. General Assembly Resolution 71/313, Work of the Statistical Commission pertaining to the 2030 Agenda for Sustainable Development. (2017) [cited 2021 Sep 20]. Available from: http://ggim.un.org/documents/a_res_71_313.pdf. |
[11] | UNSD. Tier classification for global SDG indicators [Internet]. Tier Classification for Global SDG Indicators. (2021) [cited 2021 Sep 17]. Available from: https://unstats.un.org/sdgs/iaeg-sdgs/tier-classification/. |
[12] | WMO. Guidelines for nowcasting techniques [Internet]. WMO; (2017) [cited 2021 Mar 1]. (WMO). Report No.: 1198. Available from: https://library.wmo.int/doc_num.php?explnum_id=3795. |
[13] | Morgado AJ, Nunes LC, Salvado S. Nowcasting an economic aggregate with disaggregate dynamic factors: An application to Portuguese GDP [Internet]. Gabinete de Estratégia e Estudos, Ministério da Economia; (2007) Feb. Report No.: 0002. Available from: https://ideas.repec.org/p/mde/wpaper/0002.html. |
[14] | Rossiter J. Nowcasting the global economy [Internet]. Bank of Canada; (2010) . Report No.: 2010-2012. Available from: https://ssrn.com/abstract=1674952. |
[15] | Bok B, Caratelli D, Giannone D, Sbordone AM, Tambalotti A. Macroeconomic nowcasting and forecasting with big data. Annual Review of Economics. (2018) ; 10: (1): 615-643. |
[16] | Bierbaumer-Polly J, Bilek-Steindl S, Url T. Monitoring and nowcasting sustainable development goals. A Case Study for Austria [Internet]. 2019/389-1/S/WIFO project no: 4816: Austrian Institute of Economic Research; (2019) Nov [cited 2021 Oct 11]. Report No.: Grant No 17404. Available from: 2019/389-1/S/WIFO project no: 4816. |
[17] | Hughes BB, Irfan MT, Solórzano J, Yang V, Moyer JD. Estimating current values of sustainable development goal indicators using an integrated assessment modeling platform: Nowcasting with international futures. Statistical Journal of the IAOS. (2021) ; 37: (1): 293-307. |
[18] | UNDESA. SDG indicators database [Internet]. SDG Indicators Database. (2021) [cited 2021 Oct 1]. Available from: https://unstats.un.org/sdgs/UNSDG/IndDatabasePage. |
[19] | Aguilar RAC, Mahler DG, Newhouse D. Nowcasting global poverty [Internet]. Washington, D.C.: World Bank; (2019) [cited 2021 Nov 9]. (Paper Prepared for the IARIW-World Bank Conference). Available from: https://unctad.org/system/files/official-document/NowcastingGlobalPoverty.pdf. |
[20] | Makdissi P. Nowcasting multidimensional poverty in the occupied Palestinian territory [Internet]. ESCWA; (2021) Jul [cited 2021 Nov 8]. Available from: https://www.un.org/unispal/wp-content/uploads/2021/07/ESCWARPT_020721.pdf. |
[21] | Navicke J, Rastrigina O, Sutherland H. Nowcasting indicators of poverty risk in the European Union: A microsimulation approach. Social Indicators Research. (2014) ; 119: (1): 101-119. |
[22] | Browne C, Matteson DS, McBride L, Hu L, Liu Y, Sun Y, et al. Multivariate random forest prediction of poverty and malnutrition prevalence. PLOS ONE. (2021) Sep; 16: (9): 1-23. |
[23] | FAO, IFAD, UNICEF, WFP, WHO. The state of food security and nutrition in the world 2021 [Internet]. Rome; (2021) [cited 2021 Nov 8]. (The State of Food Security and Nutrition in the World (SOFI)). Report No.: 978-92-5-134325-8. Available from: https://www.fao.org/documents/card/en/c/cb4474en. |
[24] | Kim J, Cha M, Lee JG. Nowcasting commodity prices using social media. PeerJ Comput Sci. (2017) ; 3: : e126. |
[25] | van de Kassteele J, Eilers PHC, Wallinga J. Nowcasting the number of new symptomatic cases during infectious disease outbreaks using constrained P-spline smoothing. Epidemiology [Internet]. (2019) ; 30: (5). Available from: https://journals.lww.com/epidem/Fulltext/2019/09000/Nowcasting_the_Number_of_New_Symptomatic_Cases.16.aspx. |
[26] | Spreco A, Eriksson O, Dahlström Ö, Cowling BJ, Timpka T. Evaluation of nowcasting for detecting and predicting local influenza epidemics, Sweden, 2009-2014. Emerg Infect Dis. (2018) Oct; 24: (10): 1868-1873. |
[27] | Nsoesie EO, Oladeji O, Abah ASA, Ndeffo-Mbah ML. Nowcasting influenza-like illness trends in Cameroon. medRxiv [Internet]. (2020) ; Available from: https://www.medrxiv.org/content/early/2020/07/04/2020.07.02.20145250. |
[28] | Johansson MA, Powers AM, Pesik N, Cohen NJ, Staples JE. Nowcasting the spread of chikungunya virus in the Americas. PLOS ONE. (2014) Aug; 9: (8): 1-8. |
[29] | Wu JT, Leung K, Leung GM. Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: A modelling study. Lancet. (2020) Feb 29; 395: (10225): 689-697. |
[30] | Friedman J, York H, Graetz N, Woyczynski L, Whisnant J, Hay SI, et al. Measuring and forecasting progress towards the education-related SDG targets. Nature. (2020) ; 580: (7805): 636-639. |
[31] | Doorley K, O’Donoghue C, Sologon D. The gender gap in income and the COVID-19 pandemic. IZA – Institute of Labor Economics; (2021) May. (IZA Discussion Papers Series; vol. IZA DP No. 14360). Report No.: 14360. |
[32] | Chen T-L. Forecasting gender parity in higher education system and labor market in Japan and Korea. International Journal of Social Science and Humanity. (2020) Nov; 10: : 96-100. |
[33] | Rodríguez-Rodríguez I, Rodríguez J-V, Pardo-Quiles D-J, Heras-González P, Chatzigiannakis I. Modeling and forecasting gender-based violence through machine learning techniques. Applied Sciences [Internet]. (2020) ; 10: (22). Available from: https://www.mdpi.com/2076-3417/10/22/8244. |
[34] | Ahrens P, van der Vleuten A. Fish fingers and measles? Assessing complex gender equality in the scenarios for the future of Europe. JCMS: Journal of Common Market Studies. (2020) ; 58: (2): 292-308. |
[35] | Fatehkia M, Kashyap R, Weber I. Using Facebook ad data to track the global digital gender gap. World Development. (2018) ; 107: : 189-209. |
[36] | Searcy RT, Taggart M, Gold M, Boehm AB. Implementation of an automated beach water quality nowcast system at ten California oceanic beaches. Journal of Environmental Management. (2018) ; 223: : 633-643. |
[37] | Dada A. Seeing is predicting: Water clarity-based nowcast models for E. coli prediction in surface water. Global Journal of Health Science. (2019) Feb; 11. |
[38] | Seed AW, Pierce CE, Norman K. Formulation and evaluation of a scale decomposition-based stochastic precipitation nowcast scheme. Water Resources Research. (2013) ; 49: (10): 6624-6641. |
[39] | Proisy C, Degenne P, Anthony EJ, Berger U, Blanchard E, Fromard F, et al. A multiscale simulation approach for linking mangrove dynamics to coastal processes using remote sensing observations. Journal of Coastal Research. (2016) ; 75: (10075): 810-814. |
[40] | Bouget V, Béréziat D, Brajard J, Charantonis A, Filoche A. Fusion of rain radar images and wind forecasts in a deep learning model applied to rain nowcasting. Remote Sensing [Internet]. (2021) ; 13: (2). Available from: https://www.mdpi.com/2072-4292/13/2/246. |
[41] | Reimer JR, Wu CH. Development and application of a nowcast and forecast system tool for planning and managing a river chain of lakes. Water Resources Management. (2016) ; 30: (4): 1375-1393. |
[42] | Cheng H-Y. Cloud tracking using clusters of feature points for accurate solar irradiance nowcasting. Renewable Energy. (2017) ; 104: : 281-289. |
[43] | Samu R, Calais M, Shafiullah G, Moghbel M, Shoeb MA, Carter C. Advantages and barriers of applying solar nowcasting in controlling microgrids: Findings from a survey in 2020. in: 2020 International Conference on Smart Grids and Energy Systems (SGES). (2020) ; 267-272. |
[44] | Siliverstovs B. Assessing nowcast accuracy of US GDP growth in real time: The role of booms and busts. Empirical Economics. (2020) ; 58: (1): 7-27. |
[45] | Marcellino M, Schumacher C. Factor MIDAS for nowcasting and forecasting with ragged-edge data: A model comparison for German GDP*. Oxford Bulletin of Economics and Statistics. (2010) ; 72: (4): 518-550. |
[46] | Pavlicek J, Kristoufek L. Nowcasting unemployment rates with Google searches: Evidence from the Visegrad Group countries. PLOS ONE. Hidalgo CAE, ed. (2015) May; 10: (5): e0127084. |
[47] | Caperna G, Colagrossi M, Geraci A, Mazzarella G. Googling unemployment during the pandemic: Inference and nowcast using search data. (2020) ; (KJ-AE-20-008-EN-N (online)). |
[48] | Sanyal A, Das A. Nowcasting sales growth of manufacturing companies in India. Applied Economics. (2018) ; 50: (5): 510-526. |
[49] | Boudt K, Todorov V, Upadhyaya S. Nowcasting manufacturing value added for cross-country comparison. Statistical Journal of the IAOS. (2009) ; 26: (1–2): 15-20. |
[50] | Hussain F, Hayder K, Rehman M. Nowcasting LSM growth in Pakistan [Internet]. State Bank of Pakistan, Research Department; (2018) May. Report No.: 98. Available from: https://ideas.repec.org/p/sbp/wpaper/98.html. |
[51] | Wiebe KS, Yamano N. Nowcasting OECD indicators of carbon emissions embodied in international trade. In (2015) . |
[52] | Arslanalp S, Marini M, Tumbarello P. IMF working paper: Big data on vessel traffic: Nowcasting trade flows in real time. IMF Working Papers. (2019) ; 2019: (275): A001. |
[53] | Kosmopoulos PG, Kazadzis S, El-Askary H, Taylor M, Gkikas A, Proestakis E, et al. Earth-observation-based estimation and forecasting of particulate matter impact on solar energy in Egypt. Remote Sensing [Internet]. (2018) ; 10: (12). Available from: https://www.mdpi.com/2072-4292/10/12/1870. |
[54] | Yu R, Liu XC, Larson T, Wang Y. Coherent approach for modeling and nowcasting hourly near-road Black Carbon concentrations in Seattle, Washington. Transportation Research Part D: Transport and Environment. (2015) ; 34: : 104-115. |
[55] | Bennedsen M, Hillebrand E, Koopman SJ. Modeling, forecasting, and nowcasting U.S. CO2 emissions using many macroeconomic predictors. Energy Economics. (2021) ; 96: : 105118. |
[56] | Videnova I, Nedialkov D, Dimitrova M, Popova S. Neural networks for air pollution nowcasting. Applied Artificial Intelligence. (2006) ; 20: (6): 493-506. |
[57] | Lamboll R, Forster P, Jones C, Skeie R, Fiedler S, Samset B, et al. Modifying emissions data and projections to incorporate the effects of lockdown in climate modelling. in: EGU General Assembly Conference Abstracts. (2021) ; EGU21-42. (EGU General Assembly Conference Abstracts). |
[58] | Hobday AJ, Hartog JR, Spillman CM, Alves O. Seasonal forecasting of tuna habitat for dynamic spatial management. Canadian Journal of Fisheries and Aquatic Sciences. (2011) ; 68: (5): 898-911. |
[59] | Kohut J, Palamara L, Bochenek E, Jensen O, Manderson J, Oliver M, et al. Using ocean observing systems and local ecological knowledge to nowcast butterfish bycatch events in the Mid-Atlantic Bight longfin squid fishery. in: 2012 Oceans. (2012) ; 1-6. |
[60] | Carter DW, Crosson S, Liese C. Nowcasting intraseasonal recreational fishing harvest with internet search volume. PLOS ONE. (2015) Sep; 10: (9): 1-18. |
[61] | Staver AC, Levin SA. Integrating theoretical climate and fire effects on savanna and forest systems. The American Naturalist. (2012) ; 180: (2): 211-224. |
[62] | Bugmann H, Palahí M, Bontemps J-D, Tomé M. Trends in modeling to address forest management and environmental challenges in Europe. Forest Systems. (2010) Dec; Special Issue: 3-7. |
[63] | Lahiri K, Yang C. Boosting tax revenues with mixed-frequency data in the aftermath of COVID-19: The case of New York. International Journal of Forecasting [Internet]. (2021) ; Available from: https://www.sciencedirect.com/science/article/pii/S0169207021001680. |
[64] | Strohsal T, Wolf E. Data revisions to German national accounts: Are initial releases good nowcasts? International Journal of Forecasting. (2020) ; 36: (4): 1252-1259. |
[65] | Chapman JTE, Desai A. Using payments data to nowcast macroeconomic variables during the onset of COVID-19 [Internet]. Ottawa: Bank of Canada; (2021) Jan [cited 2021 May 5]. (Staff Working Paper). Report No.: 2021-2. Available from: https://www.bankofcanada.ca/wp-content/uploads/2021/01/swp2021-2.pdf. |
[66] | Jafari Y, Britz W. Modelling heterogeneous firms and non-tariff measures in free trade agreements using Computable General Equilibrium. Economic Modelling. (2018) ; 73: : 279-294. |
[67] | Himics M, Listorti G, Tonini A. Simulated economic impacts in applied trade modelling: A comparison of tariff aggregation approaches. Economic Modelling. (2020) ; 87: : 344-357. |
[68] | Bekkers E, Antimiani A, Carrico C, Flaig D, Fontagné L, Fouré J, et al. Modelling trade and other economic interactions between countries in baseline projections. Journal of Global Economic Analysis. (2020) Jun; 5: : 273-345. |
[69] | UNSD. SDG Indicators: Metadata repository, Target 9.4 [Internet]. SDG Indicators: Metadata repository, Target 9.4. (2021) [cited 2021 Oct 1]. Available from: https://unstats.un.org/sdgs/metadata/?Text&Goal=9&Target=9.4. |
[70] | IEA. Greenhouse Gas Emissions from Energy: Overview [Internet]. Greenhouse Gas Emissions from Energy: Overview. (2021) [cited 2021 Oct 1]. Available from: https://www.iea.org/reports/greenhouse-gas-emissions-from-energy-overview. |
[71] | Statista. Emissions [Internet]. Emissions. (2021) [cited 2021 Sep 21]. Available from: https://www.statista.com/markets/408/topic/949/emissions/. |
[72] | EIA. Monthly energy review [Internet]. Monthly Energy Review. (2021) [cited 2021 Sep 27]. Available from: https://www.eia.gov/totalenergy/data/monthly/index.php. |
[73] | OECD. OECD.Stat [Internet]. OECD.Stat. (2021) [cited 2021 Sep 25]. Available from: https://stats.oecd.org/index.aspx. |
[74] | Eurostat. Eurostat Database [Internet]. Eurostat Database. (2021) [cited 2021 Sep 25]. Available from: https://ec.europa.eu/eurostat/web/main/data/database. |
[75] | USCB. The X-13ARIMA-SEATS seasonal adjustment program [Internet]. The X-13ARIMA-SEATS Seasonal Adjustment Program. (2017) [cited 2021 Mar 1]. Available from: https://www.census.gov/srd/www/x13as/. |
[76] | Guichard S, Rusticelli E. A dynamic factor model for world trade growth. (2011) ; (874). Available from: https://www.oecd-ilibrary.org/content/paper/5kg9zbvvwqq2-en. |
[77] | Antolin-Diaz J, Drechsel T, Petrella I. Advances in nowcasting economic activity: Secular trends, large shocks and new data. (2020) ; Available from: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3669854. |
[78] | Kuzin VN, Marcellino M, Schumacher C. MIDAS versus mixed-frequency VAR: Nowcasting GDP in the euro area [Internet]. Deutsche Bundesbank; (2009) . Report No.: 2009, 07. Available from: https://ideas.repec.org/p/zbw/bubdp1/7576.html. |
[79] | Cimadomo J, Giannone D, Lenza M, Sokol A, Monti F. Nowcasting with large Bayesian vector autoregressions [Internet]. European Central Bank; (2020) Aug. Report No.: 2453. Available from: https://ideas.repec.org/p/ecb/ecbwps/20202453.html. |
[80] | Hopp D. Economic nowcasting with long short-term memory artificial neural networks (LSTM) [Internet]. UNCTAD; (2021) . (UNCTAD Research Paper). Report No.: 62. Available from: https://unctad.org/system/files/official-document/ser-rp-2021d5_en.pdf. |
[81] | Hopp D. nowcasting_benchmark [Internet]. (2022) [cited 2022 Feb 2]. Available from: https://github.com/dhopp1/nowcasting_benchmark. |
[82] | Hopp D. nowcast_lstm [Internet]. nowcast_lstm. (2021) . Available from: https://github.com/dhopp1/nowcast_lstm. |
[83] | Briggs H. Global CO2 emissions stalled in 2014 [Internet]. Global CO2 emissions stalled in 2014. (2015) [cited 2021 Nov 25]. Available from: https://www.bbc.com/news/science-environment-31872460. |
[84] | Scientific American. Carbon intensity of global economy fell in 2014 [Internet]. Carbon Intensity of Global Economy Fell in 2014. (2015) [cited 2021 Nov 25]. Available from: https://www.scientificamerican.com/article/carbon-intensity-of-global-economy-fell-in-20141/. |
Appendices
Appendix