Affiliations: Development Data Group, World Bank, Washington, DC, USA
Corresponding author: Daniel Gerszon Mahler, Development Data Group, World Bank, 1818 H Street NW, Washington, DC 20433, USA. E-mail: [email protected].
Abstract: Monitoring progress towards global goals such as the Sustainable Development Goals requires global statistics. Yet cross-country datasets are rarely truly global, which creates a trade-off for producers of global statistics: the lower the data coverage threshold for disseminating a global statistic, the more statistics can be made available, but the less accurate they will be. We quantify this availability-accuracy trade-off by running more than 10 million simulations on the World Development Indicators. We show that if data are missing for a fraction x of the world’s population, one should expect to be 0.37x standard deviations off the true global value, and one risks being as much as x standard deviations off. We show that this result is robust to various assumptions and give recommendations on when there is enough data to create global statistics. Though the decision will be context specific, in a baseline scenario we suggest not creating global statistics when data are available for less than half of the world’s population.
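As a rough illustration of the rule of thumb stated in the abstract, the following Python sketch converts a missing population share x into the expected error (0.37x) and the largest error one risks (x), both in standard deviations. The function names and the input check are assumptions made for illustration; they are not taken from the paper's simulation code.

```python
# Illustrative sketch only: the 0.37 and 1.0 multipliers come from the
# abstract; the function names and validation are hypothetical.

EXPECTED_ERROR_FACTOR = 0.37  # expected error per unit of missing population share
MAX_ERROR_FACTOR = 1.0        # error one risks per unit of missing population share


def _check_share(missing_share: float) -> None:
    """Reject shares outside [0, 1], since x is a population fraction."""
    if not 0.0 <= missing_share <= 1.0:
        raise ValueError("missing_share must be between 0 and 1")


def expected_error_sd(missing_share: float) -> float:
    """Expected distance from the true global value, in standard deviations."""
    _check_share(missing_share)
    return EXPECTED_ERROR_FACTOR * missing_share


def max_error_sd(missing_share: float) -> float:
    """Largest distance one risks from the true global value, in standard deviations."""
    _check_share(missing_share)
    return MAX_ERROR_FACTOR * missing_share


if __name__ == "__main__":
    # Baseline threshold from the abstract: data missing for half of the world's population.
    x = 0.5
    print(f"expected error: {expected_error_sd(x):.3f} SD")
    print(f"error at risk:  {max_error_sd(x):.3f} SD")
```

At the baseline threshold of x = 0.5, this gives an expected error of about 0.185 standard deviations and a possible error of up to 0.5 standard deviations.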