Abstract: In recent years, Eurostat has promoted Experimental Statistics, i.e., statistics that “have not reached full maturity in terms of harmonization, coverage or methodology”. These statistics are based on new data sources, mainly Big Data, which can improve timeliness and provide new measures complementary to Official Statistics estimates. However, the use of Big Data raises new quality challenges, and a well-defined Big Data quality framework is not yet available. The Social Mood on Economy Index (SMEI) is among the first Big Data-based experimental statistics released by the Italian National Institute of Statistics (Istat), which has published it since 2018. SMEI is a daily index computed from the public stream of Italian Twitter, and it aims to represent the evolution of sentiment on economic topics. Its longevity makes SMEI a good candidate for investigating Big Data-related quality issues, although its intrinsically multivariate approach hinders the interpretation of the index. Are we able to track its quality characteristics? What are the specific uses of SMEI? The paper reports the current discussion and the solutions implemented at Istat, focusing in particular on the revision of SMEI prompted by the COVID-19 pandemic. The present work aims to contribute to the ESS debate on setting up quality standards for processing Big Data-based statistical products.
Keywords: Big Data, experimental statistics, Twitter, natural language processing