Metadata for Big Data: A preliminary investigation of metadata quality issues in research data repositories


Data-driven approaches to scientific research have given rise to new types of repositories that give scientists the means to store, share, and re-use big data-sets generated at various stages of the research process. As the number and heterogeneity of research data repositories increase, it becomes critical for scientists to solve the data quality problems associated with the data-sets stored in these repositories. To date, several authors have focused on the quality of the data-sets themselves, yet little is known about the quality problems of the metadata used to describe them. Metadata is important for the long-term sustainability of research data repositories and for data re-use. The aim of the research reported in this paper was to identify the data quality problems associated with the metadata used in the Dryad data repository. The paper concludes with some recommendations for improving the quality of metadata in research data repositories.