Empirical methodology for crowdsourcing ground truth

Dumitrache, Anca; Inel, Oana; Timmermans, Benjamin; Ortiz, Carlos; Sips, Robert-Jan; Aroyo, Lora; Welty, Chris

doi:10.3233/SW-200415

Issue title: Open Science Data and the Semantic Web Journal

Article type: Research Article

Authors: Dumitrache, Anca^{a; e; *} | Inel, Oana^{a; f} | Timmermans, Benjamin^{a; b} | Ortiz, Carlos^c | Sips, Robert-Jan^{b; g} | Aroyo, Lora^{a; d} | Welty, Chris^{a; d}

Affiliations: [a] Vrije Universiteit Amsterdam, De Boelelaan 1081, Amsterdam, Netherlands | [b] IBM Center for Advanced Studies Benelux, Johan Huizingalaan 765, Amsterdam, Netherlands | [c] Netherlands eScience Center, Amsterdam, Netherlands | [d] Google, New York, USA | [e] FD Mediagroep, Amsterdam, Netherlands | [f] TU Delft, Delft, Netherlands | [g] myTomorrows, Amsterdam, Netherlands

Correspondence: [*] Corresponding author.

Abstract: The process of gathering ground truth data through human annotation is a major bottleneck in the use of information extraction methods for populating the Semantic Web. Crowdsourcing-based approaches are gaining popularity as a way to address the volume of data and the shortage of annotators. Typically, these practices use inter-annotator agreement as a measure of quality. However, in many domains, such as event detection, the data is ambiguous and the information examples admit a multitude of perspectives. We present an empirically derived methodology for efficiently gathering ground truth data in a diverse set of use cases covering a variety of domains and annotation tasks. Central to our approach is the use of CrowdTruth metrics, which capture inter-annotator disagreement. We show that measuring disagreement is essential for acquiring a high-quality ground truth. We do so by comparing the quality of data aggregated with CrowdTruth metrics against data aggregated with majority vote, over a set of diverse crowdsourcing tasks: Medical Relation Extraction, Twitter Event Identification, News Event Extraction and Sound Interpretation. We also show that increasing the number of crowd workers leads to growth and stabilization in the quality of annotations, contrary to the usual practice of employing a small number of annotators.
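
To make the contrast with majority vote concrete, the following minimal Python sketch aggregates the same worker judgments in two ways. It is an illustration only and does not implement the CrowdTruth metrics: the per-label support fraction is a simplified stand-in for a disagreement-aware score, and the function names, labels and worker counts are hypothetical.

```python
from collections import Counter


def majority_vote(annotations):
    """Return the single label chosen by the most workers.

    Ties are resolved by whichever tied label was seen first.
    """
    counts = Counter(annotations)
    return counts.most_common(1)[0][0]


def label_support(annotations):
    """Return the fraction of workers who chose each label.

    Simplified stand-in for a disagreement-aware score; this is NOT
    the CrowdTruth metric, only an illustration of the idea.
    """
    counts = Counter(annotations)
    total = len(annotations)
    return {label: n / total for label, n in counts.items()}


# Hypothetical judgments from ten workers on two input examples.
clear = ["event"] * 9 + ["no_event"]
ambiguous = ["event"] * 6 + ["no_event"] * 4

for name, votes in [("clear", clear), ("ambiguous", ambiguous)]:
    print(name, majority_vote(votes), label_support(votes))
```

Majority vote returns the same label for both examples, while the support fractions keep the information that the second example divided the workers. That retained signal is the kind of ambiguity a disagreement-aware aggregation is meant to expose.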

Keywords: CrowdTruth, ground truth gathering, annotator disagreement, semantic interpretation, medical, event extraction, relation extraction

Journal: Semantic Web, vol. 12, no. 3, pp. 403-421, 2021

Published: 9 March 2021
