Semantic Abstraction for generalization of tweet classification: An evaluation of incident-related tweets

Schulz, Axel; Guckelsberger, Christian; Janssen, Frederik

doi:10.3233/SW-150188

Article type: Research Article

Authors: Schulz, Axel^{a; *} | Guckelsberger, Christian^b | Janssen, Frederik^c

Affiliations: [a] Technische Universität Darmstadt, Telecooperation Lab, Germany | [b] Goldsmiths, University of London, Computational Creativity Group, United Kingdom | [c] Technische Universität Darmstadt, Knowledge Engineering Group, Germany

Correspondence: [*] Corresponding author.

Abstract: Social media is a rich source of up-to-date information about events such as incidents. The sheer amount of available information makes machine learning approaches a necessity to process this information further. This learning problem is often concerned with regionally restricted datasets such as data from only one city. Because social media data such as tweets varies considerably across different cities, the training of efficient models requires labeling data from each city of interest, which is costly and time consuming. To avoid such an expensive labeling procedure, a generalizable model can be trained on data from one city and then applied to data from different cities. In this paper, we present Semantic Abstraction to improve the generalization of tweet classification. In particular, we derive features from Linked Open Data and include location and temporal mentions. A comprehensive evaluation on twenty datasets from ten different cities shows that Semantic Abstraction is indeed a valuable means for improving generalization. We show that this not only holds for a two-class problem where incident-related tweets are separated from non-related ones but also for a four-class problem where three different incident types and a neutral class are distinguished. To get a thorough understanding of the generalization problem itself, we closely examined rule-based models from our evaluation. We conclude that on the one hand, the quality of the model strongly depends on the class distribution. On the other hand, the rules learned on cities with an equal class distribution are in most cases much more intuitive than those induced from skewed distributions. We also found that most of the learned rules rely on the novel semantically abstracted features.

Keywords: Tweets, classification, Linked Open Data, Semantic Abstraction, incident detection

DOI: 10.3233/SW-150188

Journal: Semantic Web, vol. 8, no. 3, pp. 353-372, 2017

Published: 6 December 2016
