Latent semantic analysis for tagging activation states and identifiability in northwestern Mexican news outlets

Sánchez-Fernández, Manuel-Alejandro; Medina-Urrea, Alfonso; Torres-Moreno, Juan-Manuel

doi:10.3233/JIFS-219235

Issue title: Recent Advances in Language & Knowledge Engineering

Guest editors: David Pinto, Beatriz Beltrán and Vivek Singh

Article type: Research Article

Authors: Sánchez-Fernández, Manuel-Alejandro^{a; *} | Medina-Urrea, Alfonso^b | Torres-Moreno, Juan-Manuel^c

Affiliations: [a] Instituto de Humanidades y Ciencias de la Conducta, México | [b] Centro de Estudios Lingüísticos y Literarios, El Colegio de México, México | [c] Laboratoire Informatique d’Avignon, Université d’Avignon, France

Correspondence: [*] Corresponding author. Manuel-Alejandro Sánchez-Fernández, Instituto de Humanidades y Ciencias de la Conducta, México. E-mail: [email protected].

Abstract: This work studies the relationship between measures obtained from Latent Semantic Analysis (LSA), and from a variant known as SPAN, and the activation and identifiability states (informative states) of referents in noun phrases taken from Spanish-language journalistic notes published by northwestern Mexican news outlets. The aim, and the challenge, is to find a strategy for labelling new / given information in discourse that is grounded in linguistic theory. The new / given distinction can be defined from different perspectives, which differ in the linguistic forms they take into account; this work focuses on full referential devices (n = 2,388). We perform Pearson's r correlation tests, analysis of variance, a graphical exploration of how the labels cluster, and a classification experiment with random forests. The experiment used two labelling schemes, noun phrases labelled with all 10 informative-state tags and a binary labelling, and two bags of words for each noun phrase: the interior and the exterior. We found that LSA combined with the inner bag of words can be used to classify certain informative states. The same measure showed good results for the binary division, detecting which sentences introduce new referents into the discourse. Previous work applying a similar method to English noun phrases reached 80% accuracy (n = 478) in its classification exercise; our best test for Spanish reached 79%. No previous work has applied this method to Spanish, and this kind of experiment matters because Spanish exhibits a more complex inflectional morphology.
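As a rough illustration of the kind of LSA-derived measure the abstract refers to, the sketch below builds a small term-by-document matrix, reduces it with a truncated SVD, and compares a noun phrase's inner bag of words with the text that precedes it. The abstract does not say how the measures are computed, so the cosine comparison, the toy corpus, and every function name here are assumptions made for illustration; SPAN and the random forest step are not shown.

```python
import numpy as np

def build_term_doc_matrix(docs):
    """Return a word-to-row index and a term-by-document count matrix."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            matrix[index[w], j] += 1.0
    return index, matrix

def lsa_term_vectors(matrix, k):
    """Truncated SVD: each row of U_k * S_k is a k-dimensional term vector."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]

def bag_vector(words, index, term_vectors):
    """Sum the LSA vectors of the words in a bag; unknown words are skipped."""
    rows = [index[w] for w in words if w in index]
    if not rows:
        return np.zeros(term_vectors.shape[1])
    return term_vectors[rows].sum(axis=0)

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Hypothetical toy corpus; the study's data are news notes in Spanish.
docs = [
    "el alcalde inauguró la obra",
    "el alcalde visitó la escuela",
    "la lluvia inundó la carretera",
]
index, matrix = build_term_doc_matrix(docs)
term_vectors = lsa_term_vectors(matrix, k=2)

# Assumed measure: similarity between the inner bag of words of a noun
# phrase ("alcalde") and the bag of words of the preceding sentence.
prior = bag_vector(docs[0].split(), index, term_vectors)
np_inner = bag_vector(["alcalde"], index, term_vectors)
score = cosine(np_inner, prior)
```

Under this assumed setup, a score like this one, computed per noun phrase, could serve as a feature for correlation tests or for a classifier; whether a high value actually indicates a given referent is the empirical question the article addresses.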

Keywords: Automatic tagging, activation states, latent semantic analysis, noun phrases, computational pragmatics

Journal: Journal of Intelligent & Fuzzy Systems, vol. 42, no. 5, pp. 4463-4471, 2022

Published: 31 March 2022
