Article type: Research Article
Authors: Rossignol, Mathias | Sébillot, Pascale; *
Affiliations: Irisa, Campus de Beaulieu, 35042 Rennes Cedex, France
Correspondence: [*] Corresponding author: Pascale Sébillot. Tel.: +33 2 99 84 73 17; Fax: +33 2 99 84 71 71; E-mail: [email protected].
Abstract: We present an unsupervised method for generating, from a textual corpus, sets of keywords, that is, words whose occurrences in a text are strongly connected with the presence of a given topic. Each of these classes is associated with one of the main topics of the corpus and can be used to detect the presence of that topic in any of its paragraphs by a simple keyword co-occurrence criterion. The classes are extracted from the textual data fully automatically, without requiring any a priori linguistic knowledge or making any assumptions about the topics to search for. The algorithms we have developed yield satisfactory and directly usable results despite the amount of noise inherent in textual data, thanks to a combination of several data analysis techniques. On a corpus of archives from the French monthly newspaper Le Monde Diplomatique, we obtain 40 classes of about 30 words each that accurately characterize precise topics and allow us to detect their occurrences with a precision of 85% and a recall of 65%.
Keywords: topic detection, topic characterization, statistical data analysis, unsupervised classification
DOI: 10.3233/IDA-2005-9107
Journal: Intelligent Data Analysis, vol. 9, no. 1, pp. 105-127, 2005
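The abstract mentions detecting a topic in a paragraph "by a simple keyword co-occurrence criterion" and evaluating detection by precision and recall. The following is a minimal sketch of what such a criterion could look like, not the authors' implementation: the threshold `min_hits`, the function names, and the two toy keyword classes are all hypothetical, and the abstract does not state the exact rule or threshold used in the paper.

```python
import re


def detect_topics(paragraph, keyword_classes, min_hits=2):
    """Return the topics whose keyword class has at least `min_hits`
    distinct keywords occurring in the paragraph.

    `min_hits` is an assumed threshold, not a value taken from the paper.
    """
    words = set(re.findall(r"\w+", paragraph.lower()))
    return {
        topic
        for topic, keywords in keyword_classes.items()
        if len(words & keywords) >= min_hits
    }


def precision_recall(predicted, expected):
    """Precision and recall over sets of (paragraph_id, topic) pairs."""
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical keyword classes, for illustration only.
keyword_classes = {
    "economy": {"market", "inflation", "bank", "trade"},
    "health": {"hospital", "vaccine", "disease", "doctor"},
}

paragraph = "The central bank raised rates as inflation hit the housing market."
topics = detect_topics(paragraph, keyword_classes)
print(topics)
```

In this sketch, requiring several distinct keywords rather than one is meant to reduce false detections caused by a single ambiguous word; a lower `min_hits` would tend to raise recall at the cost of precision.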