Article type: Research Article
Authors: Zhou, Xiaotang [a,b] | Ouyang, Jihong [a,b,*] | Li, Ximing [a,b]
Affiliations: [a] College of Computer Science and Technology, Jilin University, Changchun 130012, Jilin, China | [b] Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, Jilin, China
Correspondence: [*] Corresponding author: Jihong Ouyang, College of Computer Science and Technology, Jilin University, Changchun 130012, Jilin, China. Tel.: +86 138 4318 4836; E-mail: [email protected].
Abstract: As an efficient sampling algorithm for latent Dirichlet allocation, SparseLDA uses a caching strategy to improve the time and space efficiency of the standard Gibbs sampling algorithm (StdGibbs) by recycling previous computation. However, SparseLDA cannot further improve the time efficiency of StdGibbs, because the amount of recycled computation is limited: the word types of two adjacent tokens usually differ, so previous computation cannot easily be recycled further. To solve this problem, in this paper we propose a new algorithm, Efficient SparseLDA (ESparseLDA), based on SparseLDA. The main idea of ESparseLDA is to first rearrange the tokens within each text according to their word types, so that tokens of the same word type are aggregated together, and then recycle more computation while making no approximation and ensuring exactness. We provide detailed theoretical explanations and comparative experimental analyses of the correctness, exactness and time efficiency of ESparseLDA. In particular, statistical significance tests on perplexities show that ESparseLDA is correct and exact, and running time results show that ESparseLDA is more time-efficient than SparseLDA, by margins ranging from 5.06% to 31.85% on the different datasets used in the experiments.
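The rearrangement idea described in the abstract (grouping a document's tokens by word type so that equal types become adjacent) can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the authors' implementation; the helper name rearrange_tokens is hypothetical.

    from collections import Counter

    def rearrange_tokens(doc_word_ids):
        """Group one document's tokens by word type (hypothetical helper
        illustrating the rearrangement step described in the abstract).

        After grouping, all occurrences of a word type are adjacent, so a
        SparseLDA-style sampler can reuse per-word cached quantities across
        consecutive tokens instead of rebuilding them for each token."""
        # Sorting by word id is one simple way to aggregate equal types.
        # The token multiset is unchanged, and since the collapsed Gibbs
        # sampler treats tokens within a document exchangeably, sampling
        # the rearranged sequence targets the same posterior (no approximation).
        return sorted(doc_word_ids)

    doc = [5, 2, 5, 9, 2, 2, 7]                   # word ids of one document
    print(rearrange_tokens(doc))                   # [2, 2, 2, 5, 5, 7, 9]
    assert Counter(doc) == Counter(rearrange_tokens(doc))  # same tokens, new order

Because only the visiting order changes, the exactness claim in the abstract is plausible on its face: the sampler still updates every token, just in an order that lets the word-specific part of the sampling distribution be computed once per word type rather than once per token.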
Keywords: Latent Dirichlet allocation, topic model, Gibbs sampling, topic inference
DOI: 10.3233/IDA-173609
Journal: Intelligent Data Analysis, vol. 22, no. 6, pp. 1227-1257, 2018