Article type: Research Article
Authors: Vaičiūnas, Airenas | Raškinis, Gailius
Affiliations: Department of Applied Informatics, Vytautas Magnus University, Vileikos 8, LT-44404 Kaunas, Lithuania, e-mail: [email protected], [email protected]
Abstract: This paper investigates a variety of statistical cache-based language models built upon three corpora: English, Lithuanian, and Lithuanian base forms. It studies how the cache size, the type of decay function (including custom corpus-derived functions), and the interpolation technique (static vs. dynamic) affect the perplexity of a language model. The best results are achieved by models consisting of three components: a standard 3-gram, a decaying cache 1-gram, and a decaying cache 2-gram, joined by linear interpolation with dynamic weight updates. Such a model yielded perplexity improvements of up to 36% and 43% over the 3-gram baseline for Lithuanian words and Lithuanian word base forms, respectively. The best language model of English yielded a perplexity improvement of up to 16%. This suggests that cache-based modeling is of greater utility for free-word-order, highly inflected languages.
Keywords: language models, n-grams, cache models, dynamic interpolation, perplexity reduction, inflected language, free word order language, Lithuanian
Journal: Informatica, vol. 17, no. 1, pp. 111-124, 2006
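To make the idea in the abstract concrete, the following is a minimal sketch of a decaying cache 1-gram linearly interpolated with a baseline model, with an optional dynamic update of the interpolation weight. It is not the authors' implementation. The exponential decay function, the online EM-style weight update, the clamping bounds, the uniform baseline standing in for the 3-gram, and all names (`DecayingCacheUnigram`, `perplexity`, `decay`, `lam`, `rate`) are assumptions chosen for illustration, and the sketch uses two components rather than the three described in the paper.

```python
import math


class DecayingCacheUnigram:
    """Cache 1-gram whose word weights decay with distance (assumed exponential)."""

    def __init__(self, decay=0.99):
        self.decay = decay      # hypothetical per-word decay rate
        self.weights = {}       # word -> decayed count
        self.total = 0.0        # sum of all decayed counts

    def prob(self, word):
        # An empty cache returns 0; the interpolation keeps the mixture positive.
        if self.total == 0.0:
            return 0.0
        return self.weights.get(word, 0.0) / self.total

    def update(self, word):
        # Age every cached word, then add the newest word with weight 1.
        for w in self.weights:
            self.weights[w] *= self.decay
        self.total = self.total * self.decay + 1.0
        self.weights[word] = self.weights.get(word, 0.0) + 1.0


def perplexity(words, base_prob, cache, lam=0.1, rate=0.05, dynamic=True):
    """Perplexity of the mixture (1 - lam) * base + lam * cache over `words`.

    With dynamic=True, lam is nudged after each word toward the share of the
    probability mass that the cache contributed (an assumed update rule).
    base_prob must return a strictly positive probability.
    """
    log_sum = 0.0
    for word in words:
        p_base = base_prob(word)
        p_cache = cache.prob(word)
        p = (1.0 - lam) * p_base + lam * p_cache
        log_sum += math.log(p)
        if dynamic:
            posterior = lam * p_cache / p
            lam = (1.0 - rate) * lam + rate * posterior
            lam = min(max(lam, 0.01), 0.99)  # keep both components alive
        cache.update(word)                   # the cache sees the word only after scoring it
    return math.exp(-log_sum / len(words))


def uniform_base(word, vocab_size=100):
    # Stand-in for a trained 3-gram: uniform over a hypothetical 100-word vocabulary.
    return 1.0 / vocab_size


if __name__ == "__main__":
    text = "a b".split() * 50  # toy, highly repetitive input
    static = perplexity(text, uniform_base, DecayingCacheUnigram(), dynamic=False)
    dyn = perplexity(text, uniform_base, DecayingCacheUnigram(), dynamic=True)
    print(f"static weight: {static:.2f}, dynamic weight: {dyn:.2f}")
```

On repetitive input the cache component predicts recent words well, so the dynamic update shifts weight toward it. A model closer to the one in the abstract would replace `uniform_base` with a trained 3-gram and add a decaying cache 2-gram as a third interpolated component.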