Article type: Research Article
Authors: Lee, Jae-Moon [a; b] | Calvo, Rafael A. [b]
Affiliations: [a] School of Information and Computer Engineering, Hansung University, Korea | [b] Web Engineering Group, School of Electrical and Information Engineering, University of Sydney, Australia. E-mail: [email protected]
Abstract: This paper describes the design and implementation of new naive Bayes and k-Nearest Neighbour methods that are highly scalable and efficient for document classification. Three methods for improving scalability are analysed: a change in the data representation and therefore in the algorithms' implementation, a partitioning mechanism that breaks the problem down into smaller parts, and a buffering mechanism that improves memory efficiency on large datasets. The classifiers were tested on two Reuters datasets: ModApte, a popular but small benchmark, and RCV1, a new large collection of news stories, and were compared to more standard implementations of these methods, both experimentally and analytically.
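As a rough illustration of the kind of classifier the abstract refers to, the sketch below shows a multinomial naive Bayes text classifier built over a sparse bag-of-words representation (each document stored as a term-to-count map rather than a dense vector). This is not the paper's implementation; the class name `SparseNaiveBayes`, the toy data, and all parameter names are hypothetical and chosen only for illustration.

```python
# Minimal sketch (not the authors' code): multinomial naive Bayes over a
# sparse bag-of-words representation, i.e. each document is a {word: count}
# dict that stores only the terms actually present in that document.
import math
from collections import Counter, defaultdict


class SparseNaiveBayes:
    def __init__(self, alpha=1.0):
        self.alpha = alpha            # Laplace smoothing constant
        self.class_priors = {}        # log P(c)
        self.word_log_probs = {}      # log P(w | c) for words seen in training
        self.default_log_prob = {}    # per-class fallback for unseen words
        self.vocab = set()

    def fit(self, docs, labels):
        # docs: list of sparse {word: count} dicts; labels: parallel list of classes
        class_docs = defaultdict(list)
        for doc, label in zip(docs, labels):
            class_docs[label].append(doc)
            self.vocab.update(doc)
        n_docs = len(docs)
        for c, c_docs in class_docs.items():
            self.class_priors[c] = math.log(len(c_docs) / n_docs)
            counts = Counter()
            for doc in c_docs:
                counts.update(doc)
            total = sum(counts.values()) + self.alpha * len(self.vocab)
            self.word_log_probs[c] = {
                w: math.log((counts[w] + self.alpha) / total) for w in self.vocab
            }
            self.default_log_prob[c] = math.log(self.alpha / total)

    def predict(self, doc):
        # Score each class using only the terms observed in the test document.
        best, best_score = None, float("-inf")
        for c, prior in self.class_priors.items():
            score = prior + sum(
                n * self.word_log_probs[c].get(w, self.default_log_prob[c])
                for w, n in doc.items()
            )
            if score > best_score:
                best, best_score = c, score
        return best


# Toy usage with two single-document classes
train_docs = [{"stocks": 2, "market": 1}, {"match": 1, "goal": 3}]
train_labels = ["finance", "sport"]
nb = SparseNaiveBayes()
nb.fit(train_docs, train_labels)
print(nb.predict({"market": 1, "stocks": 1}))  # -> "finance"
```

Because only the non-zero term counts are stored and scored, the per-document cost depends on the number of distinct terms in the document rather than on the vocabulary size, which is the intuition behind the sparse-representation change the abstract mentions.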
Keywords: language technologies, news stories, document classification, naive Bayes, k-Nearest Neighbour
DOI: 10.3233/IDA-2005-9404
Journal: Intelligent Data Analysis, vol. 9, no. 4, pp. 365-380, 2005