Article type: Research Article
Authors: Soonthornphisaj, Nuanwan [a, b, *] | Kijsirikul, Boonserm [c]
Affiliations: [a] Machine Intelligence and Knowledge Discovery Laboratory, Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand, 10330. Tel.: +661 6592877; Fax: +662 2186955; E-mail: [email protected] | [b] Home address: 100/693 Chollada 48A, Bangkruay-Sainoi Rd. Bangbuathong, Nonthaburi, Thailand, 11110 | [c] Machine Intelligence and Knowledge Discovery Laboratory, Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand, 10330. Tel.: +662 2186956; Fax: +662 2186955; E-mail: [email protected]
Correspondence: [*] Corresponding author
Abstract: The goal of Web page categorization is to classify Web documents into a certain number of predefined categories. Previous work in this area employed a large number of labeled training documents for supervised learning. The problem is that labeled training documents are difficult to create: manually categorizing documents to produce training data is not easy, whereas unlabeled documents are easy to collect. Therefore, a new machine learning algorithm is investigated to overcome these difficulties and make effective use of unlabeled documents. We propose a novel approach called Iterative Cross-Training (ICT). In this paper, we apply the algorithm to Web page categorization on three data sets. The performance of ICT is evaluated and compared with that of supervised learning algorithms, Co-Training, and Expectation Maximization. We find ICT to be an effective approach for the Web page categorization task.
Keywords: machine learning, web mining, web page categorization
DOI: 10.3233/IDA-2003-7305
Journal: Intelligent Data Analysis, vol. 7, no. 3, pp. 233-253, 2003