Affiliations: Shibaura Institute of Technology, 307 Fukasaku,
Minuma-ku, Saitama-City 337-8570, Japan | Japan Advanced Institute of Science and Technology,
1-1, Asahidai, Tatsunokuchi, Ishikawa 923-1292, Japan | The Institute of Scientific and Industrial Research,
Osaka University, Mihogaoka, Ibaraki, Osaka 567-0047, Japan
Abstract: Owing to the recent explosive growth in the number of Web-pages on the World Wide Web,
portal sites that offer directory-style search engines, such as Yahoo!, urgently
need to classify Web-pages into many categories
automatically. This paper investigates how rough set theory can help select
relevant features for Web-page classification. Our experimental results show
that combining the rough set-aided feature selection method with a
Support Vector Machine using the linear kernel is useful in practice for
classifying Web-pages into multiple categories: the experiments
give acceptable accuracy, and a high degree of dimensionality reduction is achieved
without the need to search for a threshold for feature selection.
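
To illustrate why a rough set-based selector needs no threshold, the following is a minimal sketch of a greedy reduct search in the style of QuickReduct. The abstract does not say which rough set algorithm the authors use, so the algorithm choice, the function names, and the toy term-presence data are all assumptions made for illustration. The search stops as soon as the selected features discern the categories as well as the full feature set does, so the stopping point comes from the data rather than from a tuned cutoff.

```python
from collections import defaultdict


def positive_region_size(rows, labels, features):
    """Count rows whose values on `features` fix the label unambiguously."""
    blocks = defaultdict(list)
    for row, label in zip(rows, labels):
        key = tuple(row[f] for f in features)
        blocks[key].append(label)
    # A block is consistent when all of its rows share one label.
    return sum(len(b) for b in blocks.values() if len(set(b)) == 1)


def quick_reduct(rows, labels):
    """Greedily add features until they match the full set's positive region."""
    all_features = list(range(len(rows[0])))
    target = positive_region_size(rows, labels, all_features)
    selected = []
    while positive_region_size(rows, labels, selected) < target:
        remaining = [f for f in all_features if f not in selected]
        best = max(
            remaining,
            key=lambda f: positive_region_size(rows, labels, selected + [f]),
        )
        selected.append(best)
    return selected


# Hypothetical toy data: each row is a page, each column a term (1 = present).
# A page is labelled "sports" only when terms 0 and 1 both occur;
# term 2 is noise and term 3 occurs on every page.
rows = [
    (1, 1, 0, 1),
    (1, 0, 0, 1),
    (0, 1, 1, 1),
    (0, 0, 1, 1),
    (1, 1, 1, 1),
    (0, 0, 0, 1),
]
labels = ["sports", "other", "other", "other", "sports", "other"]

selected = quick_reduct(rows, labels)
print(selected)  # [0, 1]
```

In this toy case four terms are reduced to two with no cutoff value supplied. The retained columns could then be passed to any linear-kernel SVM implementation for the classification step.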