Clustering web pages about persons and organizations

Ye, Shiren; Chua, Tat-Seng; Kei, Jeremy R.

Article type: Research Article

Affiliations: School of Computing, National University of Singapore, Singapore, 117543. E-mail: {yesr,chuats,jkei}@comp.nus.edu.sg

Abstract: One of the most frequent Web surfing tasks is searching for persons and organizations by name. Such names are often common and non-unique, so a single name may map to several distinct target entities. This paper describes a new methodology for clustering the web pages returned by a search engine so that pages belonging to different entities fall into different groups. The algorithm uses a combination of named entities, link-based information, and structure-based information as features to partition the document set into direct and indirect pages by means of a decision-tree model. It then chooses the appropriate, distinctive direct pages as seeds and clusters the document set around them. The algorithm has been found to be effective for web-based information retrieval applications.
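
The seed-based step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the decision-tree classifier is not reproduced, and its output is stood in for by a hypothetical `is_direct` flag on each page. The use of Jaccard overlap between named-entity sets, the greedy seed selection, the `max_overlap` threshold, and all names and sample pages below are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    url: str
    entities: frozenset  # named entities extracted from the page (assumed given)
    is_direct: bool      # stand-in for the decision-tree output (assumption)


def jaccard(a, b):
    """Jaccard overlap between two entity sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pick_seeds(pages, max_overlap=0.3):
    """Greedily keep direct pages that overlap little with the seeds chosen so far."""
    seeds = []
    for page in pages:
        if page.is_direct and all(
            jaccard(page.entities, s.entities) <= max_overlap for s in seeds
        ):
            seeds.append(page)
    return seeds


def cluster(pages, seeds):
    """Assign every page to the seed whose entity set it overlaps most."""
    if not seeds:
        raise ValueError("at least one seed page is required")
    clusters = {seed.url: [] for seed in seeds}
    for page in pages:
        best = max(seeds, key=lambda s: jaccard(page.entities, s.entities))
        clusters[best.url].append(page.url)
    return clusters


# Hypothetical search results for the ambiguous name "J. Smith".
pages = [
    Page("a.example/home", frozenset({"J. Smith", "MIT", "robotics"}), True),
    Page("b.example/home", frozenset({"J. Smith", "Acme Corp", "sales"}), True),
    Page("c.example/news", frozenset({"MIT", "robotics", "award"}), False),
    Page("d.example/blog", frozenset({"Acme Corp", "sales", "quarterly"}), False),
]
seeds = pick_seeds(pages)
result = cluster(pages, seeds)
```

In this toy input the two direct pages share only the ambiguous name, so both become seeds, and each indirect page joins the seed with which it shares the most entities. A real system would also need the link-based and structure-based features the abstract mentions, which this sketch omits.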

Keywords: Web clustering, persons and organizations, machine learning, text classification, information retrieval, named entity

Journal: Web Intelligence and Agent Systems: An international journal, vol. 3, no. 4, pp. 203-216, 2005

Received 21 December 2005

Accepted 21 December 2005

Published: 2005
