Issue title: Special Supplement Issue in Section A and B: Selected Papers from the ISCA International Conference on Software Engineering and Data Engineering, 2009
Article type: Research Article
Authors: Toshniwal, Durga [*] | Roy, Rishiraj Saha
Affiliations: Department of Electronics and Computer Engineering, Indian Institute of Technology Roorkee, Roorkee – 247 667, Uttarakhand, India
Correspondence: [*] Corresponding author. E-mail: [email protected].
Abstract: Research in text mining has recently gained importance due to the large increase in the number of electronic news articles, books, research papers, and e-mail messages. Clustering organizes text documents in an unsupervised fashion. In this paper, we propose an algorithm for clustering unstructured text documents using shape pattern matching. The Vector Space Model is used to represent our dataset as a term-weight matrix. The high-dimensional vector space is mapped to a two-dimensional plane in which the term weights are plotted against a time axis. In this way, the text documents are represented as time sequences. Initially, the documents are broadly grouped into categories that are determined using domain knowledge. The relevant portion of each document vector is then clipped out, and the shape patterns present in these clipped portions are observed. These shape patterns are indexed by preparing their alphabet. Grouping the documents within a category that share the same shape pattern yields the required clusters.
Keywords: Text document clustering, vector space model, term frequency (TF), inverse document frequency (IDF), TF-IDF measure, shape patterns
DOI: 10.3233/JCM-2010-0269
Journal: Journal of Computational Methods in Sciences and Engineering, vol. 10, no. s1, pp. S73-S84, 2010
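The pipeline outlined in the abstract (term-weight matrix, weights read as a sequence, shape patterns indexed by an alphabet, documents grouped by shared pattern) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the weighting variant, the alphabet, or the clipping rule, so the TF-IDF formula, the three-symbol alphabet (`U`, `D`, `S`), and the function names `tfidf_matrix`, `shape_pattern`, and `cluster_by_shape` are all assumptions made for illustration. The domain-knowledge categorization and the clipping step are omitted.

```python
import math
from collections import Counter, defaultdict


def tfidf_matrix(docs):
    """Build a term-weight matrix: rows[i][j] is the TF-IDF weight of term j in doc i.

    Assumed weighting: (term count / doc length) * ln(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        if tokens:
            row = [counts[t] / len(tokens) * math.log(n_docs / doc_freq[t])
                   for t in vocab]
        else:
            row = [0.0] * len(vocab)
        rows.append(row)
    return vocab, rows


def shape_pattern(weights, eps=1e-9):
    """Encode a weight sequence over a hypothetical alphabet:
    U (weight rises), D (weight falls), S (weight stays the same)."""
    symbols = []
    for prev, cur in zip(weights, weights[1:]):
        if cur - prev > eps:
            symbols.append("U")
        elif prev - cur > eps:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


def cluster_by_shape(rows):
    """Group document indices whose weight sequences share the same shape string."""
    clusters = defaultdict(list)
    for index, row in enumerate(rows):
        clusters[shape_pattern(row)].append(index)
    return dict(clusters)


docs = ["apple banana", "apple banana", "cherry apple"]
vocab, rows = tfidf_matrix(docs)
clusters = cluster_by_shape(rows)
```

Here the position of a term in the sorted vocabulary stands in for the time axis, so two documents fall into the same cluster exactly when their weights rise and fall at the same vocabulary positions. A real system would apply this within each domain category and only to the clipped portion of each vector.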