Mining Outliers in Correlated Subspaces for High Dimensional Data Sets

Leng, Jinsong; Hong, Tzung-Pei

doi:10.3233/FI-2010-217

Issue title: Intelligent Data Analysis in Granular Computing

Article type: Research Article

Authors: Leng, Jinsong | Hong, Tzung-Pei

Affiliations: School of Computer and Security Science, Edith Cowan University, WA 6050, Australia | Department of Computer Science and Information Engineering, National University of Kaohsiung, Taiwan

Abstract: Outlier detection in high dimensional data sets is a challenging data mining task. Mining outliers in subspaces is a promising solution, because outliers may be embedded in some interesting subspaces. Searching all possible subspaces, however, runs into the problem known as "the curse of dimensionality". Because high dimensional data sets contain many irrelevant dimensions, it is of paramount importance to eliminate the irrelevant or unimportant dimensions and identify interesting subspaces with strong correlation. Normally, the correlation among dimensions can be determined by traditional feature selection techniques or subspace-based clustering methods. Dimension-growth subspace clustering techniques can find interesting subspaces in relatively low dimensional spaces, while dimension-reduction approaches try to group interesting subspaces with larger dimensions. This paper investigates the possibility of detecting outliers in correlated subspaces and presents a novel approach for doing so. The degree of correlation among dimensions is measured in terms of the mean squared residue, and a dimension-reduction method is employed to find the correlated subspaces. Based on the correlated subspaces obtained, we introduce another criterion called "shape factor" to rank the most important subspaces among the projected subspaces. Finally, outliers are identified in the most important subspaces by using classical outlier detection techniques. Empirical studies show that the proposed approach can identify outliers effectively in high dimensional data sets.
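
As an illustration of the correlation measure named in the abstract, the sketch below computes a mean squared residue for a chosen set of rows and dimensions. It assumes the standard biclustering definition (the mean of the squared values a_ij - rowmean_i - colmean_j + overallmean over the submatrix); the abstract does not state the exact formulation used in the paper, and the function name and the toy data are hypothetical. A residue near zero indicates that the selected dimensions vary coherently across the selected points.

```python
def mean_squared_residue(data, rows, cols):
    """Mean squared residue of the submatrix of `data` given by rows x cols.

    Assumption: the standard biclustering definition is used; the paper's
    exact formulation is not given in the abstract.
    """
    sub = [[float(data[i][j]) for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])

    # Row means, column means, and the overall mean of the submatrix.
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows

    # Average the squared residue of every entry.
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            r = sub[i][j] - row_means[i] - col_means[j] + overall
            total += r * r
    return total / (n_rows * n_cols)


# Toy data (hypothetical): the first three dimensions differ only by
# constant shifts, while the fourth is unrelated to them.
data = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.0],
    [4.0, 5.0, 6.0, 5.0],
]
all_rows = [0, 1, 2]
coherent = mean_squared_residue(data, all_rows, [0, 1, 2])
noisy = mean_squared_residue(data, all_rows, [0, 1, 2, 3])
print(coherent, noisy)
```

In this toy case the subspace of the first three dimensions yields a residue of essentially zero, and adding the unrelated fourth dimension raises it, which is the kind of signal a subspace search could use to discard irrelevant dimensions.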

Keywords: Outlier Detection, Subspace Outlier Detection, Subspace Clustering, Shape Factor, Dimension Reduction

Journal: Fundamenta Informaticae, vol. 98, no. 1, pp. 71-86, 2010

Received 8 March 2010

Accepted 8 March 2010

Published: 2010
