Principal component-based feature selection for tumor classification
One of the important problems in microarray gene expression data is tumor classification. This paper proposes a new feature selection method for tumor classification using gene expression data. In this method, three dimensionality reduction methods, including principal component analysis (PCA), factor analysis (FA) and independent component analysis (ICA), are first introduced to extract and select features for tumor classification, and their corresponding specific steps are given respectively. Then, the superiority of three algorithms is demonstrated by performing experimental comparisons on acute leukemia data sets. It is concluded that PCA compared with FA and ICA is the best under feature load ratio. However, PCA cannot make full use of the category information. To overcome the weak point, Fisher linear discriminant (FLD) is employed as those components of PCA, and a new approach to principal component discriminant analysis (PCDA) is proposed to retain all assets and work better than both PCA and FLD for classification. The further experimental results show that the classification ability of selected feature subsets by means of PCDA is higher than that of the other related dimensionality reduction methods, and the proposed algorithm is efficient and feasible for tumor classification.