Article type: Research Article
Authors: Raja Sree, S. [a],[*] | Kunthavai, A. [b]
Affiliations: [a] Department of Information Technology, Coimbatore Institute of Technology, Coimbatore, India | [b] Department of Computer Science and Engineering, Coimbatore Institute of Technology, Coimbatore, India
Correspondence: [*] Corresponding author: S. Raja Sree, Department of Information Technology, Coimbatore Institute of Technology, Coimbatore, Tamilnadu 641004, India. E-mail: [email protected].
Abstract: BACKGROUND: Breast cancer is a major disease causing panic among women worldwide. Since gene mutations are the root cause of cancer development, analyzing gene expression can give more insight into various phenotypes for cancer treatment. Breast cancer subtype prediction from gene expression data can provide more information for cancer treatment decisions. OBJECTIVE: Gene expression data are complex to analyze because of their high dimensionality. Machine learning algorithms such as k-Nearest Neighbors, Support Vector Machine (SVM) and Random Forest are used with feature selection to predict breast cancer subtypes. The prediction accuracy of existing methods is affected by the high dimensionality of gene expression data. The objective of this work is to propose an efficient algorithm for predicting breast cancer subtypes from gene expression data. METHODS: For subtype prediction, a novel Hubness Weighted Support Vector Machine algorithm (HWSVM), which uses the bad hubness score as a weight measure to handle outliers in the data, is proposed. Based on the various subtypes, features are projected into seven different feature sets, and an Ensemble-based Hubness Aware Weighted Support Vector Machine (HWSVMEns) is implemented for breast cancer subtype prediction. RESULTS: The proposed algorithms have been compared with the classical SVM and other traditional algorithms such as Random Forest and k-Nearest Neighbors, as well as with various gene selection methods. CONCLUSIONS: Experimental results show that the proposed HWSVM outperforms the other algorithms in terms of accuracy, precision, recall and F1 score, owing to the hubness weighting scheme and the ensemble approach. The experiments show an average accuracy of 92% across various gene expression datasets.
Keywords: Breast cancer subtypes, high-dimensional data, hubness, gene selection, support vector machine
DOI: 10.3233/THC-212825
Journal: Technology and Health Care, vol. 30, no. 3, pp. 565-578, 2022
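Illustrative sketch (not from the article): the abstract states that HWSVM uses the bad hubness score as a weight measure but does not give the formula. The Python snippet below shows one common way such a score can be computed, assuming the usual definition from the hubness literature (the number of times a sample appears among the k nearest neighbors of samples with a different label). The weighting `exp(-z)`, where `z` is the standardized bad-hubness count, is an assumption borrowed from hubness-weighted k-NN, and the function names and toy data are hypothetical.

```python
import math


def knn_indices(points, k):
    """For each point, return the indices of its k nearest neighbors (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        # Sort the other points by (distance, index); ties are broken by index.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def bad_hubness(points, labels, k):
    """Count how often each point is a k-nearest neighbor of a differently labeled point."""
    counts = [0] * len(points)
    for i, nbrs in enumerate(knn_indices(points, k)):
        for j in nbrs:
            if labels[j] != labels[i]:
                counts[j] += 1
    return counts


def hubness_weights(bad_counts):
    """Assumed scheme: weight = exp(-z), with z the standardized bad-hubness count."""
    n = len(bad_counts)
    mean = sum(bad_counts) / n
    std = math.sqrt(sum((c - mean) ** 2 for c in bad_counts) / n)
    if std == 0:
        return [1.0] * n  # no sample is worse than any other
    return [math.exp(-(c - mean) / std) for c in bad_counts]


# Toy data: two clusters plus one class-1 sample sitting inside the class-0 cluster.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (0.5, 0.5)]
labels = [0, 0, 0, 1, 1, 1, 1]

counts = bad_hubness(points, labels, k=2)
weights = hubness_weights(counts)
print(counts)
print([round(w, 3) for w in weights])
```

In this toy set the misplaced sample (the last one) collects the highest bad-hubness count and therefore the lowest weight. Per-sample weights of this kind could then be supplied to an SVM implementation that supports them, for example through the `sample_weight` argument of scikit-learn's `SVC.fit`; whether the article's HWSVM applies the weights in this way is not stated in the abstract.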