Article type: Research Article
Authors: Abraham, Asha [a],* | Kayalvizhi, R. [a] | Mohideen, Habeeb Shaik [b]
Affiliations: [a] Department of Networking and Communications, School of Computing, SRM Institute of Science and Technology Kattankulathur, Chennai, India | [b] Department of Genetic Engineering, College of Engineering and Technology, SRM Institute of Science and Technology Kattankulathur, Chennai, India
Correspondence: [*] Corresponding author. Asha Abraham, Department of Networking and Communications, School of Computing, SRM Institute of Science and Technology Kattankulathur, Chennai, 603203, India. E-mail: [email protected].
Abstract: Cancer is an increasingly alarming health concern. This paper addresses the most significant form of ovarian cancer (OC), Epithelial Ovarian Cancer (EOC), which has a low survival rate. The proposed ‘Multi classifier ShapRFECV based EOC’ (MSRFECV-EOC) subtype analysis technique uses EOC data from the National Center for Biotechnology Information and Cancer Cell Line Encyclopedia websites for early identification of EOC with machine learning techniques. The approach increases the data size, balances the classes in the data, and removes the large number of features unrelated to the disease of interest to prevent overfitting. In the data preprocessing stage, OC-related gene names were taken from the CancerMine database and other OC-related works. The OC datasets were then merged on these genes, and missing values of EOC subtypes were identified and imputed using Iterative Logistic Imputation. The Synthetic Minority Oversampling Technique with Edited Nearest Neighbors (SMOTE-ENN) was applied to the imputed dataset. Next, in the feature selection phase, the most significant features for the EOC subtypes were identified with the Shapley Additive Explanations based Recursive Feature Elimination with Cross-Validation (ShapRFECV) algorithm, which preserves predefined features while selecting new EOC features. An Optuna-optimized Random Forest achieved an accuracy of 97%, outperforming the existing models, and SHAP plots showed the most prominent features behind the classification. Saving the trained model with Pickle preserves its learned parameters, which avoids retraining and saves considerable training time. In the final phase, a Stratified K Fold Stacking Classifier improved the accuracy to 98.9%.
Keywords: Machine learning, Ovarian cancer, Pickle, multi classification, Random Forest
DOI: 10.3233/JIFS-236197
Journal: Journal of Intelligent & Fuzzy Systems, vol. 46, no. 4, pp. 9103-9117, 2024
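The last two steps named in the abstract, a stacking classifier trained with stratified K-fold splits and a Pickle round trip of the fitted model, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the data is synthetic (generated with scikit-learn's `make_classification`), the K-nearest-neighbors base learner and logistic-regression meta-learner are assumptions, and the SMOTE-ENN, ShapRFECV, and Optuna steps are omitted because they need packages beyond scikit-learn. The accuracy it prints has no relation to the figures reported in the paper.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic three-class stand-in for a gene-expression matrix (assumption:
# this is NOT the NCBI/CCLE data used in the paper).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=8,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stratified folds keep the class proportions similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Base learners and meta-learner are illustrative choices; the abstract
# only names Random Forest explicitly.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=cv,
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)

# Serialize the fitted model so a later run can load it instead of retraining.
blob = pickle.dumps(stack)
restored = pickle.loads(blob)
same_predictions = bool(
    (restored.predict(X_test) == stack.predict(X_test)).all()
)
print(f"held-out accuracy: {accuracy:.3f}")
print(f"pickle round trip gives identical predictions: {same_predictions}")
```

In practice the serialized bytes would be written to a file with `pickle.dump` and read back with `pickle.load`; pickled models should only be loaded from trusted sources, since unpickling can execute arbitrary code.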