Missing data imputation using correlation coefficient and min-max normalization weighting

Shantal, Mohammed; Othman, Zalinda; Abu Bakar, Azuraliza

doi:10.3233/IDA-230140

Missing data imputation using correlation coefficient and min-max normalization weighting

Article type: Research Article

Authors: Shantal, Mohammed^{a; *} | Othman, Zalinda^b | Abu Bakar, Azuraliza^b

Affiliations: [a] Computer Science Department, Sebha University, Sebha, Libya | [b] The Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Selangor, Malaysia

Correspondence: [*] Corresponding author: Mohammed Shantal, Computer Science Department, Sebha University, Sebha, Libya. E-mail: [email protected].

Abstract: Missing data is one of the challenges a researcher encounters while attempting to draw information from data. The first step in solving this issue is to have the data stage ready for processing. Much effort has been made in this area; removing instances with missing data is a popular method for handling missing data, but it has drawbacks, including bias. It will be impacted negatively on the results. How missing values are handled depends on several vectors, including data types, missing rates, and missing mechanisms. It covers missing data patterns as well as missing at random, missing at completely random, and missing not at random. Other suggestions include using numerous imputation techniques divided into various categories, such as statistical and machine learning methods. One strategy to improve a model’s output is to weight the feature values to better the performance of classification or regression approaches. This research developed a new imputation technique called correlation coefficient min-max weighted imputation (CCMMWI). It combines the correlation coefficient and min-max normalization techniques to balance the feature values. The proposed technique seeks to increase the contribution of features by considering how those elements relate to the desired functionality. We evaluated several established techniques to assess the findings, including statistical techniques, mean and EM imputation, and machine learning imputation techniques, including k-NNI, and MICE. The evaluation also used the imputation techniques CBRL, CBRC, and ExtraImpute. We use various sizes of datasets, missing rates, and random patterns. To compare the imputed datasets and original data, we finally provide the findings and assess them using the root mean squared error (RMSE), mean absolute error (MAE), and R2. According to the findings, the proposed CCMMWI performs better than most other solutions in practically all missing-rate scenarios.

Keywords: Missing data, imputation method, feature weighting, correlation coefficient, data standardization, Min-Max normalization, classification method, regression method

DOI: 10.3233/IDA-230140

Journal: Intelligent Data Analysis, vol. Pre-press, no. Pre-press, pp. 1-15, 2024

Published: 21 July 2024

Price: EUR 27.50

North America

IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
USA

Tel: +1 703 830 6300
Fax: +1 703 830 2300
[email protected]

For editorial issues, like the status of your submitted paper or proposals, write to [email protected]

Europe

IOS Press
Nieuwe Hemweg 6B
1013 BG Amsterdam
The Netherlands

Tel: +31 20 688 3355
Fax: +31 20 687 0091
[email protected]

For editorial issues, permissions, book requests, submissions and proceedings, contact the Amsterdam office [email protected]

Asia

Inspirees International (China Office)
Ciyunsi Beili 207(CapitaLand), Bld 1, 7-901
100025, Beijing
China

Free service line: 400 661 8717
Fax: +86 10 8446 7947
[email protected]

For editorial issues, like the status of your submitted paper or proposals, write to [email protected]

如果您在出版方面需要帮助或有任何建, 件至: [email protected]

Share this:

North America

Europe

Asia