Article type: Research Article
Authors: Ihlayyel, Hani A.K.* | Sharef, Nurfadhlina Mohd | Nazri, Mohd Zakree Ahmed | Bakar, Azuraliza Abu
Affiliations: Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia 43600 Bangi, Selangor Darul Ehsan, Malaysia
Correspondence: [*] Corresponding author: Hani A.K. Ihlayyel, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia 43600 Bangi, Selangor Darul Ehsan, Malaysia. Tel.: +60 132439300; E-mail: [email protected].
Abstract: Stock price prediction has been an attractive research domain for both investors and computer scientists for more than a decade. Predicting the market's reaction to released financial news articles and published stock prices still poses a great challenge to researchers because prediction accuracy remains relatively low. For prediction purposes, linear regression is a popular method. Statistical metrics such as document frequency (DF), term frequency-inverse document frequency (TF-IDF) and information gain (IG) are used for feature selection, extracting the most expressive features to reduce the high dimensionality of the data. However, the effectiveness of these metrics in identifying financial feature representations that have dependable and strong relations with the stock price has not been explored. The objectives of this study are (i) to investigate the performance of five statistical metrics, namely DF, TF-IDF, IG, Chi-square Statistics (Chi-Sqr) and occurrence, in identifying important features that can represent the news and have a strong relationship with the stock price; (ii) to introduce feedback variables, namely prediction accuracy (PA), directional accuracy (DA) and closeness accuracy (CA), to capture the interaction between released news and published stock prices; and (iii) to introduce a prediction model that uses linear regression to integrate features from financial news with a stock price value series based on a 20-minute time lag. The experiment used the ELR-BoW method to build 330 datasets, applying the five statistical metrics to select feature sizes of 50, 100, 150, 200, 250, 300, 400, 500, 600, 700 and 800. The performance of ELR-BoW is assessed on three parameters, namely PA, DA and CA, with Naïve Bayes (NB) as the benchmark approach and the Support Vector Machine (SVM) for comparison.
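The feature-selection metrics named above can be made concrete with a small sketch. It scores vocabulary terms by DF, chi-square and IG using their common textbook definitions over a binary term-presence model; it is not the authors' ELR-BoW code. The "up"/"down" article labels, the choice to take the maximum chi-square over classes, and the function names `feature_scores` and `top_k` are assumptions for illustration. TF-IDF and occurrence are left out because the abstract does not say how per-document weights are aggregated into one score per term.

```python
import math
from collections import Counter


def _entropy(probs):
    """Shannon entropy (bits) of a discrete distribution; zero probabilities are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def feature_scores(docs, labels):
    """Score each term by document frequency (DF), chi-square and information gain (IG).

    docs   -- list of token lists, one per news article
    labels -- one class label per article (hypothetical "up"/"down" price moves)
    """
    n = len(docs)
    classes = sorted(set(labels))
    class_count = Counter(labels)

    # Presence counts: a term is counted at most once per document.
    df = Counter()
    df_by_class = {c: Counter() for c in classes}
    for tokens, c in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_by_class[c][term] += 1

    prior = _entropy([class_count[c] / n for c in classes])
    scores = {}
    for term, dft in df.items():
        # Chi-square from the 2x2 term/class contingency table, maximised over classes
        # (the max aggregation is an assumption; averaging is another common choice).
        chi2 = 0.0
        for c in classes:
            a = df_by_class[c][term]          # in class c, contains term
            b = dft - a                       # other classes, contains term
            cc = class_count[c] - a           # in class c, lacks term
            d = (n - class_count[c]) - b      # other classes, lacks term
            denom = (a + cc) * (b + d) * (a + b) * (cc + d)
            if denom:
                chi2 = max(chi2, n * (a * d - cc * b) ** 2 / denom)

        # IG = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)]
        absent = n - dft
        h_present = _entropy([df_by_class[c][term] / dft for c in classes])
        h_absent = (
            _entropy([(class_count[c] - df_by_class[c][term]) / absent for c in classes])
            if absent
            else 0.0
        )
        ig = prior - (dft / n) * h_present - (absent / n) * h_absent

        scores[term] = {"df": dft, "chi2": chi2, "ig": ig}
    return scores


def top_k(scores, metric, k):
    """Return the k highest-scoring terms for one metric (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t][metric], t))[:k]


if __name__ == "__main__":
    docs = [["profit", "rise"], ["profit", "growth"], ["loss", "fall"], ["loss", "profit"]]
    labels = ["up", "up", "down", "down"]
    s = feature_scores(docs, labels)
    print(top_k(s, "df", 1), top_k(s, "chi2", 1))  # prints ['profit'] ['loss']
```

On this toy corpus the metrics disagree: DF ranks `profit` first because it appears in three of the four articles, while chi-square ranks `loss` first because it occurs only in "down" articles. In the study's setting, `k` would range over the feature sizes from 50 to 800.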
The proposed ELR-BoW-SVM achieved higher accuracy than ELR-BoW-NB. The best feedback variable was PA, with an F-measure of 0.842, and the best configuration used 300 features selected with the DF statistical metric. This study demonstrates that identifying the top feature representations for financial news is highly promising for automatic news processing in stock prediction.
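The evaluation ideas in the abstract can likewise be sketched. The abstract does not define PA, DA or CA, so this example uses a common definition of directional accuracy (the share of cases in which the predicted and actual price moves have the same sign) together with the standard F-measure. The price-only lagged regression is a simplification, since the study also feeds news features into its model. The names `fit_line`, `lagged_pairs`, `directional_accuracy` and `f_measure`, and the assumption of evenly spaced 1-minute bars (so a 20-minute lag is 20 steps), are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def lagged_pairs(prices, lag):
    """Pair each price with the one `lag` steps later.

    Assumes evenly spaced bars; with hypothetical 1-minute bars, lag=20 is a 20-minute lag.
    """
    if lag <= 0 or lag >= len(prices):
        raise ValueError("lag must be between 1 and len(prices) - 1")
    return prices[:-lag], prices[lag:]


def _sign(v):
    """Return 1, -1 or 0 according to the sign of v."""
    return (v > 0) - (v < 0)


def directional_accuracy(base, actual, predicted):
    """Fraction of cases where the predicted move from `base` has the sign of the actual move.

    A common definition, assumed here; the study's DA may be defined differently.
    """
    hits = sum(_sign(a - b0) == _sign(p - b0) for b0, a, p in zip(base, actual, predicted))
    return hits / len(base)


def f_measure(y_true, y_pred, positive):
    """F1: harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    prices = [10.0, 10.2, 10.1, 10.4, 10.3, 10.6, 10.5]  # hypothetical price series
    x, y = lagged_pairs(prices, lag=2)
    a, b = fit_line(x, y)
    predicted = [a + b * xi for xi in x]
    print("DA:", directional_accuracy(x, y, predicted))
    print("F1:", f_measure(["up", "up", "down"], ["up", "down", "down"], "up"))
```

Directional accuracy rewards getting the sign of the move right regardless of its size, which is why it is usually reported alongside a magnitude-based error; how the study's CA captures closeness is not stated in the abstract.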
Keywords: Financial news, linear regression, stock market prediction, statistical metric, feature representation
DOI: 10.3233/IDA-163316
Journal: Intelligent Data Analysis, vol. 22, no. 1, pp. 45-76, 2018
Publisher: IOS Press