Affiliations: [a] School of Management & Enterprise, The University of Southern Queensland, QLD, Australia. E-mails: [email protected], [email protected] | [b] Science and Engineering Faculty, Queensland University of Technology, QLD, Australia. E-mail: [email protected] | [c] Department of Computer Science and Engineering, SRM Institute of Science and Technology, India. E-mail: [email protected] | [d] Faculty of Health, Engineering and Sciences, The University of Southern Queensland, QLD, Australia. E-mail: [email protected] | [e] Rural Clinical School, The University of Queensland, QLD, Australia. E-mail: [email protected]
Abstract: Text classification (a.k.a. text categorisation) is an effective and efficient technology for information organisation and management. As the volume of information resources on the Web and corporate intranets continues to increase, text classification has become more and more important and has attracted wide attention from many different research fields. In the literature, many feature selection methods and classification algorithms have been proposed, and text classification also has important real-world applications. However, the dramatic increase in the availability of massive text data from various sources is creating a number of issues and challenges for text classification, such as scalability. The purpose of this report is to give an overview of existing text classification technologies for building more reliable text classification applications and to propose a research direction for addressing the challenging problems in text mining.
Keywords: Text classification, text mining, feature selection, machine learning