Affiliations: School of Electrical Engineering and Computer Science, Faculty of Science and Engineering, Queensland University of Technology, Brisbane Qld 4001, Australia. E-mail: {a1.algarni,y2.li,yue.xu}@qut.edu.au | Dept. of Applied Informatics and Multimedia, Asia University, Taichung, Taiwan. E-mail: [email protected]
Abstract: Clearly identifying the boundary between positive and negative streams is a major challenge. Several attempts have used negative feedback to address this challenge; however, two issues arise when using negative relevance feedback to improve the effectiveness of information filtering. The first is how to select constructive negative samples in order to reduce the space of negative documents. The second is how to decide which noisy extracted features should be updated based on the selected negative samples. This paper proposes a pattern-mining-based approach that selects offenders from the negative documents, where an offender can be used to reduce the side effects of noisy features. It also classifies extracted features (i.e., terms) into three categories: positive specific terms, general terms, and negative specific terms. In this way, multiple revising strategies can be used to update extracted features. An iterative learning algorithm is also proposed to implement this approach on RCV1, and extensive experiments show that the proposed approach achieves encouraging performance.
Keywords: Information filtering, text mining, pattern mining, relevance feedback, information retrieval