Note: Corresponding author. School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD 4072, Australia. Phone: +61-7-33651187, E-mail: [email protected].
Abstract: State-of-the-art studies on cyberbullying detection using text classification predominantly assume that streaming text can be fully labelled. However, the rapid growth of unlabelled data generated in real time from online content makes this virtually impossible. In this paper, we propose a session-based framework for the automatic detection of cyberbullying in large volumes of unlabelled streaming text. Because streaming data from social networks arrives at the server system in large volumes, we incorporate an ensemble of one-class classifiers into the session-based framework. The system uses a multi-agent distributed environment to process streaming data from multiple social network sources. The proposed strategy addresses real-world situations in which only a few positive instances of cyberbullying are available for initial training. Our main contribution is the automatic detection of cyberbullying in such situations, where labelled data is not readily available. Initial results indicate that the suggested approach is reasonably effective for detecting cyberbullying automatically on social networks. The experiments also indicate that the ensemble learner outperforms the single-window and fixed-window approaches, even though learning is based on positive and unlabelled data only, with no negative data available for training.
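To make the idea of a session-based ensemble of one-class classifiers concrete, the following is a minimal illustrative sketch, not the implementation evaluated in this paper. It assumes a toy bag-of-words representation, a centroid-similarity one-class classifier, a strict majority vote, and a simple self-training step in which messages flagged in one session become positives for the next ensemble member. The names `CentroidOneClass` and `SessionEnsemble`, the `threshold` value, and the `max_members` limit are all hypothetical choices made for illustration.

```python
from collections import Counter
import math


def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenised message."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class CentroidOneClass:
    """Toy one-class classifier trained on positive examples only (assumed design)."""

    def __init__(self, positives, threshold=0.3):
        # The centroid is the summed term-count vector of all positive examples.
        self.centroid = Counter()
        for text in positives:
            self.centroid.update(vectorize(text))
        self.threshold = threshold  # hypothetical similarity cut-off

    def predict(self, text):
        return cosine(vectorize(text), self.centroid) >= self.threshold


class SessionEnsemble:
    """Keeps the most recent one-class classifiers, one added per session."""

    def __init__(self, seed_positives, max_members=3):
        self.positives = list(seed_positives)
        self.max_members = max_members
        self.members = [CentroidOneClass(self.positives)]

    def predict(self, text):
        # Strict majority vote across the current ensemble members.
        votes = sum(m.predict(text) for m in self.members)
        return votes * 2 > len(self.members)

    def process_session(self, messages):
        # Flag messages in this unlabelled session, then train a new member
        # on the enlarged positive set and drop the oldest members if needed.
        flagged = [m for m in messages if self.predict(m)]
        if flagged:
            self.positives.extend(flagged)
            self.members.append(CentroidOneClass(self.positives))
            self.members = self.members[-self.max_members:]
        return flagged
```

In this sketch, a `SessionEnsemble` would be seeded with the few known positive instances, and `process_session` would then be called once per incoming batch of unlabelled messages. No negative examples are used at any point, which mirrors the positive-and-unlabelled setting described above; the classifier, features, and update rule themselves are placeholders.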