Affiliations: [a] Faculty of Library, Information and Media Science, University of Tsukuba, Japan | [b] Information and Society Research Division, National Institute of Informatics, Japan
Abstract: Many people share their daily events and opinions on Twitter. Some tweets are beneficial, and others relate to aspects of a user’s real life such as eating, traffic conditions, and weather. In this paper, we propose a method for inferring the real-life aspect distribution of tweets using labeled tweets. Our method infers the aspect probability distributions with a hierarchical estimation framework (HEF), which combines unsupervised and supervised machine learning methods in two phases. In the first phase, it extracts topics from a large collection of tweets using Latent Dirichlet Allocation (LDA). In the second phase, it builds associations between topics and real-life aspects using a small set of labeled tweets. The probability distribution of aspects for an unknown tweet is then inferred from these associations, based on the bag of terms extracted from that tweet. Our experimental evaluations with a large number of actual tweets demonstrate the high efficiency and robustness of our inference method. In particular, in the case of single-label training, HEF showed significantly lower Jensen-Shannon divergence (JSD) values than baseline methods such as Naive Bayes, SVM, and L-LDA.
Keywords: Twitter, real life, hierarchical estimation framework, probability distribution inference, t-test
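As an illustration of the second phase and of the JSD evaluation described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that the per-tweet topic distributions from the first phase (LDA) are already available, and it uses a simple, assumed association scheme: topic weights of single-labeled tweets are accumulated per aspect and normalized per topic. The function names `build_associations`, `infer_aspects`, and `jsd`, as well as the toy data, are hypothetical.

```python
import math

def build_associations(topic_dists, labels, aspects):
    """Estimate P(aspect | topic) from single-labeled tweets.

    Assumed scheme (not taken from the paper): each labeled tweet adds
    its topic weights to the counts of its label, then every topic row
    is normalized over the aspects.
    """
    n_topics = len(topic_dists[0])
    counts = [{a: 0.0 for a in aspects} for _ in range(n_topics)]
    for theta, label in zip(topic_dists, labels):
        for k, weight in enumerate(theta):
            counts[k][label] += weight
    assoc = []
    for row in counts:
        total = sum(row.values())
        if total == 0:
            # Topic never seen in labeled data: fall back to uniform.
            assoc.append({a: 1.0 / len(aspects) for a in aspects})
        else:
            assoc.append({a: v / total for a, v in row.items()})
    return assoc

def infer_aspects(theta, assoc, aspects):
    """Mix the topic-to-aspect associations by a tweet's topic weights."""
    return {a: sum(w * assoc[k][a] for k, w in enumerate(theta))
            for a in aspects}

def jsd(p, q):
    """Jensen-Shannon divergence (base 2) of two dicts over the same keys."""
    def kl(x, m):
        return sum(x[a] * math.log2(x[a] / m[a]) for a in x if x[a] > 0)
    m = {a: 0.5 * (p[a] + q[a]) for a in p}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy data: three aspects, three topics, one labeled tweet per aspect.
aspects = ["eating", "traffic", "weather"]
labeled_topic_dists = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
labels = ["eating", "traffic", "weather"]

assoc = build_associations(labeled_topic_dists, labels, aspects)
# Topic distribution of an unknown tweet (assumed to come from LDA).
inferred = infer_aspects([0.5, 0.5, 0.0], assoc, aspects)
```

With base-2 logarithms, `jsd` ranges from 0 for identical distributions to 1 for distributions with disjoint support, so lower values indicate that an inferred aspect distribution is closer to the reference one.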