Affiliations: [a] Department of Computer Science, Georgia Southern University, Statesboro, GA, USA. E-mail: [email protected] | [b] Department of Information Technology, Kennesaw State University, Marietta, GA, USA. E-mail: [email protected] | [c] Department of Computer Science, University of Georgia, Athens, GA, USA. E-mail: [email protected]
Abstract: Probabilistic topic models, which typically represent topics as multinomial distributions over words, have been used extensively to discover latent topics in text corpora. However, because topic models are entirely unsupervised, they may produce topics that are not understandable in applications. Recently, several knowledge-based topic models have been proposed that primarily use word-level domain knowledge to enhance topic coherence; they ignore the rich information carried by the entities (e.g., persons, locations, and organizations) associated with the documents. Moreover, a vast amount of prior (background) knowledge is available in the form of Linked Open Data (LOD) datasets and other ontologies, and this knowledge can be incorporated into topic models to produce more coherent topics. In this paper, we introduce a novel regularized entity-based topic model (RETM), which integrates an ontology with an entity-based topic model (EntLDA) to increase the coherence of the topics identified during the topic modeling process. Our experimental results demonstrate the effectiveness of the proposed model in improving topic coherence.