Affiliations: [a] Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA, USA. E-mails: [email protected], [email protected] | [b] Department of Computer Science, University at Albany – SUNY, Albany, NY, USA. E-mail: [email protected]
Abstract: Knowledge Graphs (KGs) have become useful sources of structured data for information retrieval and data analytics tasks. Enabling complex analytics, however, requires entities in KGs to be represented in a way that is suitable for Machine Learning tasks. Several approaches have recently been proposed for obtaining vector representations of KGs by identifying and extracting relevant graph substructures using both uniform and biased random walks. However, such approaches lead to representations comprising mostly popular, rather than relevant, entities in the KG. In KGs where different types of entities often coexist (such as in Linked Open Data), a given target entity may have its own distinct set of most relevant nodes and edges. We propose specificity as an accurate measure for identifying the most relevant, entity-specific nodes and edges. We develop a scalable method based on bidirectional random walks to compute specificity. Our experimental evaluation shows that specificity-based biased random walks extract more meaningful substructures (in terms of size and relevance) than the state of the art, and that the graph embeddings learned from the extracted substructures perform well against existing methods in common data mining tasks.
Keywords: Relevance metrics, graph embedding, Linked Open Data, data mining, recommender systems, RDF, SPARQL, Semantic Web, DBpedia