Abstract: Named Entity Disambiguation is the task of assigning entities from a Knowledge Graph (KG) to mentions of such entities in a textual document. State-of-the-art approaches to this task balance two disparate sources of similarity: lexical, defined as the pairwise similarity between mentions in the text and names of entities in the KG; and semantic, defined through some graph-theoretic property of a subgraph of the KG induced by the choice of entities for each mention. Departing from previous work, our notion of semantic similarity is rooted in information theory and is defined as the mutual information between random walks on the disambiguation graph induced by the choice of entities for each mention. We describe an iterative algorithm based on this idea and an extension that uses learning-to-rank, which yields further improvements. Our experimental evaluation demonstrates that this approach is robust and very competitive on well-known existing benchmarks. We also justify the need for new and more difficult benchmarks, and provide an extensive experimental comparison of our method and previous work on these new benchmarks.
Keywords: Named entities, entity linking, entity disambiguation, relatedness measure, random walk, benchmarking