Note: [**] Work done while at Xerox Research Centre Europe.
Abstract: The World Wide Web contains a large quantity of community-created knowledge of an instructional nature. Similarly, in commercial settings, customer-care providers use databases of instructions to guide clients in resolving issues. Most of these instructions are expressed in natural language. Knowledge Bases containing such information derive their value from the sum of their individual entries. However, because each entry is created mostly independently, users (e.g. other community members) cannot take advantage of the accumulated knowledge that could be developed by aggregating related entries and relevant information found in external knowledge bases. In this paper we consider the problem of linking Knowledge Base entries to other relevant parts of the Knowledge Base and to third-party semi-structured knowledge sources. To achieve this, we propose (i) a new method to detect actionable phrases (text fragments that describe how to perform a certain action) and link them to other entries; and (ii) a new method to detect entities and link them to the Linked Open Data cloud. Our method for extracting actionable phrases achieves an F-score of 67.35%. We show that limiting the linking space to actionable phrases results in better linking quality than using coarser-grained spans of text, as proposed in other approaches to the task. In addition, our Linked Open Data linking method uses a global optimization score to filter the set of possible candidates, which increases precision compared to a standard method implemented in DBpedia Spotlight. Beyond these scientific contributions, we also present a detailed error analysis and release our annotations to the community to foster future research on the subject.
Keywords: procedural knowledge, forum mining, blog analysis, linked data, information extraction