Link traversal querying for a diverse Web of Data

Umbrich, Jürgen; Hogan, Aidan; Polleres, Axel; Decker, Stefan

doi:10.3233/SW-140164

Article type: Research Article

Authors: Umbrich, Jürgen^{a; *} | Hogan, Aidan^b | Polleres, Axel^a | Decker, Stefan^c

Affiliations: [a] Vienna University of Economics and Business, Welthandelsplatz 1, 1020 Vienna, Austria. E-mails: [email protected], [email protected] | [b] Dept. of Computer Science, Universidad de Chile, Blanco Encalada 2120, Santiago, Chile. E-mail: [email protected] | [c] INSIGHT @ NUI Galway, National University of Ireland, Galway, Ireland. E-mail: [email protected]

Correspondence: [*] Corresponding author. E-mail: [email protected]

Abstract: Traditional approaches for querying the Web of Data often involve centralised warehouses that replicate remote data. Conversely, Linked Data principles allow for answering queries live over the Web by dereferencing URIs to traverse remote data sources at runtime. A number of authors have looked at answering SPARQL queries in such a manner; these link-traversal based query execution (LTBQE) approaches for Linked Data offer up-to-date results and decentralised (i.e., client-side) execution, but must operate over incomplete dereferenceable knowledge available in remote documents, thus affecting response times and “recall” for query answers. In this paper, we study the recall and effectiveness of LTBQE, in practice, for the Web of Data. Furthermore, to integrate data from diverse sources, we propose lightweight reasoning extensions to help find additional answers. From the state-of-the-art which (1) considers only dereferenceable information and (2) follows rdfs:seeAlso links, we propose extensions to consider (3) owl:sameAs links and reasoning, and (4) lightweight RDFS reasoning. We then estimate the recall of link-traversal query techniques in practice: we analyse a large crawl of the Web of Data (the BTC’11 dataset), looking at the ratio of raw data contained in dereferenceable documents vs. the corpus as a whole and determining how much more raw data our extensions make available for query answering. We then stress-test LTBQE (and our extensions) in real-world settings using the FedBench and DBpedia SPARQL Benchmark frameworks, and propose a novel benchmark called QWalk based on random walks through diverse data. We show that link-traversal query approaches often work well in uncontrolled environments for simple queries, but need to retrieve an unfeasible number of sources for more complex queries. We also show that our reasoning extensions increase recall at the cost of slower execution, often increasing the rate at which results return; conversely, we show that reasoning aggravates performance issues for complex queries.
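
As an illustration of the link-traversal idea described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a toy in-memory "Web" (the hypothetical `WEB` dictionary, with made-up `ex:` URIs) in which dereferencing a URI returns a set of triples, and it discovers new URIs from any triple matching a single query pattern, which is a simplification. It omits the rdfs:seeAlso, owl:sameAs and RDFS extensions entirely.

```python
# Hypothetical toy "Web": dereferencing a URI yields a document of triples.
WEB = {
    "ex:alice": {("ex:alice", "foaf:knows", "ex:bob")},
    "ex:bob": {("ex:bob", "foaf:name", '"Bob"')},
    "ex:carol": {("ex:carol", "foaf:name", '"Carol"')},  # never reached below
}

def is_var(term):
    return term.startswith("?")

def is_uri(term):
    # Assumption: variables start with "?" and literals with a double quote.
    return not is_var(term) and not term.startswith('"')

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, else return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new

def evaluate(patterns, triples):
    """Naive nested-loop join of the triple patterns over the retrieved data."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings

def ltbqe(patterns, web):
    """Dereference URIs from the query, then from matching triples, to a fixpoint."""
    fetched, triples = set(), set()
    # Seed with URIs in subject/object position of the query patterns.
    frontier = {t for pat in patterns for t in (pat[0], pat[2]) if is_uri(t)}
    while frontier:
        uri = frontier.pop()
        fetched.add(uri)
        triples |= web.get(uri, set())  # non-dereferenceable URIs yield nothing
        for pat in patterns:
            for triple in triples:
                b = match(pat, triple, {})
                if b is not None:
                    frontier |= {v for v in b.values() if is_uri(v)} - fetched
    return evaluate(patterns, triples), fetched

query = [("ex:alice", "foaf:knows", "?y"), ("?y", "foaf:name", "?n")]
results, fetched = ltbqe(query, WEB)
```

In this toy setting only the documents for `ex:alice` and `ex:bob` are retrieved; `ex:carol` is never dereferenced because no traversed link leads to it, which mirrors how answers can be missed when relevant data is not reachable from the query's URIs.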

Keywords: Linked data, SPARQL, RDFS, OWL, Semantic Web, RDF, Web of Data, live querying, reasoning

Journal: Semantic Web, vol. 6, no. 6, pp. 585-624, 2015

Published: 2015
