You are viewing a javascript disabled version of the site. Please enable Javascript for this site to function properly.
Go to headerGo to navigationGo to searchGo to contentsGo to footer
In content section. Select this link to jump to navigation

Conjunctive query answering over unrestricted OWL 2 ontologies

Abstract

Conjunctive Query (CQ) answering is a primary reasoning task over knowledge bases. However, when considering expressive description logics, query answering can be computationally very expensive; reasoners for CQ answering, although heavily optimized, often sacrifice expressive power of the input ontology or completeness of the computed answers in order to achieve tractability and scalability for the problem. In this work, we present a hybrid query answering architecture that combines various services to provide a CQ answering service for OWL. Specifically, it combines scalable CQ answering services for tractable languages with a CQ answering service for a more expressive language approaching the full OWL 2. If the query can be fully answered by one of the tractable services, then that service is used, to ensure maximum performance. Otherwise, the tractable services are used to compute lower and upper bound approximations. The union of the lower bounds and the intersection of the upper bounds are then compared. If the bounds do not coincide, then the “gap” answers are checked using the “full” service. These techniques led to the development of two new systems: (i) RSAComb, an efficient implementation of a new tractable answering service for RSA (role safety acyclic) (ii) ACQuA, a reference implementation of the proposed hybrid architecture combining RSAComb, PAGOdA, and HermiT to provide a CQ answering service for OWL. Our extensive evaluation shows how the additional computational cost introduced by reasoning over a more expressive language like RSA can still provide a significant improvement compared to relying on a fully-fledged reasoner. Additionally, we show how ACQuA can reliably match the performance of PAGOdA, a state-of-the-art CQ answering system that uses a similar approach, and can significantly improve performance when PAGOdA extensively relies on the underlying fully-fledged reasoner.

1.Introduction

Conjunctive Query (CQ) answering over Knowledge Bases (KBs) is a crucial reasoning task for many applications. However, when considering expressive Description Logic (DL) languages, query answering is computationally very expensive, even when considering only complexity w.r.t. the size of the data (data complexity) [62]. Fully-fledged reasoners oriented towards CQ answering over unrestricted OWL 2 ontologies exist but, although heavily optimized, they are only effective on small to medium datasets. In order to achieve tractability and scalability for the problem, we need to rely on limiting the expressive power of the input ontology or sacrifice the completeness of the computed answers.

Query answering procedures have been developed for several fragments of OWL 2 for which CQ answering is tractable with respect to data complexity [9]. Three such fragments have been standardized as OWL 2 profiles, and CQ answering techniques for these fragments have been shown to be highly scalable at the expense of expressive power [10,46,51,68,72,73]. An interesting fragment of OWL 2, tractable for standard reasoning tasks, is RSA, an ontology language that subsumes all the OWL 2 profiles, first presented by Carral et al. [14] and for which a CQ answering algorithm based on the combined approach technique [46,47] was proposed by Feier et al. [22].

In order to deal with more expressive ontologies, several techniques have been proposed to compute a sound subset of answers to a given CQ. One such technique is to approximate the input ontology to a tractable fragment, so a tractable algorithm can then be used to answer CQs over the approximated ontology. A particularly interesting approach to CQ answering over unrestricted OWL 2 ontologies, using a combination of the aforementioned techniques, is adopted by PAGOdA [85]. Its “pay-as-you-go” approach uses a Datalog reasoner to handle the bulk of the computation, computing lower and upper approximations of the answers to a query, while relying on a fully-fledged OWL 2 reasoner (HermiT [24]) only as necessary to fully answer the query.

While PAGOdA is able to avoid the use of a fully-fledged OWL 2 reasoner in some cases (i.e., when the lower and upper answer approximations coincide), its performance rapidly deteriorates when the input query requires (extensive) use of the underlying OWL 2 reasoner. The computation of lower and upper bounds is achieved by under- and over-approximating the ontology into the RL profile of OWL 2 so that a tractable reasoner can be used for CQ answering. The tractability of RL is achieved by avoiding problematic interactions between axioms that can cause an exponential blow-up of the computation (so-called and-branching). As it turns out, this elimination of problematic interactions between axioms is rather coarse, and PAGOdA ends up falling back to the underlying OWL 2 reasoner even when it is not really needed.

This work borrows from this “pay-as-you-go” technique and builds upon existing CQ answering techniques over OWL 2 ontologies. We propose a new hybrid query answering architecture that combines black-box11 services to provide a CQ answering service for OWL. Specifically, it combines scalable CQ answering services for tractable languages with a CQ answering service for a more expressive language approaching the full OWL 2. If the query can be fully answered by one of the tractable services, then that service is used. Otherwise, the tractable services are used to compute lower and upper bound approximations, taking the union of the lower bounds and the intersection of the upper bounds. If the bounds do not coincide, then the “gap” answers are checked using the “full” service. When considering ontology approximations “from below”, we introduce a novel algorithm to compute a lower bound to the answers to an input query by means of approximation to RSA. This is done by ensuring that all the constraints for the RSA language are satisfied in the input KB. Similarly, we propose an algorithm to compute an approximation “from above” targetting RSA+, an extension of RSA, for which the combined approach for CQ answering for RSA is still complete. The combined approach for RSA can then be used in both cases to compute the answer bounds. These techniques led to the development of two new system: RSAComb and ACQuA (Answering Conjunctive Queries using Approximation).

RSAComb

An efficient implementation [39,41] of the combined approach algorithm for RSA [22], reorganized to fit the new implementation design and the integration of RDFox [54,55,57,60] as a backend Datalog reasoner. We streamlined the execution of the algorithm by factoring out those steps in the combined approach that are query independent to make answering multiple queries over the same knowledge base more efficient. In addition, we included an improved version of the filtering step for the combined approach. The system accepts any OWL 2 KB and includes a customizable approximation step to languages compatible with the RSA combined approach. The system is further extended with a reference implementation of the novel approximation algorithms for the computation of answer bounds mentioned above.

ACQuA

A reference implementation [42] of the hybrid architecture mentioned above, combining RSAComb, PAGOdA [85], and HermiT [24] to provide a CQ answering service for OWL. The resulting system ensures the same “pay-as-you-go” capabilities of the systems it is based on. The system has been designed to accommodate a high degree of modularity; the services it is built upon can be potentially substituted or augmented with more capable ones to improve the overall performance.

We carried out an extensive evaluation both for RSAComb, as a standalone tool, and for ACQuA, to assess their effectiveness, and compare our results with PAGOdA, aiming, primarily, at improving some shortcomings of the latter tool. Our experimental results show that the new technique yields significant performance improvements in several important application scenarios. Both ACQuA22 and RSAComb33 have been released as free and open source software. Source code and documentation are available online.

The present paper includes some previously published work:

  • The algorithm for the approximation of an unrestricted OWL 2 ontology to RSA, sound for CQ answering, was presented in [40].

  • A full description of RSAComb system was presented in [39].

2.Preliminaries

We assume familiarity with standard concepts of first-order logic (FOL) such as term, variable, constant, predicate, atom, literal; refer to [1,5] for a formal introduction to these concepts.

We define a rule as an expression of the form φ(x,y)ψ(x), with φ(x,y) a conjunction of literals over variables xy and ψ(x) a non-empty conjunction of atoms over x. Given a role r, we denote head(r) the set of atoms in ψ(x), and body+(r) (resp. body(r)) the set of positive (resp. negative) literals in φ(x,y).

We will call a rule definite without negation in its body, and Datalog a function-free definite rule. A Datalog rule is disjunctive if it admits disjunction in the head. A fact is a Datalog rule with an empty body. The definition can be trivially extended to sets of rules.

A program Π is a set of rules. Let pred(X) be the set of predicates in X (with X being either a set of atoms, a rule, or a program). A stratification of a program Π is a function δ:pred(Π){1,,k} with k|pred(Π)|, s.t. for every rule rΠ and ppred(head(r)) it holds:

  • for every qpred(body+(r)), δ(q)δ(p);

  • for every qpred(body(r)), δ(q)<δ(p);

The stratification partition of Π induced by δ is the sequence (Π1,,Πk) with each Πi be the set of rules rΠ s.t. maxahead(r)(δ(pred(a)))=i. Programs Πi are called strata of Π. A program is stratified if it admits a stratification. All definite programs are stratified.

A stratified program Π has a least Herbrand model (LHM), which is constructed using the immediate consequence operator TΠ. Let HU and HB be the Herbrand universe and the Herbrand base of Π. Let SHB, then, TΠ(S) consists of all facts in head(r)σ with rΠ and σ a substitution for the variables in r to HU satisfying body+(r)σS and body(r)σS=. The powers of TΠ are defined as follows: (i) TΠ0(S)=S, (ii) TΠn+i(S)=TΠ(TΠn(S)), (iii) TΠω(S)=i=0TΠi(S). Let (Π1,,Πk) be a stratification partition for Π. We define U1=TΠ1ω() and for each 1i<k, Ui+1=TΠi+1ω(Ui) Then, the LHM of Π is Uk and is denoted as M[Π].

Given a stratified program Π, we define Π, the program extended with standard axiomatization rules for equality (≈) and truth value (⊤) [22].

2.1.Ontologies and conjunctive query answering

Next we give a brief overview of the description logic languages used in the paper. We will define them as restrictions of SROIQ [36], the description logic underpinning the OWL 2 ontology language [58], standardized by W3C. An ontology signature is a triple NC,NR,NI of computable disjoint sets of concept names, role names and individuals respectively. Two special concepts are provided: ⊥ (bottom concept) and ⊤ (top concept). We define a role as an element of NR{RRNR}, where R is called inverse role. We also introduce a function Inv(·) closed for roles s.t. RNR:Inv(R)=R,Inv(R)=R. An RBox R is a finite set of axioms of type (R2)–(R4) in Table 1 with R,S,T roles. We denote R as a minimal relation over roles closed by reflexivity and transitivity s.t. RRS, Inv(R)RInv(S) hold if RSR. A role R is transitive if a role T exists such that TRR, RRT and either TTTR or TTTR. A TBox T is a set of axioms of type (T1)–(T7) where A,BNC, aNI, R is a role and Self(·) denotes the local reflexivity of a role. An ABox A is a finite set of assertions of type (A1)–(A2) with ANC, a,bNI and RNR. A SROIQ ontology is a set of axioms O=TR.44 An ontology is SHOIQ+ if we restrict axioms (R4) to role transitivity (i.e., R=S=T). An ontology is SHOIQ if we further exclude axioms of type (T6), (T7) and (R3). An ALCHOIQ+ ontology (resp. ALCHOIQ) is obtained from SHOIQ+ (resp. SHOIQ) by disallowing (R4) axioms altogether. A Horn-ALCHOIQ+ ontology (resp. Horn-ALCHOIQ) is obtained from ALCHOIQ+ (resp. ALCHOIQ) by restricting m=1 in axioms (T1) and (T4). Finally, given an ontology language L, we define an L knowledge base as a couple K=O,A comprising an L ontology O=TR and an ABox A.

Table 1

Normalized SROIQ axioms and their translation into logic rules

Axiom / Role / Assertion αDefinite rules π(α)
(R1)RR(x,y)R(y,x)R(y,x)R(x,y)
(R2)RSR(x,y)S(x,y)
(R3)RSR(x,y)S(x,y)
(R4)RSTR(x,y)S(y,z)T(x,z)
(T1)i=1nAii=1mBii=1nAi(x)i=1mBi(x)
(T2)A{a}A(x)xa
(T3)R.ABR(x,y)A(y)B(x)
(T4)AmR.BA(x)i=1m+1[R(x,yi)B(yi)]1i<jm+1yiyj
(T5)AR.BA(x)R(x,fR,BA(x))B(fR,BA(x))
(T6)ASelf(R)A(x)R(x,x)
(T7)Self(R)AR(x,x)A(x)
(A1)A(a)A(a)
(A2)R(a,b)R(a,b)

Table 1 also introduces a normal form for each of these description logic languages, and w.l.o.g. we assume that any ontology introduced is restricted to these axioms. Each axiom in Table 1 corresponds to a single logic rule, provided on the right. We define π(·) as the translation function from axioms to logic rules; the function can be trivially extended to sets of axioms by mapping π over the set and to knowledge bases (i.e., π(K)={π(α)αATR}). Furthermore, we define ΠK=π(K), as the logic program derived from K extended with bottom and equality axiomatization rules (as defined in [14]).

OWL 2 profiles [53] have been defined as fragments of OWL 2, designed to provide a desirable balance between computational complexity of standard reasoning tasks and expressiveness of the ontology language. We will define these standard profiles as fragments of Horn-ALCHOIQ.55

  • 1. OWL 2 EL is based on the EL++ DL language [4,6]; it does not contain inverse roles (R1) and axioms of type (T4);

  • 2. OWL 2 RL is inspired by Description Logic Programs [29] and corresponds to a subset of Datalog; it does not contain axioms of type (T5);

  • 3. OWL 2 QL is based on the DL-LiteR DL language [10]; it does not contain axioms (T2), (T4), axioms (T1) satisfy n=1 and axioms (T3) satisfy A=.

Note that, when not considering transitive roles, the definition of EL++, the logic underpinning OWL 2 EL, matches ELHOr [72,85].

A conjunctive query (CQ) q is a formula y.ψ(x,y) with ψ(x,y) a conjunction of function–free atoms over xy, and x, y are called answer variables and bounded variables, respectively. We call boolean conjunctive query (BCQ) a query where |x|=0 (i.e., the set of answer variable is empty). A query is atomic if ψ(x,y) consists of a single atom and |y|=0.

A knowledge base K=TR,A is satisfiable if ΠKy.(y). A tuple of constants a is a possible answer to q w.r.t. K if |a|=|x| and each constant in a occurs in K. a is a certain answer to q w.r.t. K if K is unsatisfiable or a is a possible answer and ΠKy.ψ(a,y). The set of certain answers to a query q is denoted by cert(q,K). We say that a possible answer a is a ground answer to q w.r.t. a satisfiable knowledge base K if a tuple of constants e in K exists such that |y|=|e| and ΠKψ(a,e). We denote the set of ground answers with ground(q,K). It is straightforward to see that ground(q,K)cert(q,K).

CQs can be alternatively represented as Datalog rules. Let Pq a fresh predicate of arity |x| uniquely associated with q. Then let qr=Pq(x)ψ(x,y) be the Datalog rule representing q. This allows to characterize certain answers by means of entailment of a single fact, i.e., acert(q,K) iff ΠK{qr}Pq(a).

We say that q is internalizable if it can be turned into an ontology axiom. This process is known as rolling-up [36] and is implemented in some solvers (e.g., Pellet [71]) to provide sound and complete CQ answering over OWL 2 DL over certain answer semantics when considering internalizable queries.

Given a knowledge base K and a CQ q, we call conjunctive query answering the reasoning problem of computing all certain answers of q w.r.t. K. The decision problem associated with CQ answering is called conjunctive query entailment (CQE). Given a knowledge base K, a CQ q and a possible answer a, CQE is the problem of deciding whether ΠKyψ(a,y).

2.2.PAGOdA

PAGOdA is a hybrid reasoner for sound and complete CQ answering over OWL 2 KBs, adopting a “pay-as-you-go” technique to compute the certain answers to a given query. The idea is to compute lower/upper bound approximations to the answers to a query by approximating the input ontology into a less expressive language and possibly provide a fallback (more expensive) algorithm to process the answers in the gap between the bounds; to achieve this, it uses a combination of a Datalog reasoner and a fully-fledged OWL 2 reasoner. PAGOdA treats the two systems as black boxes and tries to offload the bulk of the computation to the former and relies on the latter only when necessary.

The capabilities, performance, and scalability of PAGOdA inherently depend on the ability of the fully-fledged OWL 2 reasoner in use, and the ability to delegate the workload to a given Datalog reasoner. In the best scenario, with an OWL 2 reasoner, PAGOdA is able to answer internalisable queries [36] under certain answer semantics [85] over OWL 2 DL.

In the following is a high level description of the procedure adopted by PAGOdA to compute the answers to a query. This will prove useful to understand how this approach will be later integrated in our system, ACQuA. For a more in-depth description of the algorithm and heuristics in use, we refer the reader to [84].

Given a KB K=TR,A66 and a query q, PAGOdA executes the following steps in order to compute the answers to q w.r.t. K:

  • 1. the Datalog reasoner is exploited to compute a lower bound Lq and an upper bound Uq to the answers to the query q. This is achieved by approximating the input KB K into a tractable language to be handled by the Datalog reasoner. Depending on the approximation procedure, running the query over the approximated ontology will result in either a lower or an upper bound of the certain answers to the query. The lower bound Lq is obtained as follows:

    • (a) the disjunctive Datalog subset of the input ontology, denoted with KDD, is computed by dropping any axiom that does not correspond to a disjunctive Datalog rule;

    • (b) using a variant of shifting [18], KDD is polynomially transformed in order to eliminate disjunction in the head. The resulting ontology shift(KDD) is sound but not necessarily complete for CQ answering;

    • (c) a first materialization is performed, i.e., M1=M[shift(KDD)], and the resulting facts are added back to the input knowledge base to obtain K=TR,AM1;

    • (d) the ELHOr [72] subset of K is computed, denoted KEL, dropping any axiom that is not in ELHOr;

    • (e) the combined approach for ELHOr [51,73] is used to compute the answers to the query q over KEL.

    The upper bound Uq is computed by executing the query over the ontology, modified as follows:

    • (a) the ⊥ concept is substituted with a fresh concept name s to avoid directly deriving falsehood;

    • (b) disjuncts in the head of an axiom are reduced to a single disjunct. The “most favourable” disjunct is chosen according to a polynomial choice function that reasons over the dependency graph of the input ontology;

    • (c) existential axioms of type (T5) are constant Skolemized.

  • 2. If lower and upper bound coincide (i.e., Lq=Uq) then the Datalog reasoner was able to provide a sound and complete set of answers to the input query. The computation terminates;

  • 3. otherwise, the “gap” between the upper and lower bound (i.e., Gq=UqLq) is a set of answers that need to be verified against the KB using a fully-fledged OWL 2 reasoner. The Datalog reasoner is again exploited for this step to compute a subset Kq of the KB K that is enough to check whether the answers in Gq are certain or spurious;

  • 4. for each aGq, the fully-fledged reasoner is used to check whether Kqq(a). This process is further optimized by reducing the number of answers in Gq that need to be checked by means of summarization [16];

  • 5. once all spurious answers have been removed from Gq, LqGq is returned.

Let’s take the lower bound computation as an example: the two performed approximations (i.e., to disjunctive Datalog and to ELHOr) are handled independently, by means of materialization in the first case, and the combined approach in the second; this allows PAGOdA to avoid having to deal with and-branching and the resulting intractability of most reasoning problems (see Definition 2.1). In fact, OWL 2 RL (Datalog) and ELHOr are used by PAGOdA to eliminate all interactions between axioms (T5) and either axioms (T4) or axioms (T3) and (R1).77 However, not all such interactions cause an exponential jump in complexity, and PAGOdA’s filtering of such cases is unnecessarily coarse. We will see in the next sections, how this procedure can be improved by introducing an alternative approximation algorithm.

PAGOdA’s reference implementation88 uses RDFox as a Datalog reasoner and HermiT as the underlying fully-fledged reasoner. It accepts any OWL 2 DL ontology as input, alongside a dataset in Turtle format and CQs in SPARQL [33].

PAGOdA ensures that the returned answers are always complete under ground semantics, while being ultimately limited by the capabilities of HermiT when considering the returned answers under certain answer semantics. HermiT does not natively support CQ answering and the process needs to be reduced to fact entailment first. This is possible when the input query is internalisable, i.e., the query can be rolled-up into a set of DL concept assertions. In this scenario PAGOdA returns a set of answers that is sound and complete under certain answers semantics if the bounds match or the query can be internalized into a DL concept. Otherwise, PAGOdA will return a sound set of answers (complete under ground semantics) and a bound on the incompleteness of the computed answers (under certain answers semantics).

2.3.The RSA ontology language

As we mentioned above, ACQuA combines multiple ontology approximation algorithms to compute the answers to an input query. As part of this work, we propose novel approximation algorithms that target RSA (and its extension RSA+). In this section, we provide a brief introduction to the RSA ontology language [14], along with a description of a combined approach for CQ answering [22].

RSA (role safety acyclic) is a class of ontologies designed to subsume all OWL 2 profiles, while maintaining tractability of standard reasoning tasks. The RSA ontology language is designed to avoid interactions between axioms that can result in the ontology being satisfied only by exponentially large (and potentially infinite) models. This problem is often called and-branching and can be caused by interactions between axioms of type (T5) with either axioms (T3) and (R1), or axioms (T4), in Table 1.

Fig. 1.

Example of exponential model enumerating numbers from 0 to 2n1 for n=3. The KB is polynomial in n.

Example of exponential model enumerating numbers from 0 to 2n−1 for n=3. The KB is polynomial in n.
Example 2.1.

Interaction between existential quantifiers (axioms of type (T5)) and universal quantifiers (encoded by axioms of type (T3) and (R1)) can lead to an ontology that may only be satisfied by models of exponential size.

Consider the following knowledge base with ABox A={(¬A0¬An1)(a)} for some n, and a TBox containing the following axioms, for 0i<n:

(1)¬Aij<iAjBiR.AiR.(j<i¬Aj)(2)j>i(BiAjR.Aj)(3)j>i(Bi¬AjR.¬Aj)
The size of the knowledge base is polynomial w.r.t. n. It can be shown that this knowledge base enforces a chain of individual of length 2n where each individual represents a number from 0 to 2n1 encoded in binary (i.e., each Ai represents a bit in position i, where an Ai encodes a 1 and a ¬Ai encodes a 0). Figure 1 shows an example of the exponentially large model for n=3.

RSA is based on the Horn-ALCHOIQ ontology language, restricting the interaction between axioms to ensure a polynomial bound on model size [14]. For the following section we will consider a generic Horn-ALCHOIQ knowledge base K=TR,A over the signature ΣK=NC,NR,NI.

Definition 2.1

Definition 2.1([22], Definition 1).

A role R in K is unsafe if it occurs in axioms (T5), and there is a role S s.t. either of the following holds:

  • 1. RRInv(S) and S occurs in an axiom (T3) with left-hand side concept S.A where A;

  • 2. S is in an axiom (T4) and RRS or RRInv(S).

A role R in K is safe if it is not unsafe.

Note that, all OWL 2 profiles (RL, EL and QL) as defined in Section 2.1, contain only safe roles.

Example 2.2.

In Example 2.1, R is unsafe. In fact, even for n=1, we have that

(4)¬A0B0R.A0B0A1R.A1
are part of the program and can be rewritten as
(5)¬A0B0¬A0R.A0B0A1XR.XA1
using some standard normalization and the fact that AR.BRAB.

Then, R appears in an axiom of type (T5), SR, RRInv(S) and S occurs in an axiom (T3) with left-hand side concept S.A where A;

The distinction between safe and unsafe roles makes it possible to strengthen the translation π from Table 1 while preserving satisfiability and entailment of unary facts.

Definition 2.2

Definition 2.2([22], Definition 2).

Let vR,BA be a fresh constant for each pair of concepts A, B and each safe role R in K. The function πsafe is defined for each axiom α in K:

(6)πsafe(α)=A(x)R(x,vR,BA)B(vR,BA)if α is of type (T5) and R safeπ(α)otherwise.
Let P={πsafe(α)αK} and PK=P,.

Theorem 2.1

Theorem 2.1([14], Theorem 2).

A Horn-ALCHOIQ knowledge base K is satisfiable iff PK. If K is satisfiable, then, KA(c) iff A(c)M[PK] for each unary predicate A and constant c in K.

Note that if K contains unsafe roles, the model M[PK] might be exponentially large or infinite.

Potential bad interactions between unsafe roles can be avoided by detecting any cyclic or diamond-shape materialization involving unsafe roles. This check is performed by constant Skolemizing all existential axioms with an unsafe role, and building a graph representing the materialization process, projected on unsafe interaction. Checking that the graph is an oriented forest, along with some additional conditions which preclude harmful interactions between equality-generating axioms and inverse roles, leads to the definition of RSA.

Definition 2.3

Definition 2.3([22], Definition 3).

Let PE and E be fresh binary predicates, let U be a fresh unary predicate, and let uR,BA be a fresh constant for each concept A,BNC and each role RNR. Then, for each axiom α in K

(7)πRSA(α)=A(x)R(x,uR,BA)B(uR,BA)PE(x,uR,BA)if α is of type (T5)π(α)otherwise.
The program PRSA consists of πRSA(α) for each αK, rule U(x)PE(x,y)U(y)E(x,y) and facts U(uR,BA) for each uR,BA, with R unsafe.

Let MRSA be the LHM of PRSA,. Then, GK is the digraph with an edge (c,d) for each E(c,d) in MRSA. Knowledge base K is equality-safe if:

  • (i) for each pair of atoms wt (with w and y distinct) and R(t,uR,BA) in MRSA and each role S s.t. RInv(S), it holds that S does not occur in an axiom (T4), and

  • (ii) for each pair of atoms R(a,uR,BA),S(uR,BA,a) in MRSA with aNI, there is no role T such that both RRT and SRInv(T) hold.

We say that K is RSA if it is equality-safe and GK is an oriented forest.99

Definition 2.4.

With reference to Definition 2.3, let K be a Horn-ALCHOIQ+ knowledge base. Then, K is RSA+ if it is equality-safe and GK is an oriented forest.

The fact that GK is a DAG ensures that the LHM M[PK] is finite, whereas the lack of “diamond-shaped” subgraphs in GK guarantees polynomiality of M[PK]. The definition gives us a programmatic procedure to determine whether a Horn-ALCHOIQ (resp. Horn-ALCHOIQ+) KB is RSA (resp. RSA+).

Theorem 2.2

Theorem 2.2([14], Theorem 3).

If K is RSA, then the size of M[PK] is polynomial in the size of K.

Tractability of standard reasoning tasks for RSA ontologies follows from Theorem 2.1 and Theorem 2.2.

2.3.1.Combined approach for RSA

The combined approach for RSA consists of two main steps to be offloaded to a Datalog reasoner able to handle stratified negation and function symbols.

The first step computes the canonical model of an RSA ontology over an extended signature (introduced to deal with inverse roles and directionality of newly generated binary atoms). The computed canonical model is not universal and, as such, might lead to spurious answers in the evaluation of CQs.

The second step of the computation performs a filtration of the computed answers to identify only the certain answers to the input query.

Canonical model computation The computation of the canonical model for a knowledge base K is performed by computing the LHM of the definite program obtained by translating K according to the rules in Table 2. The translation is an enhanced version of the translation given in Table 1 where axioms of type (T5) are Skolemized if the role involved is unsafe, and constant Skolemized otherwise. Constant Skolemization of some axioms can introduce forks in the canonical model that can lead to spurious answers. In order to keep track of these forks, directionality is taken into account when Skolemizing an axiom; roles are annotated with the direction in which they are “generated” (during the materialization process), and the annotation is propagated according to axioms in the RBox. This is still not enough to detect spurious forks in the canonical model and, in particular, cycles of length one (self-loops) or two can be the source of ambiguity during materialization. In order to solve the ambiguity of the canonical model, cycles of length one and two are unfolded into cycles of length three and four, respectively.

First we define the Datalog program EK used to compute the canonical model for K.

Table 2

Translation of Horn-ALCHOIQ axioms to build EK

Axioms in KLP rules
non-(T5) axiom απ(α)
RS, {f,b}R(x,y)S(x,y)
R role, {f,b}R(x,y)R(x,y)
Rf(x,y)Inv(R)b(y,x)
Rb(x,y)Inv(R)f(y,x)
(T5) axiom, R unsafeA(x)Rf(x,fR,BA(x))B(fR,BA(x))
(T5) axiom, R safeA(x)notIn(x,unfold(A,R,B))Rf(x,vR,BA,0)B(vR,BA,0)
A(vR,BA,i)Rf(vR,BA,i,vR,BA,i+1)B(vR,BA,i+1), if Rconfl(R), for i=0,1
A(x)Rf(x,vR,BA,1)B(vR,BA,1), for every xcycle(A,R,B)
Definition 2.5

Definition 2.5([22], Definition 4).

Let confl(R) be the set of roles S s.t. RRT and SRInv(T) for some T. Let ≺ be a strict total order on triples (A,R,B), with R safe and A,B concept names in K. For each (A,R,B), let vR,BA,0, vR,BA,1 and vR,BA,2 be fresh constants; let self(A,R,B) be the smallest set containing vR,BA,0 and vR,BA,1 if Rconfl(R); and let cycle(A,R,B) be the smallest set of terms containing, for each Sconfl(R),

  • vS,CD,0 if (A,R,B)(D,S,C);

  • vS,CD,1 if (D,S,C)(A,R,B);

  • fS,CD(vS,CD,0) and each fT,EF(vS,CD,0) s.t. uS,CDuT,EF is in MRSA, if S is unsafe.

Finally, unfold(A,R,B)=self(A,R,B)cycle(A,R,B).

Let Rf and Rb be fresh binary predicates for each role R in K, let NI be a fresh unary predicate, and notIn be a built-in predicate which holds when the first argument is not an element of the set given as the second element. Let P be the smallest program with a rule NI(a) for each constant a and all rules in Table 2. We define EK=P,.

The set confl(R) intuitively contains the roles that are source of ambiguity in conjunction with R and hence need to be potentially unfolded if part of a loop; the arbitrary order ≺ determines the direction in which the loops are unfolded. Since the input ontology is RSA, there is no loop introduced by unsafe roles, and hence axiom of type (T5) involving unsafe roles don’t need to be constant Skolemized.

The canonical model for an RSA input ontology is defined as M[EK].

Theorem 2.3

Theorem 2.3([22],Theorem 3).

The following holds:

  • (i) M[EK] is polynomial in |K|;

  • (ii) K is satisfiable iff EKy.(y);

  • (iii) if K is satisfiable, KA(c) iff A(c)M[EK];

  • (iv) there are no terms s,t and role R s.t. EKRf(s,t)Rb(s,t).

Filtering spurious answers For the filtering step, a query dependent logic program Pq is introduced to filter out all spurious answers to an input query q over the extended canonical model M[EK] computed in the previous section. The program identifies and discards any match that cannot be enforced by a TBox alone and hence correspond to spurious answers introduced by the canonical model. This includes the task of detecting forks and cycles in the model and answers that contain anonymous terms (i.e., functional terms and constants introduced as part of the canonical model program). Rules for the filtering program are provided in Table 3.

Table 3

Rules in Pq. Variables u, v, w from U are distinct

(1)ψ(x,y)QM(x,y)
(2)named(a) for each constant a in O
(3a)QM(x,y),notNI(yi)id(x,y,i,i) for each 1i|y|
(3b)id(x,y,u,v)id(x,y,v,u)
(3c)id(x,y,u,v),id(x,y,v,w)id(x,y,u,w)
(4a)for all R(s,yi), S(t,yj) in q with yi,yjy
Rf(s,yi)Sf(t,yj)id(x,y,i,j)notstfk(x,y)
(4b)for all R(s,yi), S(yj,t) in q with yi,yjy
Rf(s,yi)Sb(yj,t)id(x,y,i,j)notstfk(x,y)
(4c)for all R(yi,s), S(yj,t) in q with yi,yjy
Rb(yi,s)Sb(yj,t)id(x,y,i,j)notstfk(x,y)
for all R(yi,yj), S(yk,yl) in q with yi,yj,yk,yly
(5a)Rf(yi,yj)Sf(yk,yl)id(x,y,j,l)yiyknotNI(yi)id(x,y,i,k)
(5b)Rf(yi,yj)Sb(yk,yl)id(x,y,j,k)yiylnotNI(yi)id(x,y,i,l)
(5c)Rb(yi,yj)Sb(yk,yl)id(x,y,i,k)yjylnotNI(yj)id(x,y,j,l)
(6)for each R(yi,yj) in q with yi,yjy and {f,b}
R(yi,yj)id(x,y,i,v)id(x,y,j,w)AQ(x,y,v,w)
for each {f,b}
(7a)AQ(x,y,u,v)TQ(x,y,u,v)
(7b)AQ(x,y,u,v)TQ(x,y,v,w)TQ(x,y,u,w)
(8a)QM(x,y)notnamed(x)sp(x,y) for each xx
(8b)fk(x,y)sp(x,y)
(8c)TQ(x,y,v,v)sp(x,y) for each {f,b}
(9)QM(x,y)notsp(x,y)Ans(x)

Filtering program Pq and its extension Pq,K with EK from Def. 2.5 are defined as follows.

Definition 2.6

Definition 2.6([22], Definition 5).

Let q=y.ψ(x,y) be a CQ, let QM, sp, and fk be fresh predicates of arity |x|+|y|, let id, AQ, TQ with {f,b} be fresh predicates of arity |x|+|y|+2, let Ans be a fresh predicate of arity |x|, let named be a fresh unary predicate, and let U be a set of fresh variables s.t. |U||y|. Then, Pq is the smallest program with all rules in Table 3, and Pq,K is defined as EKPq.

Theorem 2.4

Theorem 2.4([22], Theorem 4).

Let Pq be the filtering program for q, and Pq,K=EKPq. It holds that [22]:

  • (i) Pq,K is stratified;

  • (ii) M[Pq,K] is polynomial in |K| and exponential in |q|;

  • (iii) if K is satisfiable, xcert(q,K) iff Pq,KAns(x).

We can then build a worst-case exponential algorithm that, given an ontology K and a CQ q, it materializes Pq,K and returns all instances of predicate Ans. We obtain a “guess and check” algorithm that leads to an NP–completeness result for BCQs [22]. The algorithm first materializes EK in polynomial time and then guesses a match σ to q over the materialization; finally it computes (Pq,K)σ.

Theorem 2.5

Theorem 2.5([22], Theorem 5).

Checking whether Kq(x,y) with K an RSA ontology and q(x,y) a BCQ is NP–complete in combined complexity.1010

3.Overview

We propose a hybrid query answering architecture which provides CQ answering capabilities for OWL 2 by means of combining different answering services treated as black-boxes. In particular, we combine scalable CQ answering services targeting tractable ontology languages with answering services for more expressive languages approaching the full OWL 2.

Given an input CQ over a certain knowledge base, we process the query using the tractable services; if the query can be fully answered by one of these tractable services, we simply provide the resulting answers to the user. Otherwise, we compute multiple lower and upper bounds to the answers to the query by approximating the knowledge base “from above” and “from below” and taking the union of the lower bounds and the intersection of the upper bounds. Finally, if the bounds do not coincide, the “gap” answers are validated by using the “full” service.

As part of this work, we introduce a novel algorithm to compute a lower bound to the answers to an input query by means of approximation to RSA. Similarly, we propose an algorithm to compute an approximation “from above” targetting RSA+.

The reference implementation ACQuA is built on top of the following tools:

  • (i) RSAComb, a novel system for CQ answering over RSA ontologies, based on the combined approach, extended with algorithms to computes bounds of the answers to a query via approximation of the input KB to RSA;

  • (ii) PAGOdA, providing lower and upper bounds to the answers to a query and techniques to further refine these bounds to provide CQ answering capability over OWL 2 DL;

  • (iii) a fully-fledged reasoner (such as HermiT) for CQ answering over a certain ontology language.

These tools allowed us to build a fine-grained “pay-as-you-go” approach, offering suitable, performant solutions depending on the inputs to the system; overall, this results in a lower complexity of the answer computation, when support for high expressivity is not needed. Note that we included both RSAComb and PAGOdA in ACQuA because, in general, the bounds computed by RSAComb are incomparable with the ones produced by PAGOdA, as we will show in Sections 4 and 5. However, any of these components could be potentially substituted or augmented with more capable ones; in particular, any relevant service mentioned above could be used in ACQuA (e.g., the fully-fledged reasoner HermiT could be substituted with Konclude).

Fig. 2.

Workflow of the ACQuA system.

Workflow of the ACQuA system.

Given a generic KB K=TR,A and a CQ q(x)=yφ(x,y) containing only symbols in K, the combination of RSAComb, PAGOdA, and HermiT performs the following steps to compute the full set of answers to q(x) over K (see Fig. 2).

  • 1. A preliminary satisfiability check is performed over the input knowledge base K. The procedure terminates if K is unsatisfiable.

  • 2. If K is either RL or ELHOr return the answers provided by the lower bound algorithm in PAGOdA.1111 Otherwise, proceed to step 3.

  • 3. If K is RSA, return the full set of answers computed by RSAComb.1212 Otherwise, proceed to step 4.

  • 4. Compute the bounds to the answers to q as Lq=LPqLRq and Uq=UPqURq, with LPq,UPq and LRq,URq the lower and upper bounds computed by PAGOdA and RSAComb, respectively.

  • 5. If Gq=UqLq=0, return Lq. Otherwise, proceed to step 6.

  • 6. Compute Kq, a subset of K, relevant for the answering of q(x).

  • 7. Use HermiT on Kq, to check the entailment of the answers in Gq and remove any remaining spurious answer.

  • 8. Return LqGq.

The choice of fully-fledged reasoner ultimately determines the class of ontologies for which CQ answering is sound and complete under ground and/or certain answer semantics for the overall system. Thanks to RSAComb, ACQuA is sound and complete for CQ answering under certain answer semantics for ontologies in RSA [22]. With the integration of PAGOdA, and a suitable fully-fledged reasoner, like HermiT, ACQuA is able to answer internalisable queries [36] over OWL 2 DL under certain answer semantics [85].

Steps 2,6,7 and the computation of LPq,UPq in step 4 are offloaded to PAGOdA; we refer the reader to [85] for more details. We will instead focus our attention on the underlying RSAComb reasoner; in particular we dedicate Sections 45 to the description of the novel algorithms used in step 4 to compute LRq,URq via approximation to RSA. In Section 6 we provide more details on the design and architecture of ACQuA and RSAComb (both as a standalone system and its integration in ACQuA).

To help the reader follow along with the description of the proposed techniques, we consider the following running example.

Example 3.1.

Consider the KB Kex=TexRex,Aex and the CQs Qex, with ABox Aex containing assertions (a1)–(a10), TBox Tex containing axioms (t1)–(t9), RBox Rex containing axioms (r1)–(r2), and Qex containing queries (q1)–(q2) in Table 4.

Table 4

Running example Kex

(r1)publishedpublishedBy(a1)PhDStudent(bart)
(r2)acceptsreviews(a2)Researcher(lisa)
(a3)Journal(journal1)
(t1)PhDStudentStudentResearcher(a4)Journal(journal2)
(t2)JournalPaperThesis(a5)Journal(journal3)
(t3)ReportPaperThesis(a6)writes(bart,work1)
(t4)Journalpublished.Paper(a7)writes(lisa,work1)
(t5)Researcherwrites.Paper(a8)published(journal1,work1)
(t6)PaperpresentedAt.Conference(a9)JournalPaper(work1)
(t7)Paper1presentedAt.Conference(a10)Report(work1)
(t8)Conferenceaccepts.Paper
(t9)reviews.ConferenceConferencePaper
(q1)q1(x2)=publishedBy(x1,x2)
(q2)q2(x1,x2)=published(x1,x3)published(x2,x3)x1x2

Intuitively, the ABox contains a collection of statements about researchers and their research outputs; on top of that, the ontology (RBox and TBox) models additional information about the relationships between different types of papers and their publications processes. The CQs ask about venues and works published in multiple venues. For reader’s convenience, a visual representation of the ABox Aex is shown in Fig. 3.

Fig. 3.

Graphical representation of the ABox Aex.

Graphical representation of the ABox Aex.

Some axioms are not expressed in normal form (see Table 1) and can be further normalized as follows: axiom (t1) can be rewritten as

(t1bis)PhDStudentStudentPhDStudentResearcher
while axiom (r1) becomes
(r1bis)publishedpublishedBypublishedBypublished
Note that Kex is not in Horn-ALCHOIQ because of axiom (t3), and hence it is neither RSA nor RSA+.

4.Lower bound computation

In this section we present a technique to compute a lower bound to the answers to an input query, by means of approximating the input KB to RSA.

RSA is not purely syntactically defined, and instead introduces a set of constraints over the ontology language Horn-ALCHOIQ; as such, the naïve approximation that consists in discarding any axiom type which is not in the target approximation language does not work. Instead, we split our approximation algorithm in three sub-steps, each building on top of the previous one:

  • 1. From a generic SROIQ ontology to ALCHOIQ by discarding any axiom that is not in the target language;

  • 2. From ALCHOIQ to Horn-ALCHOIQ by means of program shifting;

  • 3. From Horn-ALCHOIQ to RSA by modifying the input KB in order to enforce the constraints imposed by the RSA definition.

4.1.Approximation to ALCHOIQ

This first step is performed by discarding any axiom that is not in ALCHOIQ, namely axioms of type (R3)–(R4) and (T6)–(T7) in Table 1.

Let K be the ALCHOIQ restriction of a KB K. By the monotonicity of FOL all certain answers w.r.t. K are also certain answers w.r.t. K; in other words, discarding axioms in a KB will lead to having more models, which, in turn, leads to fewer answers, under certain answer semantics. Moreover, if K is unsatisfiable, so is K.

In Example 3.1, Kex is in ALCHOIQ, so no axioms are discarded.

4.2.Approximation to Horn-ALCHOIQ

We will now describe how to reduce an ALCHOIQ ontology to Horn-ALCHOIQ. This involves the approximation of axioms of type (T1), (T4) by eliminating the disjunction in the head of the axioms.1313 Simply discarding them is not desirable, especially when considering that disjunctive axioms are quite common in practice.

To address this and improve the approximation to Horn-ALCHOIQ we rely on a technique known as program shifting [18] to convert disjunctive Datalog rules into Datalog. Program shifting is a polynomial compilation of disjunctive logic rules into Datalog rules that preserve soundness of CQ answering, and acts on the translation π(·) of the axioms into definite rules.

Example 4.1.

In Example 3.1, we know that assertions Report(work1) and JournalPaper(work1) hold (because of assertions (a10), (a9)). Moreover, by (t2), we know that Thesis(work1) does not hold. Using this information, along with (t3), we can derive Paper(work1). This derivation is deterministic and can be captured by Datalog rules. To make this reasoning explicit, we introduce a fresh atom Thesis that intuitively represents the complement of Thesis, and add the following axioms to Kex:

(8)JournalPaperThesisReportThesisPaper
These rules can be used to derive Paper(work1).

Program shifting is formally defined as follows.

Definition 4.1

Definition 4.1([85], Def. 4.3).

Let r be a normalized disjunctive Datalog rule. For each predicate P in r, let P be a fresh predicate of the same arity. The shifting of r, denoted shift(r), is the following set of rules:

  • if r is of the form

    (9)β1βn
    then
    (10)shift(r)={r}{β1βi1βi+1βnβ¯i1in}

  • if r is of the form

    (11)β1βnγ1γm
    then shift(r) consists of the following rules:
    (12)β1βnγ¯1γ¯m(13)β1βiγ¯1γ¯j1γ¯j+1γ¯mγjfor 1jm(14)β1βi1βi+1βnγ¯1γ¯mβ¯ifor 1in

This can be generalized to sets of rules Σ as follows:
(15)shift(Σ)=rΣshift(r)

We apply this technique to our ALCHOIQ KB in order to reduce ourselves to a Horn KB. This procedure guarantees to produce a polynomial approximation of the input KB which is sound (but not necessarily complete) w.r.t. CQ answering. For r a disjunctive Datalog rule with n atoms in the body and m atoms in the head, shift(r) contains n+m+1 rules.

Theorem 4.1.

Let K=O,A be the ALCHOIQ restriction of the KB K=O,A. Moreover, let K=shift(O),A. Then cert(q,K)cert(q,K).

Proof.

See the Appendix.  □

4.3.Approximation to RSA

In this section we provide a description of an algorithm to approximate the Horn-ALCHOIQ KB K obtained in the previous step into an RSA KB K such that cert(q,K)cert(q,K) for any KB q. Given a Horn-ALCHOIQ KB K, checking if K is RSA consists of the following steps (see Def. 2.3):

  • 1. checking whether GK is an oriented forest;

  • 2. checking whether K is equality safe.

We first consider step 1. If GK is not an oriented forest, then its underlying undirected graph has a cycle. In order to make GK an oriented forest we want to detect these cycles, break them and propagate the changes back to K.

Cycles can be broken by removing nodes from GK. Nodes in GK are of the form uR,BA, paired with a corresponding existential axiom AR.BK of type (T5). The action of deleting a node from the graph can be propagated back to K by removing the corresponding (T5) axiom. Due to monotonicity of FOL, deleting axioms from K produces a lower bound approximation of K w.r.t. CQ answering.

Lemma 4.1.

Let K=O,A be a Horn-ALCHOIQ KB, GK be its dependency graph as defined in Def. 2.3 and uR,BA a node in GK. The dependency graph GK corresponding to K=O{AR.B},A does not contain uR,BA.

Proof.

This is proven by observing that, by definition of dependency graph, constant uR,BA can solely be introduced by the corresponding axiom AR.B, and hence, removing the axiom from O will remove the node from GK.  □

Using the Datalog reasoner, we compute MRSA from the program PRSA, obtained from K, and retrieve all instances of role E to build GK. Finally, we use the modified DFS visit of the graph shown in Algorithm 1 to detect any cycle in GK. During the visit, the algorithm determines a representative node for each cycle, which will be the node selected to be removed. In order to keep the visit as efficient as possible we determine these nodes eagerly, by selecting the last processed node when a cycle is detected. Let D be this set of nodes, then for every uR,BAD we remove the corresponding axiom AR.B in K. Note that D is, in general, not unique and different such sets might lead to different lower bounds.

Algorithm 1:

Cycle detection in GK

Cycle detection in GK

Next, we need to deal with equality safety (step 2). According to the definition of RSA, the following steps can be performed to ensure this property:

  • i. delete any (T4) axiom that involves a role S such that there exists wt (with w and t distinct) and R(t,uR,BA) in MRSA and RInv(S);

  • ii. if there is a pair of atoms R(a,uR,BA),S(uR,BA,a) in MRSA with aNI and a role T such that both RRT and SRInv(T) hold, then remove some axiom of type (R2) to break the derivation chain that deduces either RRT or SRInv(T).

Again, by removing some selected axioms we are able to force the input Horn-ALCHOIQ ontology to satisfy RSA additional constraints. In the following, we summarize steps 1–2 described above with the function lower(·) from KBs to KBs.

Theorem 4.2.

Let K be a SROIQ KB, and K its syntactic restriction to ALCHOIQ. Then, we have that cert(q,lower(shift(K)))cert(q,K).

Proof.

By Section 4.1 and Theorem 4.1 we know that

(16)cert(q,shift(K))cert(q,K)cert(q,K)
Moreover, we can observe that lower(·) only removes axioms from the input ontology; by monotonicity of FOL we have that cert(q,lower(shift(K)))cert(q,shift(K))cert(q,K).  □

As mentioned in Section 2, PAGOdA uses a similar approach to compute a lower bound by approximating the input ontology first to disjunctive Datalog and then to Datalog; this is done by discarding any axiom that is not in the language, while introducing some additional heuristics to handle specifically disjunctive and existential axioms.

Note that, in general, the lower bound resulting from the algorithm proposed here is incomparable with the one produced by PAGOdA.

The next example shows a scenario in which the lower bound computed by our algorithm is tighter than the one produced by PAGOdA. On the other hand, the RSA language fully captures OWL 2 RL (used internally by PAGOdA) only when not considering property chain axioms.

Both approximation techniques will be later combined into ACQuA.

Example 4.2.

Consider our running Example 3.1 and Kex=shift(Kex). Then

  • presentedAt is unsafe because of axioms (t6), (t7);

  • accepts is unsafe because of axioms (t8), (t9) and (r2);

whereas all other roles are safe.

Now, let ui, for 1i4 be unique, fresh constants, and PRSAex be the logic program (according to Def. 2.3), corresponding to Kex. In particular

(17)Journal(x)published(x,u1)PE(x,u1)Paper(u1)(18)Researcher(x)writes(x,u2)PE(x,u2)Paper(u2)(19)Paper(x)presentedAt(x,u3)PE(x,u3)Conference(u3)U(u3)(20)Conference(x)accepted(x,u4)PE(x,u4)Paper(u4)U(u4)
is the translation (according to Def. 2.3) of (t4), (t5), (t6), and (t8). Finally, MRSAex is the LHM of PRSAex.

The dependency graph GKex is shown in Fig. 4. GKex is not an oriented forest and as such we need to detect a set of nodes whose removal will turn the graph in Fig. 4 into a tree. Let’s assume Algorithm 1 returns the set {u4} to be removed from GKex. We propagate this change to Kex by removing axiom (t8). Kex was already equality safe. We denote the KB resulting from this process with Kex=lower(Kex).

Fig. 4.

Graphical representation of GKex.

Graphical representation of GKex.

If we consider the query (q1), then:

(21)cert(q1,Kex)={journal1,journal2,journal3}.
It can be verified that the lower bound computed by PAGOdA is not as tight and results in the set of answers containing only journal1.

5.Upper bound computation

We will now look at the problem of approximating a generic input KB K to a KB K from above, such that answering an input query over the approximated KB will return an upper bound to the answers. More formally, given an input KB K, we want to find a KB K s.t. cert(q,K)cert(q,K) for any CQ q. We initially consider ALCHOIQ+ as the source ontology language, not taking property chain axioms (T4) into account, and approximate the ontology to RSA+. Some additional comments on how to handle axioms (T4) will also be provided.

We adopt a similar approach to the one used in the lower bound computation and divide the procedure in steps. Given an ALCHOIQ+ KB, we proceed as follows

  • 1. replace any occurrence of ⊥ in the knowledge base with a fresh nullary predicate f with no special meaning;

  • 2. approximate disjunctive rules by removing all but one disjunct from the head of the rule. For each rule, the selected disjunct is chosen deterministically using an efficient choice function;

  • 3. enforce the constraints that define the RSA ontology language on the Horn-ALCHOIQ+ KB obtained in the previous step.

5.1.⊥ substitution

As described above, ACQuA performs a preliminary satisfiability check on the input KB; in spite of this, while strengthening the KB, we might cause the KB to become unsatisfiable.

In order to provide a meaningful upper bound even in cases where the approximation leads to an unsatisfiable KB, we adopt an approach initially proposed in PAGOdA. The idea is to substitute every occurrence of ⊥ with a fresh nullary predicate f with no predefined meaning; by doing so we avoid the derivation of the entire Herbrand base, ignoring the fact that the final KB approximation might be unsatisfiable. Note that, despite the fact that ⊥ is stripped of its built-in semantics in FOL, weakening the KB, it can be shown (see [85, Lemma 5.4,Theorem 5.5]) that we can still compute a meaningful upper bound for any input query.

This step has been included purely for theoretical purposes. RDFox, used in the implementation of the approximation algorithm, will not explicitly check for satisfiability during query answering, making it possible to consider correct the answers to a query even when the KB is unsatisfiable.

5.2.Approximation of disjunctive rules

According to Table 1, axioms of type (T1) and (T4) can introduce disjunction in the head of rules. This usually results in non-determinism in the answering process and a corresponding jump in computational complexity. In order to rewrite these axioms and avoid the introduction of this operator, we borrow a technique used in a similar fashion in PAGOdA. The approach consists in replacing any disjunction in the head of a rule with one of the disjuncts. It is easy to see that this strengthens the KB and eliminates any non-determinism introduced by the disjunction. The surviving disjunct is chosen deterministically using an efficient choice function; the idea is to analyse the dependency graph of the KB and choose a disjunct which does not eventually lead to a contradiction. To this end, a standard dependency graph of the KB is built and disjuncts are ordered according to their distance from f; The ordering is used to select those disjuncts that are less likely to lead to a contradiction, by picking (one of) the furthest from f. For more details on the definition of a choice function, see [85, Section 8.2].

Given a choice function ch that returns a concept name out of an input set, we define the process of eliminating disjunction from the head of a rule as follows

Definition 5.1.

Let δ be a function from axioms to axioms eliminating disjunction from the head of any axiom of type (T1) and (T4):

(22)δ(α)=i=1nAich({Bj1jm})if αi=1nAij=1mBjA1R.Bif αAmR.Bαotherwise
The definition of δ can be trivially extended to sets of axioms and KBs.

Example 5.1.

Consider axioms (t3) from our running example. Let ch be a choice function, such that

(23)ch({Paper,Thesis})=Paper
Then δ(t3) is
(t3’)ReportPaper

5.3.From Horn-ALCHOIQ+ to RSA+

The final step of the approximation process consists in enforcing the additional constraints that the RSA language introduces on top of Horn-ALCHOIQ. We apply these constraints on top of the KB obtained in the previous step, which is Horn-ALCHOIQ+, obtaining an RSA+ KB. We will later prove that the algorithm for the combined approach for RSA applied to an RSA+ ontology is complete w.r.t. CQ answering.

Given K a Horn-ALCHOIQ+ KB and GK its dependency graph as defined in Def. 2.3, checking if K is RSA+ consists of:

  • 1. checking whether GK is an oriented forest;

  • 2. checking whether K is equality safe.

In order to ensure equality safety we proceed similarly to the lower bound case. For any pair of atoms wt,R(t,uR,BA)MRSA and role S s.t. RInv(S), if S occurs in an axiom αC1S.D of type (T4), we convert α into the axiom CS.D. It is easy to see that, for any C,DNC and S role, {CS.D}C1S.D and hence the rewriting is a strengthening of the KB.

On the other hand, for each pair of atoms R(a,uR,BA),S(uR,BA,a)MRSA, with aNI and role T such that RRT and SRInv(T), we know that term uR,BA was introduced by an axiom AR.B of type (T5). In order to satisfy the constraint, we mark this axiom for constant Skolemization, meaning that when translated into a logic rule this axiom will be translated into A(x)R(x,c)B(c) for some unique fresh constant c.1414 Moreover, we assume to have a boolean function marked(α) over axioms that returns true if α is a marked axiom.

Finally, we reduce GK to an oriented forest. We proceed similarly to the lower bound computation described in Section 4.3. In fact, we can reuse Algorithm 1 to gather a possible set of nodes D, whose removal would render the dependency graph an oriented forest. As explained before, each node uR,BA uniquely identifies an axiom AR.B of type (T5) in the input KB. In order to break the cycles while strengthening the KB we mark the axioms in D for constant Skolemization.

These steps can be summarized in the definition of δ:

Definition 5.2.

We define δ as a function from axioms to sets of axioms.

(24)δ(α)={CS.Df}if αC1S.D and R unsafe s.t. RInv(S) andwt,R(t,uR,BA)MRSA with w,t distinct{AR.{bR,BA},{bR,BA}B}if αAR.B and marked(α){α}otherwise
where bR,BA is a fresh constant, unique to axiom AR.B.

Finally, given K=O,A, we define upper(K)=αOδ(α),A.

Theorem 5.1.

Let K be a satisfiable ALCHOIQ+ KB and K=upper(δ(K)). Moreover, let q(x)=yφ(x,y) be a CQ. Then,

  • (i) K is RSA+,

  • (ii) cert(q,K)cert(q,K),

  • (iii) if xcert(q,K) then PK,qAns(x).

Proof.

See the Appendix.  □

Example 5.2.

Consider, again, our running example (Example 3.1) and Kex=δ(Kex). Let PRSAex be its translation into logic rules (according to Def. 2.3) and MRSAex its LHM, as in Example 4.2. We know that the dependency graph GKex (shown in Fig. 4) is not an oriented forest. Let’s assume, again, that Algorithm 1 returns {u4}. We mark axiom (t8) for constant Skolemization by δ, instead of the standard Skolemization that would be applied by Def. 2.5 (accepts is unsafe). We denote the KB resulting from this process with Kex=upper(Kex).

If we consider the query (q2), then:

(25)cert(q2,Kex)=.
It can be verified that the upper bound computed by PAGOdA is not as tight and results in the following set of answers
(26){journal2,journal3,journal3,journal2}

5.4.Property chain axioms

Our tests show that, general property chain axioms (axioms of type (R4) in Table 1) are quite uncommon in practice. Transitive property axioms, on the other hand, are a specialization of (R4) that can be easily found in common ontologies. While we ignored the presence of these axioms so far, it can be shown that completeness is still guaranteed when including them in the language [14, Theorem 2,Proposition 1]. Intuitively, due to monotonicity of FOL, including more axioms in the computation of the canonical model will lead to a strengthening of the KB. Furthermore, the computational complexity for the computation of the canonical model is still bound by the translation of the problem into Datalog, for which new heuristics have being recently proposed to efficiently handle transitive closure of roles [38]. Note that, in this case, we are not modifying the filtration step, which will then only be able to detect a fraction of the spurious answers, effectively computing an upper bound of the certain answers.

6.Design and architecture

We proposed a new framework to compute CQ answering over unrestricted OWL 2 ontologies by using answer bounds and further refinement steps. The approach has been implemented in a system called ACQuA [42], which, as discussed in the previous sections, offloads different steps in the computation to a selection of underlying systems used as black boxes, i.e., RSAComb, PAGOdA and HermiT.

ACQuA is inspired by the “pay-as-you-go” philosophy that drove the development of PAGOdA and as such shares similarities and capabilities with the latter tool. The idea is to take different steps depending on how the input KB is classified. The input KB needs to go through a consistency check and normalization procedure first. If the normalized KB is inside one of the two ontology languages for which PAGOdA provides full support (i.e., OWL 2 RL and ELHOr), we use the PAGOdA lower bound algorithm to compute the answers to the query. This check is purely syntactic over the normalized ontology and can be performed by leveraging the OWLAPI [35] interface for OWL 2 profile checking. If the first check fails (i.e., the ontology is not in any of the aforementioned ontology languages), we check whether the ontology is in RSA using RSAComb. If the input ontology is RSA we use the RSAComb algorithm directly (described in Section 2.3.1) and collect the full set of answers to the query. If none of the tractable services for CQ answering are able to capture the KB, we use them to compute lower and upper bound approximations, taking the union of the lower bounds and the intersection of the upper bounds. If the combined bounds match, we have computed a sound and complete set of answers for the input query. If, however, this is not the case, we use PAGOdA’s algorithm to compute a subset of the input KB relevant to answer the query, and fall back to HermiT to filter any spurious answers from the gap between the bounds. A summary of these steps was provided in Section 3, along with a visual representation in Fig. 2.

In this section we will describe some design and implementation details that led to the development of ACQuA. In particular, we will focus our attention on RSAComb, a novel implementation of the RSA combined approach for CQ answering, and how the tool can be used to compute lower and upper bounds to the answers of an input query.

6.1.RSAComb

RSAComb [41] is an optimized implementation of the combined approach for CQ answering in RSA. We streamlined and reorganized the algorithm to make the different steps either ontology or query independent. On top of that we designed and implemented an API to introduce approximation capabilities in the system; RSAComb is able to take an unrestricted ontology as input and potentially apply an approximation algorithm (targeting RSA or RSA+) before computing the answers to a query. The system ships with reference implementations of the algorithms for the computation of answer bounds introduced in Sections 45.

The system is written in Scala and uses the OWLAPI [35] to interface with the input ontology and manipulate OWL 2 axioms. RDFox is used as an underlying Datalog reasoner; RSAComb has been designed to maximize the amount of computation to be offloaded to RDFox, by redefining problems in terms of queries over a materialized RDF store.

RDFox is used as a black box, and RSAComb can be adapted to use any Datalog reasoner with support for stratified negation and Skolemization. Nonetheless, the use of RDFox allowed us to introduce some optimizations based on particular features provided by the tool.

These are:

  • a SKOLEM operator,1515 which provides a way to uniquely associate a sequence of terms with a fresh term;

  • support for named graphs to isolate and cache partial computation;

  • support for “TBox reasoning” in order to reason directly on the structure of an ontology even when outside the supported OWL fragment.

We designed and built RSAComb around these general principles:

Modularity

The code should be modular and different steps in the algorithm should be as independent of each other as possible. It should be easy to reimplement (or enhance) an intermediate step of the algorithm as long as the signature and the interface with the system as a whole remain unaltered. We achieved this by an extensive use of Scala traits, building a collection of interfaces that describe the behaviour of the different actors that take part in the execution of the combined approach for RSA. As explained in the following sections, the integration with RDFox was also key to providing a good level of modularity to the systems.

Scalability

The system has to be able to scale efficiently even for large amounts of data. Partial results are computed when needed and reused whenever possible. A more detailed analysis on the performance and scalability of the system is provided in Section 8.

Integration

It should be equally possible to use the system as a self-contained application or integrate it in another system. As such, our software presents a simple but effective command line interface alongside a well-structured set of classes exposing all the necessary tools to work with RSA ontologies, while hiding unnecessary implementation details. The different steps can also be disabled for user convenience.

We will first provide a description of RSAComb as an implementation of the RSA combined approach and then go into details on how lower and upper bound algorithms are implemented in the system.

6.1.1.Overview

Fig. 5.

Workflow of the RSAComb system.

Workflow of the RSAComb system.

Figure 5 summarizes the workflow of RSAComb:

  • (i) the approximation steps take an unrestricted OWL 2 KB as input and approximate it to a target language handled by the RSA combined approach;

  • (ii) the canonical model for the resulting RSA KB is computed by materializing the data against a logic program derived from the input ontology;

  • (iii) a filtering program is derived from the input query and is combined with the canonical model to produce the set of certain answers to the input query over the approximated knowledge base.

The process of importing the input ontology (TBox, RBox) into the system is performed using the OWLAPI. Since importing large amounts of data (ABox) into the system might be expensive, data files are read and data is loaded on demand and reused whenever possible to maximize performance.

As mentioned above, two approximation algorithms ship with the system. The first approximation algorithm is an implementation of the algorithm presented in Section 4; it targets the RSA ontology language and maintains soundness w.r.t. CQ answering, i.e., answers to a CQ are a lower bound to the answers to the query over the original KB. A copy of the ontology, translated into Datalog according to Def. 2.3, is imported into RDFox along with the data and materialized by the reasoner. The dependency graph and equality safety checks (see Definition 2.3) are implemented as queries over the RDF store exposed by RDFox; the original knowledge base is altered accordingly. The second approximation is an implementation of the algorithm introduced in Section 5; it targets RSA+ and maintains completeness w.r.t. CQ answering, i.e., answers to a CQ are an upper bound to the answers to the query over the original knowledge base.

The canonical model is computed for the knowledge base in Step (ii); this is done by converting each axiom in the KB into a logic rule according to Def. 2.5 and uploading it into RDFox. Note that the translation from axioms into logic rules is different from the one in Step (i), hence the need to reload them into RDFox. The data, on the other hand, is safely reused. Finally, the potentially spurious answers to the input query introduced during the canonical model computation are filtered out in Step (iii). It is worth noting that, in this scenario, steps (i),(ii) are query independent, while step (iii) is ontology independent. As such, when multiple queries are submitted over the same KB, steps (i-ii) are performed “on-demand” and only once, while the third step is performed for each input query.

6.1.2.Canonical model computation

Fig. 6.

RSAComb: canonical model computation.

RSAComb: canonical model computation.

The computation of the canonical model (Figure 6) involves the conversion of the input RSA ontology into logic rules as described in Def. 2.5, and where function symbols are simulated using RDFox’s built-in Skolemization feature.

Example 6.1.

A Skolemized rule derived from an existential axiom (T5)

(27)A(x)R(x,fR,BA(x))B(fR,BA(x))
can be turned into the following RDFox – compatible rule sw-14-sw233382-g008.jpg

where the built-in operator SKOLEM binds ?Y to a unique value generate from the string “A,R,B” and term ?X.

The system performs the conversion and then offloads the materialization of the rules, combined with the input data, to RDFox.

Since the canonical model is query independent, this process can be performed once and the result cached and reused for every subsequent query over the same input ontology. We achieve this using RDFox’s support for RDF named graphs, which enables us to perform operations on specific “named” subsets of the data. Further operations on the graph operate and produce additional data in different named graphs, leaving the materialized canonical model intact.

Axiomatization ofand ≈ RDFox has built-in support for ⊤ (owl:Thing) and equality (owl:sameAs), so that ⊤ automatically subsumes any new class introduced within an RDF triple, and equality between terms is always consistent with its semantics.

In both cases we are not able to use these features directly: in the case of top axiomatization, we import axioms as Datalog rules, which are not taken into consideration when RDFox derives new ⊤ subsumptions;1616 in the case of equality axiomatization, the feature cannot be enabled along other features like aggregates and negation-as-failure (with the latter extensively used in our system).

To work around this, we introduce the axiomatization for both predicates explicitly. For every concept name CNC and for every role name RNR in the input ontology, we add the following rules to RDFox:

sw-14-sw233382-g009.jpg

This gives us the correct semantics for owl:Thing.

Similar rules are introduced to axiomatize equality. We make the role reflexive, symmetric and transitive:

sw-14-sw233382-g010.jpg

and introduce substitution rules to complete the axiomatization. For every concept name CNC and for every role name RNR in the input ontology, we add:

sw-14-sw233382-g011.jpg

The notIn and named predicates Our work also includes a few clarifications on theoretical definitions and their implementation. In the canonical model computation [22], the notIn predicate is introduced to simulate the semantics of set membership and in particular the meaning of notIn[a, b] is “a is not in set b”. During the generation of the canonical model program performed by RSAComb, we have complete knowledge of any set that might be used in a notIn atom. For each such set S, and for each element aS, we introduce the fact in[a,S] in the canonical model. We then replace any occurrence of notIn[?X,?Y] in the original program EK with NOT in[?X, ?Y], where NOT is the operator for negation-as-failure in RDFox. This is possible because we know that EK is stratified; moreover the negated predicate introduced in these rules is fully instantiated at program generation and does not appear in the head of any rules, maintaining the stratified structure of the program.

We generate the instances of the predicate NI, representing the set of non-anonymous terms in the materialized canonical model, with the following rule:

sw-14-sw233382-g012.jpg

where rsa:named is a predicate representing the set of constants in the original KB.

A final improvement has been made to the computation of the cycle function used during the generation of the canonical model program performed by RSAComb. The original definition involved a search over all possible triples (A,R,B) where A,BNC and RNR in the original ontology. We realized that traversing the whole space would significantly slow down the computation, and is not necessary; we instead restrict our search over all (A,R,B) triples that appear in a (T5) axiom AR.B in the original normalized ontology.

6.1.3.Filtering program and answer computation

Fig. 7.

RSAComb: answer filtering.

RSAComb: answer filtering.

As depicted in Fig. 7, answer filtration involves the computation of the filtering program from the input query, the filtering of the materialized canonical model and the final process of gathering the answers.

RSAComb performs the translation of the query into a set of logic rules. This step was modified w.r.t. the original definition [22] to be completely ontology independent by moving the generation of rsa:named instances to the canonical model computation step. Furthermore, we redesigned the filtering step to restrict ourselves to use only unary and binary predicates and, as a result, keep the filtering somewhat closer to the realm of description logics (and to the language supported by RDFox). Filtering rules are then greatly simplified by making extensive use of the Skolemization operator provided by RDFox, hence avoiding some expensive joins that would result from a standard reification process.

Example 6.2.

Let q(x)=ψ(x,y) be a CQ with x=x1,,xm, y=y1,,yn. Rule (3c) of the filtering program (see Table 3) computes the transitive closure of the predicate id, keeping track of identity between anonymous terms w.r.t. a specific match for the input query.

(28)id(x,y,u,v),id(x,y,v,w)id(x,y,u,w)
A standard technique to reduce the arity of predicates is reification. Provided we have access to a function KEY to compute a new term that uniquely identifies a tuple of terms, we can reify any n-ary atom into a set of n atoms of arity 2. E.g., an atom P(x,y,z) becomes P1(k,x),P2(k,y),P3(k,z), where k=KEY(x,y,z) and Pn, for 1narity(P), are fresh predicates of arity 2. Rule (3c) can be reified as:
(29)id1(k,x1),,idm+n(k,yn),idm+n+1(k,u),idm+n+2(k,v),id1(j,x1),,idm+n(j,yn),idm+n+1(j,v),idm+n+2(j,w),l:=KEY(x,y,u,w)id1(l,x1),,idm+n(l,yn),idm+n+1(l,v),idm+n+2(l,w)
The problem with this approach is that it increases the number of joins to be performed to match the body of the rule.

Using the SKOLEM functionality in RDFox, we are able to reduce the arity of a predicate P (id in this example) without having to introduce arity(P) fresh predicates (see (30)). The SKOLEM predicate associates a list of terms with a unique blank node; the list of terms and the variable that will be bound to the blank node are passed to the SKOLEM predicate as a single list of arguments. To this end, an atom id(x,y,u,v) in the original rule becomes an atom id(k,j) of arity 2 where SKOLEM(x,y,k) and SKOLEM(x,y,u,v,j) hold, and k and j are bound to two blank nodes uniquely associated with the sequences of terms x,y and x,y,u,v, respectively. Joins over multiple terms (id joining over (x,y) in (28)) can now be rewritten into simpler joins (id joining over a single term k):1717

(30)id(k,j),SKOLEM(x,y,u,v,j),id(k,l),SKOLEM(x,y,v,w,l),SKOLEM(x,y,u,w,t)id(k,t)

The complete rewriting of the filtering program is provided in Table 5. According to the documentation1818 for the SKOLEM operator in RDFox, it can be easily shown that the rewriting is not changing the semantics of the rules, but instead packs and unpacks subsets of variables in order to make rule matching more efficient.

Table 5

Improved rules for filtering step for the RSA combined approach

(1)ψ(x,y),SKOLEM(x,y,s)QM(s)
(2)rsa:named instances computed in the canonical model step.
(3a)QM(s),SKOLEM(x,y,s),notNI(yi),SKOLEM(i,i,k)id(s,k) for each 1i|y|
(3b)id(s,k1),SKOLEM(u,v,k1),SKOLEM(v,u,k2)id(s,k2)
(3c)id(s,k1),SKOLEM(v,u,k1),id(s,k2),SKOLEM(u,w,k2),SKOLEM(v,w,k3)id(s,k3)
(4a)for all R(a,yi), S(b,yj) in q with yi,yjy
Rf(a,yi),Sf(b,yj),SKOLEM(i,j,k),id(s,k),SKOLEM(x,y,s),notabfk(s)
(4b)for all R(a,yi), S(yj,b) in q with yi,yjy
Rf(a,yi),Sb(yj,b),SKOLEM(i,j,k),id(s,k),SKOLEM(x,y,s),notabfk(s)
(4c)for all R(yi,a), S(yj,b) in q with yi,yjy
Rb(yi,a),Sb(yj,b),SKOLEM(i,j,k),id(s,k),SKOLEM(x,y,s),notabfk(s)
for all R(yi,yj), S(ym,yn) in q with yi,yj,ym,yny
(5a)Rf(yi,yj),Sf(ym,yn),SKOLEM(j,n,k1),id(s,k1),SKOLEM(x,y,s),
yiym,notNI(yi),SKOLEM(i,m,k2)id(s,k2)
(5b)Rf(yi,yj),Sb(ym,yn),SKOLEM(j,m,k1),id(s,k1),SKOLEM(x,y,s),
yiyn,notNI(yi),SKOLEM(i,n,k2)id(s,k2)
(5c)Rb(yi,yj),Sb(ym,yn),SKOLEM(i,m,k1),id(s,k1),SKOLEM(x,y,s),
yjyn,notNI(yj),SKOLEM(j,n,k2)id(s,k2)
(6)for each R(yi,yj) in q with yi,yjy and {f,b}
R(yi,yj),id(s,k1),id(s,k1),SKOLEM(i,v,k1),SKOLEM(x,y,s),
id(s,k2),SKOLEM(j,w,k2),SKOLEM(v,u,k3)AQ(s,k3)
for each {f,b}
(7a)AQ(s,k)TQ(s,k)
(7b)AQ(s,k1),SKOLEM(u,v,k1),TQ(s,k2),SKOLEM(v,w,k2),SKOLEM(u,w,k3)TQ(s,k3)
(8a)QM(s),SKOLEM(x,y,s),notnamed(x)sp(s) for each xx
(8b)fk(s)sp(s)
(8c)TQ(s,k),SKOLEM(v,v,k)sp(s) for each {f,b}
(9)QM(s),notsp(s),SKOLEM(x,y,s),SKOLEM(x,k)Ans(k)

The filtering program is, then, loaded into RDFox and the materialization is updated taking into account the newly introduced rules. The triples produced by this materialization update are stored in a separate named graph to keep the product of filtration separate from the canonical model. This is possible because the signature of the atoms in the head of rules introduced by the filtering program is separate from the signature of the canonical model. When processing a new query, the only step we need to take is to drop the named graph associated with the filtration from the previous query, leaving unaltered all other triples. Better yet, here we have the possibility to execute queries in parallel, each one associated with a separate filtering program and hence storing their derivations in different named graphs. The materialization update for each of the queries is isolated and does not interfere with the other processes.

At this point, the task of gathering the answers to the query over the input KB is reduced to querying a materialized named graph for the atoms representing the certain answers.

Example 6.3.

Given a query q(x)=φ(x,y), with x=x1,x2,x3, we can retrieve the answers to q with the following query sw-14-sw233382-g014.jpg

where we first collect all instances ?K of the class rsa:Ans, and then we unpack them at line ?? using the custom RDFox syntax for the SKOLEM operator, to retrieve the actual answers. When answering BCQs, we only need to check for an rsa:Ans witness, i.e., an instance of rsa:Ans in the RDF store: sw-14-sw233382-g015.jpg

6.2.Lower bound approximation to RSA

As described in Section 4, we propose a novel algorithm for approximating an unrestricted input KB to RSA. The procedure is composed of 3 main steps:

  • 1. Approximation to ALCHOIQ via axiom filtering;

  • 2. Approximation to Horn-ALCHOIQ via program shifting;

  • 3. Approximation to RSA by reducing the ontology dependency graph to an oriented forest and ensuring equality safety properties.

The first two steps are entirely carried out by RSAComb in a straightforward way. The knowledge base is first filtered by axiom type and then program shifting (Def. 4.1) is applied to all relevant axioms. The last step is designed to partially offload the task to RDFox; this involves:
  • building and reasoning over a custom dependency graph derived from the materialization of the input data over a Horn-ALCHOIQ KB;

  • reasoning over the knowledge base itself, and in particular performing some RBox reasoning task.

We first translate the axioms in the knowledge base according to Definition 2.3, and import them, along with the data, into RDFox. The imported data and its materialization contain all instances of the atom E, used to build the dependency graph for the input ontology. After retrieving all instances of E, querying the RDFox triple store with the following query

sw-14-sw233382-g016.jpg

RSAComb builds the dependency graph for the input KB. Using Algorithm 1 we detect and break cycles by iteratively removing nodes. The existential axioms corresponding to the nodes returned by the visit are removed from the input ontology.

For the equality safety check we need to reason over the ontology itself and in particular perform some reasoning over its RBox. Regardless of the support offered by the Datalog reasoner for this task, axioms in a knowledge base can be encoded as RDF triples.1919

RDFox supports importing OWL 2 axioms and the conversion into RDF triples is performed automatically. RBox reasoning (Listing 1) is then achieved by importing the following rules into the RDF store.

Listing 1.

Rules for role subsumption reasoning

Rules for role subsumption reasoning

These encode reflexivity and transitivity of sub-role axioms (R2) (lines 79), taking into account inverse (lines 35) and equivalent roles (line 1), as well.

Once both the data and the axioms have been imported and materialized according to their respective rules, the equality safety condition (i) of Definition 2.3 can be formulated as a query (see Listing 2).

For each pair of atoms wt, with w and t distinct, and R(t,uR,BA) (lines 24) in MRSA and each role S s.t. RInv(S) (lines 56), we query for the tuple A,S,B such that A1S.B is part of the input KB (lines 711). For each triple A,S,B returned by the query we can remove the corresponding axiom (T4) from the input ontology.

Listing 2.

Condition 1 of equality safety in RSA definition

Condition 1 of equality safety in RSA definition

Similarly, condition (ii) can be formulated as a query (see Listing 3).

Listing 3.

Condition 2 of equality safety in RSA definition

Condition 2 of equality safety in RSA definition

For each pair of atoms R(a,uR,BA),S(uR,BA,a) in MRSA with aNI (lines 25), we detect roles R,S such that there exists a role T for which RRT (lines 68) and SRInv(T) (lines 910). Note that, when detecting RRT we “isolate” the first step of the subPropertyOf chain (line 6) and query for that couple of roles R,P. In this case the returned couple R,P, identifies an axiom of type (T2) whose removal will break a chain of sub-properties from R to T, making the knowledge base equality safe.

6.3.Upper bound approximation to RSA

The approximation algorithm proposed in Section 5 is implemented in a similar way. Again, the procedure is divided into the following steps:

  • 1. rewriting of ⊥ into a new nullary predicate f with no predefined meaning,

  • 2. rewriting of disjunctive rules to eliminate disjunction, and

  • 3. approximation to RSA+.

As discussed before, the first step is not performed in practice. During the computation of the KB approximation and the upper bound set of answers, we simply ignore the satisfiability of the KB. Note that, even if ⊥ is derived during the process of materialization, RDFox will not derive the entire Herbrand base, to keep the operation as efficient as possible. We can use this to our advantage and still compute a meaningful upper bound approximation.

The rewriting of disjunctive rules is also straightforward, and is performed directly by RSAComb. The choice function is implemented as in PAGOdA, in order to avoid the derivation of ⊥ (see Section 5.2).

Finally, the third step involves the same framework introduced in the previous section for the lower bound computation, and in particular the construction of the dependency graph and role subsumption reasoning are performed in the same way. Both the enforcing of equality safety and the reduction of the dependency graph to a forest involve a rewriting of the KB according to Def. 5.2, and are implemented directly in RDFox.

Finally, RSAComb can be used to run the combined approach algorithm on the resulting RSA+ KB. According to Theorem 5.1 the answers produced by RSAComb are an upper bound to the answers to the query.

7.Related work

Conjunctive query answering over knowledge bases is one of the foundational problems when reasoning over ontologies. This, along with its corresponding decision problem (conjunctive query entailment, CQE) have been analysed both from the theoretical point of view (with extensive research on its computational complexity) and from the practical point of view, leading to a number of algorithms and their implementation in various reasoning systems. For a summary of the complexity results on decidability of CQE for some of the ontology languages mentioned in this work, see Table 6.

Table 6

Decidability of CQE for a selection of ontology languages

Ontology languageCombined complexityData complexity
ELHOrNP–complete [72]P–complete [69,72]
OWL 2 QLNP–complete [10,12]AC0 [10,12]
OWL 2 ELPSpace–complete [48,72]P–complete [48,72]
OWL 2 RLNP–complete [1,29]P–complete [1,29]
RSANP–complete [22]P–complete [22]
Horn-ALCHOIQExpTime–complete [13]P–complete [13]
SHOIQopenopen
SROIQopenopen

We will next provide an overview of the techniques proposed in the literature to perform CQ answering and in particular look at those tools that make use of bounds computation in order to drive the query execution. A closer comparison of ACQuA with these tools is also provided.

7.1.Query answering techniques

Support for CQ answering is offered natively by several existing reasoners. Some of them achieve this by ensuring sound and complete answers for a specific semantics over a certain family of ontology languages, while others limit the language in which the queries can be expressed. We will now give an overview of the several CQ answering techniques present in the literature.

7.2.Reduction to entailment checking

The first technique we are going to discuss is based on the reduction of CQ answering to entailment checking. Tableau–based DL reasoners like Pellet [71], HermiT [24], RacerPro [31] construct a finite structure that represents a model for the input KB and use blocking conditions to ensure the termination of the procedure. These reasoners usually target standard reasoning tasks and only offer limited support for CQ answering. Still, internalisable CQs can be rolled-up and included in the KB, effectively reducing CQ answering to entailment checking of a fresh concept entailed by the rolled-up query.

Pellet [71] provides support for CQ answering under ground semantics and supports CQ answering under certain answer semantics limited to tree-shaped queries (which can be internalized using the rolling-up technique).

HermiT [24] is a fully-fledged reasoner for OWL 2, based on the hypertableau calculus [59]; it does not provide a direct interface to answer CQs but reduction to entailment checking can be manually performed as a preliminary step.

RacerPro is a tableau–based system for the SHIQ DL language; it implements a technique for instance retrieval, called filter and refine [82] and is tailored towards KBs with large ABoxes. The idea behind this technique is to first determine obvious (non-)solutions to a concept description (filter) and subsequently perform an optimized instance check (using ABox locality properties) for the remaining individuals (refine). It supports a superset of CQs under ground semantics.

Another tableau-based reasoner, Konclude [77], has been recently adapted to perform CQ answering over expressive ontologies [75], using an absorption-based technique [74,76]. The assertional part of a KB is divided into small packets used to parallelize the model construction of the tableau algorithm. This parallelizable approach, along with the use of caching to avoid the need of synchronization mechanisms between workers, can be used to derive possible answers to a CQ. Candidate answers are then checked using entailment checking, where bindings for the answer variables are restricted to individuals appearing in the possible answers. According to [75], the approach works best when considering ground queries, while the presence of existential variables can require a substantial amount of additional computation.

Overall, the systems described in this section are not primarily designed for CQ answering under certain answer semantics and instead target other reasoning tasks. The technique of reducing CQ answering to entailment checking is supported for expressive ontology languages but may not scale as well as other approaches. Optimizations have been proposed to further limit performance issues; examples are query execution order, based on the input KB [45] and data summarization [17].

When considering the development of fully-fledged reasoners targetting OWL 2, such as HermiT and Konclude, improvements on these reasoners can translate into improvements for hybrid systems like ACQuA and PAGOdA, which directly use these tools as black boxes.

7.3.Materialisation-based reasoners

Materialization-based reasoners are also widely in use and implement the forward chaining algorithm on top of (some fragment of) Datalog. Materialization-based systems are often built on top of RDF management systems; i.e., data management systems based on the Resource Description Framework representing knowledge as statements in the form of triples.

Triple stores like Jena [52], Sesame [8] and Virtuoso [21] offer query answering capabilities over RDBMS and support the RDFS description language. OWLim [7] provides support for OWL 2 RL ontologies. A materialisation-based reasoner extensively used in this work is RDFox [60], an RDF store supporting arbitrary Datalog rules over unary and binary predicates. The nature of the tool allows for important optimizations, e.g., incremental updates, and parallel materialization, at the expense of a limited expressivity in the supported description logic language [5557,60]. RDFox covers most SPARQL 1.1 over an extension of Datalog. There are several other engines that support CQ answering over (extensions of) Datalog; among them, it is worth mentioning DLV [49], which provides support for CQ answering over an extension of disjunctive Datalog.

Although OWL 2 RL is expressive enough to cover a large portion of practical use cases, it lacks some common patterns like disjunctive knowledge or existentially quantified knowledge, that would potentially render the materialization process either non-deterministic or infinite. Typically, materialisation-based reasoners can still process ontologies outside OWL 2 RL, ignoring axioms that do not fall into the language. Answers to queries are still sound, but might not be complete, effectively providing a lower bound to the set of certain answers. This technique is used in the system PAGOdA [85] to effectively compute a sound lower bound to the set of certain answers to a CQ.

7.4.Ontology-mediated query rewriting

DLs are often used to model the domain of interest as collections of concepts and roles. In this sense, ontologies offer a great tool to build high-level semantics on top of some structured data (e.g., relational database).

OBDA directly applies this principle, creating a layer of abstraction on top of an existing data store; an ontology becomes an entry point for the user to access the underlying data via query answering. Another advantage of this approach is that it can rely on the underling data store (e.g., a RDBMS) to carry out the reasoning tasks. The OBDA framework [83] uses an ontology to rewrite an input query (i.e., expanding it by incorporating parts of the ontology). It then uses a set of mappings2020 to transform the rewritten query into a query over the underlying relational data sources. The process is called perfect reformulation [66] and ensures that the answers to the query over the dataset and the ontology are the same as the answers to the rewritten query over the dataset alone.

It is worth noting that, since the query addresses the data source(s) indirectly, any updates made to the source are immediately reflected into the system. This is in contrast with the materialisation-based approach, where updates in the source require the recomputation (or the update) of the materialized dataset.

The OBDA approach is based on ontologies that fall into the DL-Lite family of DL languages, and hence the OWL 2 QL profile, for which the rewriting of CQs into unions of first-order queries is guaranteed to exist [10]. Perfect reformulation is implemented in QuOnto [2] and further integrated into the MASTRO system [11]. Unfortunately, the query rewriting process can lead to an exponentially larger first-order query [10] and polynomial rewriting is guaranteed only for small fragments of OWL 2, such as OWL 2 QL. For a more in-depth analysis on the performance of the OBDA approach we refer the reader to the Optique project and their work with Equinor [37,44].

The query rewriting technique has been applied to EL and ELH [69], showing that UCQs can be rewritten into a Datalog query. The same result does not hold for EL+ and EL++. REQUIEM [64,65] implements a resolution-based query rewriting technique for ELHIO¬, a DL covering both DL-Lite and EL. The rewriting is based on the resolution calculus to saturate the set of rules in the ontology and subsequently filter out those containing functional terms. However, the introduction of inverse roles leads to a significant jump in complexity: CQE for EL and ELH is NP–complete, whereas it becomes ExpTime–complete for ELHIO¬. Depending on the language of the input ontology the rewriting can be a UCQ or a (linear) Datalog query.

We briefly mention the work done in Clipper [19,20] which implements a query rewriting technique for Horn-SHIQ. The rewriting differs from the ones mentioned above since Clipper modifies the dataset as well. A set of inference rules are used to saturate the input ontology and the data is materialized against the Datalog rules in the saturation. The query is rewritten against the subset of existentially quantified rules in the ontology and evaluated against the augmented dataset. The saturated ontology and the rewriting might be exponential in size w.r.t. the input ontology and query. Query rewriting has also been applied to linear Datalog± ontologies (see [61]).

A different approach involves the manipulation and rewriting of the input query [25,28]. The authors propose a decision procedure for CQE for SHIQ and SHOQ based on the rewriting of the query into a forest shape. By applying the rolling-up technique [36], the problem is reduced to testing the consistency of an extended KB.

7.4.1.Combined approaches

The combined approach is another widely known technique for computing a sound and complete set of answers to a CQ. In this scenario the dataset is first augmented by materializing entailed facts w.r.t. the ontology in order to build a model for the input knowledge base. This process is usually query-independent and performed in polynomial time. Spurious answers are then systematically identified by means of a filtration step or by rewriting the query [47,51] in order to derive the certain answers to the input query. An example is the combined approach for CQ answering over RSA ontologies (see Section 2.3.1), widely exploited in this work. The technique has been applied to several description logics in the EL family, such as the extension of ELH with ⊥ and range axioms [51] and ELHOr [73], as well as in the DL-Lite family, e.g., DL-Litehorn with number restriction [47] and DL-LiteR [50]. More recently the combined approach has been applied to Horn-ALCHOIQ [13], the ontology language underlying RSA.

In general these techniques are designed for a specific ontology language and do not support unrestricted OWL 2 ontology. On the other hand, as shown in this work and in PAGOdA, the combined approach can be easily used as an intermediate step in the computation.

7.4.2.Hybrid approaches

We will now look at tools that combine more than one technique described above to implement CQ answering.

Hydrowl [78] is a reasoner for CQ answering combining an OWL 2 RL reasoner, a query rewriting system and a fully-fledged OWL 2 reasoner. Hydrowl uses a repairing strategy [79] (limited to those ontologies for which a repairing exists) and query rewriting to answer an input query q. First a query base, i.e., a set of atomic queries that can be answered using the OWL 2 RL reasoner, is derived from the query. It is checked whether the query base “covers” the query, and in that case the OWL 2 RL reasoner is used to answer the query; otherwise the tool falls back to the fully-fledged reasoner. Further investigation on the computation of the query base [85] shows that the algorithm is not always able to automatically extract a set of atomic queries, thus compromising the correctness of the approach.

Absorption-based query entailment checking [74] (inspired by the absorption technique introduced by Steigmiller et al. [76]) also falls into the category of hybrid approaches. An input query is rewritten in order to make its entailment more efficiently detected by the model constructed using an extended version of the tableau algorithm. In this sense, the rewritten query is used to identify the individuals that are involved in the entailment of the query and, at the same time, to guide the construction of the model in the tableau algorithm. The technique is sound for CQE for expressive ontology languages, such as SHIQ and SHOQ.

PAGOdA [85] uses a hybrid technique to compute the answers to a query, mixing different approaches. The idea is to compute lower/upper bound approximations to the answers to a query by approximating the input ontology into a less expressive language and possibly provide a fallback (more expensive) algorithm to process the answers in the gap between the bounds. For a more detailed description of the approach see Section 2.2 and [85]. ACQuA builds on top of these techniques.

7.4.3.Ontology approximation

The idea of approximating an expressive language into less expressive (but more tractable) languages has been exploited before. This was first introduced by Selman and Kautz [70] and Del Val [81] in the context of logic theories (both propositional and FOL) and has been applied in the context of ontologies and CQ answering as well. Besides PAGOdA, some of the systems that use ontology approximation to explore and restrict the set of answers to a given CQ are SCREECH [34], TrOWL [80] and SHER [17].

The SCREECH system [34] is able to compute an (unsound or incomplete) approximation of the answers to a query under ground semantics. It achieves that by performing a query dependent (and possibly exponential) rewriting of the input SHIQ ontology to disjunctive Datalog first, and then further to Datalog. Compared to ACQuA, SCREECH can only handle CQ answering under ground semantics over SHIQ ontologies.

TrOWL [80] is a system providing CQ answering capabilities over OWL 2 DL. It uses a semantic approximation [63] technique to transform an OWL 2 DL ontology into OWL 2 QL for CQ answering and a syntactic approximation [67] from OWL 2 DL to OWL 2 EL for TBox reasoning. While being sound and complete for CQ answering, approximations steps in TrOWL are ontology and query dependent, making in harder to reuse partial results in the computation. Moreover, the semantic approximation requires the use of a fully-fledged reasoner to compute a KB approximation whose axioms are valid w.r.t. the input ontology.

The SHER [17] system is a tableau-based reasoner for SHIN which provides instance retrieval capabilities. The system uses a summarization technique to compute an upper bound to the answers to an instance query. Spurious answers are then filtered out by a following relaxation step [16,17]. Again, this system is sound and complete for instance CQ answering for ontologies within the SHIN DL language.

In addition, a way to approximate an OWL 2 ontology into an OWL 2 QL ontology maintaining completeness for instance queries is proposed as part of the filter and refine technique presented by Wandelt et al. [82]. The idea is to transform every axiom CD in an OWL 2 ontology into a stronger OWL 2 QL axiom CD such that C subsumes C and D subsumes D. The technique is, however, non-deterministic in nature and the approximation can sometimes lead to an unsatisfiable ontology.

Under the umbrella of approximate reasoning for CQ answering, the query extension technique [26,27] is of particular relevance. This algorithm aims at improving the bounds of the answers by extending the query with additional atoms obtained analysing the input ontology. The resulting query can then be used to restrict the bounds of subqueries of the initial query.

Finally, different notions of approximation for ontology-mediated queries over a selection of expressive languages like ALC and ALCI have been explored [32]. The authors aim at designing polynomial time approximations towards tractable languages like ELI or some restricted classes of TGDs “from below and from above” (lower and upper bounds) with respect to CQs (and other query formalisms as well).

8.Evaluation

We provide here an extensive evaluation over a range of benchmark ontologies. We start by looking at some performance results for RSAComb [41], our implementation of the combined approach for RSA, followed by a comparison of our system ACQuA [42] with PAGOdA.2121 Section 8.1 provides an in-depth description of the benchmarks used for the evaluation. Ontologies, data, queries, and scripts used to run tests and generate the graphs shown in this section are freely available online [43].

All experiments were performed on an Intel(R) Xeon(R) CPU E5-2640 v3 (2.60GHz) with 16 real cores, extended via hyper-threading to 32 virtual cores, 512 GB of RAM and running Fedora 33, kernel version 5.10.8-200.fc33.x86_64. We were able to make use of the multicore CPU and distribute the computation across cores, especially for intensive tasks offloaded to RDFox.

8.1.Benchmarks and tools

We use two different sets of benchmark ontologies:

  • the PAGOdA batch mimics the evaluation process originally performed for PAGOdA [85];

  • the Oxford Ontology Repository (OOR) batch is a subset of the Oxford Ontology Repository,2222 and it’s used to provide a broader evaluation on a wide range of ontology benchmarks.

The PAGOdA batch consists of a selection of ontologies and benchmark data that comes with the PAGOdA distribution.2323 These resources include ontology, data, and queries for:

  • LUBM and UOBM, standard benchmarks with a data generator (depending on a numerical parameter) and sample queries. When referring to a dataset generated for a particular parameter we will use LUBM(n) and UOBM(n) for some number n. PAGOdA provides an additional set of queries more challenging for the tool.

  • Reactome and Uniprot, realistic ontologies for which both data and relevant queries are provided. To test scalability, the datasets of these ontologies have been sampled in subsets of increasing size.

A summary of the statistics regarding each of these ontologies can be found in Table 7 where n is the parameter passed to the data generator for LUBM and UOBM.

Table 7

Benchmarks statistics, with LUBM/UOBM data generators depending on a parameter n

# Axioms# Facts# Queries
LUBM(n)93n×10535
UOBM(n)1862.6n×10520
Reactome5591.2×107130
Uniprot4421.2×108240

This batch aims at testing the system with a set of well–established benchmark queries. These queries have been defined to stress a system on a broad range of aspects of an answering routine, and vary a lot in terms of complexity and number of answers.

For the OOR batch we selected 126 ontology from the repository with non-empty ABoxes. A summary of the statistics of the ontologies in the repository can be found online.2424

Since the Oxford Ontology Repository does not provide any test queries, we generated, for each ontology, a set of sample queries by extracting atomic concept, atomic role and existential patterns from the structure of the ontology. In order to generate a suitable number of queries we used the following step:

  • 1. Import the ontology into RDFox as RDF triples.

  • 2. Query for a specific pattern in the ontology, e.g., sw-14-sw233382-g020.jpg to retrieve all existential axioms in the ontology.

  • 3. Convert those patterns into queries.

Using this method, we extracted 14135 concept atomic queries, 4434 role atomic queries and 3893 existential queries for a total of 22462 queries over 126 ontologies. Apart from the basic atomic patterns, we included existential queries of the form

sw-14-sw233382-g021.jpg

By doing so, we aimed for an empirical confirmation of our ability to produce the correct set of answers under certain answer semantics, for a set of queries that potentially provide different results when considering them under ground semantics.

While the generated queries have a limited number of atoms, the primary aim of this batch is to stress the system with a high number of queries per ontology; this allows us to draw conclusions on the behaviour of the system on a wider variety of test cases.

The collection of SPARQL queries and the scripts to generate them are part of our benchmark distribution [43].

8.2.PAGOdA batch

We now present the test result obtained using the first set of benchmarks. We first tested RSAComb as a standalone system, in order to evaluate its performance and scalability. Later we compare the performance of ACQuA against the original PAGOdA. This is particular usefully since we were able to draw a very close comparison between the two tools and improve upon the observations provided by PAGOdA [85]. This also helped us identify how and when ACQuA’s algorithm outperformed PAGOdA, solving some performance issues with the latter tool.

8.2.1.RSAComb

As part of this work, we introduced RSAComb, an improved implementation of the combined approach algorithm for RSA, released as free and open source software [39]. Given that the original reference implementation [22] was not available when we started this work, and some details about the testing process are not provided, we will not try to draw a comparison between the results provided here and those provided in the original paper.

Our implementation is written in Scala and uses RDFox2525 as the underlying Datalog reasoner. At the time of writing, development and testing have been carried out using Scala v2.13.5 and RDFox v5.2. We can easily interface Scala with Java libraries and in particular the OWLAPI [35] for easy ontology manipulation. Thanks to the Java wrapper API provided with RDFox we were able to take advantage of a tight integration with the tool and simplify the following integration into ACQuA.

In the following we provide tests result of our system against LUBM [30] and Reactome2626 using the set of queries originally used in the testing of the combined approach for RSA [22]. All results provided below are averages of at least 3 measurements.

Fig. 8.

Scalability of approximation to RSA and canonical model computation in RSAComb.

Scalability of approximation to RSA and canonical model computation in RSAComb.

In Fig. 8 we show the scalability of our algorithm for the lower bound approximation to RSA and the computation of the canonical model for the approximated ontology. The two steps are query independent and the trend appears to be linear w.r.t. the dataset size, both in LUBM and Reactome; this can be explained by observing that the approximation algorithm involves the materialization of the input dataset against a modified version of the ontology, hence depending on the size of the whole KB;

Fig. 9.

RSAComb answer filtering in Reactome.

RSAComb answer filtering in Reactome.

The filtering process is instead less dependent on the size of the data and more dependent on its composition and distribution. As such, a bigger dataset does not necessarily correspond to a greater amount of filtering, as shown in Fig. 9, where we reported the execution time for query 1 and 2 in Reactome. This figure also shows how the filtering depends on the data distribution; both queries take longer on a 50% sample of the data than on other datasets (even larger ones) due to its specific content. In general, we noticed that the time spent by the system on the filtering step is considerably lower than the time spent on the canonical model computation (as described below, and shown in Fig. 11).

Fig. 10.

Answer filtering in Query 2 in LUBM.

Answer filtering in Query 2 in LUBM.

This unpredictability of the filtering step can “backfire” when a huge amount of filtering is involved. In Fig. 10 we show the filtering time for query 2 in LUBM along with the amount of unfiltered answers that the filtering program needs to process. It is worth noting that less than 1‰ of the unfiltered answers are found to be part of the certain answers. Figure 10 confirms the previous claims that the filtering step grows proportionally to the amount of filtering that is needed for a particular query. Finally, this figure shows how our system is able to handle a gigantic filtering step, processing hundreds of millions of facts in a reasonable amount of time.

Fig. 11.

Percent time distribution of canonical model computation (at the bottom, in blue) and answer filtering (at the top, in yellow) in Reactome.

Percent time distribution of canonical model computation (at the bottom, in blue) and answer filtering (at the top, in yellow) in Reactome.

Finally, Fig. 11 shows how execution time is distributed among the two main tasks of the combined approach. Filtering takes consistently less that 20% of the total execution time, when considering bigger datasets. As mentioned before, we can limit the impact of the canonical model computation by computing it “offline” whenever we find ourselves in a scenario in which we need to perform query answering over a fixed ontology.

8.2.2.ACQuA

We will now provide test results for the PAGOdA batch against ACQuA; this will allow us to draw a direct comparison between the tool and PAGOdA [85], for which the same benchmarks were used during the evaluation phase. During our tests we were able to reproduce the results provided in the original paper except for UOBM, for which PAGOdA does not terminate with a timeout of 10h and outputs no relevant information.

We chose this as a first set of benchmarks because we were able to use the extensive analysis on PAGOdA’s performance to guide our research and easily detect those cases that our system could improve. PAGOdA initially divided its test results into three groups:

  • (G1) queries for which the bounds match;

  • (G2) queries with a non-empty gap, but for which summarization is able to filter out all remaining spurious answers;

  • (G3) queries where HermiT is called on at least one of the test datasets.

When considering RSAComb and PAGOdA, the building blocks of ACQuA, separately, efficiency in the two tools mainly depends on the input ontology and the type of query answered, with PAGOdA showing worse performance when heavily relying on HermiT. On the other hand, when combining the tools into ACQuA, RSAComb is able to further limit the occurrence of these cases, providing better performance overall.

In general ACQuA is able to match PAGOdA’s results in all queries in the (G1-2) groups. This should not come as a surprise, since the already excellent results from PAGOdA weren’t leaving much room for improvement and were showing that more complex CQ answering techniques were not needed for these families of queries. In particular, for query in the G1 group, ACQuA does not perform any additional step other than PAGOdA’s computation of lower and upper bounds (avoiding the use of HermiT altogether).

For this reason we will be focusing on those queries falling in the (G3) group, for which PAGOdA’s performance does not scale well.

Fig. 12.

Scalability of query processing times for LUBM, UOBM and Reactome in ACQuA vs PAGOdA.

Scalability of query processing times for LUBM, UOBM and Reactome in ACQuA vs PAGOdA.

According to [85, Section 10.3.2], and partially confirmed by our tests, PAGOdA falls back to HermiT in the following queries to compute the correct set of answers: queries 32 and 34 in LUBM, query 18 in UOBM (for some data sizes) and query 65 in Reactome. Figure 12 sums up the results for our tests. Pre-processing times for the ontology are not taken into account here since the process is common to both tools and, in general, can be computed offline.

LUBM For both queries we were able to compute matching bounds, skipping the call(s) to HermiT altogether. This resulted in a significant improvement on the query processing time (Fig. 12a). It is interesting to notice that in this case the nature of the data and the queries seem to lead to a linear growth with respect to the size of the data.

Additionally, while testing the tools, we noticed that PAGOdA was having some difficulties returning a sound set of certain answers to queries involving existential knowledge, potentially falling back to ground semantics instead.

Example 8.1.

LUBM TBox contains the following axiom describing the fact that each research assistant works for at least one research group

(31)ResearchAssistantworksFor.ResearchGroup
The following query sw-14-sw233382-g027.jpg

should return all 39 instances of ResearchAssistant contained in LUBM(1), but PAGOdA returns 0 answers (which is only correct under ground semantics).

UOBM We could not perform a direct comparison with UOBM since we were unable to reproduce PAGOdA’s results [85], with the tool timing out on a 10h run with no relevant output, even for the smaller datasets. Regardless, we were able to observe a recognizable pattern in the results for query 18. In Fig. 12b, we report our results against an estimate of PAGOdA’s performance; the estimation was carried out by looking at the graphs of the original paper and considering the closest values that the resolution of images allowed us to read. Even in this case we were able to avoid the use of HermiT, consequently improving the query processing time overall.

Reactome We were able to answer query 65 with matching bounds, avoiding again the use of HermiT. This resulted in an improvement of almost 600 seconds for the full Reactome dataset.

Furthermore, we found that the answers returned by PAGOdA for some of the queries in LUBM were only correct if considering CQ answering under ground semantics. Examples of these are query 15–16 from the PAGOdA benchmarks, for which PAGOdA was able to return only an incomplete set of answers.2727 In these cases ACQuA was able to fix the issue and compute the sound and complete set of answers under certain answer semantics, by avoiding the use of PAGOdA overall.

8.3.OOR batch

For the second batch of benchmarks executed on the Oxford Ontology Repository, we were able to identify a set of queries for which PAGOdA requires the use of HermiT for the full computation of the query answers.

Table 8

PAGOdA and ACQuA statistics on OOR batch (over 126 ontologies and 22462 queries)

Ontologies processedQueries executedNon-empty queries
PAGOdA103182351455
ACQuA126224622256

As shown in Table 8, PAGOdA was able to process 103 out of 126 ontologies considered, executing around 81% of the generated queries; ACQuA, on the other hand, was able to process the entire set of ontologies, answering the full suite of generated queries. Around 10% of the queries have a non-empty answer set, and while ACQuA was able to answer all of them, PAGOdA can reliably answer only 65%.

We identified a set of 18 queries (role atomic queries) over DOLCE [23] for which PAGOdA required the use of HermiT. In these cases, only the lower bound computed by PAGOdA is exact, while ACQuA was able to compute a matching upper bound. This was detected in two different fragments of DOLCE from the repository, corresponding to ontology 14 and 24. Ontology 24 corresponds to the full DOLCE ontology, while ontology 14 is a fragment of 24 partially restricting the ABox. Both ontologies are classified as SHOIN(D).

In Fig. 13 we provide quantitative and performance results for the queries over ontology 24, where we denote the lower bound, upper bound and query processing time for the corresponding tools with L, U and T respectively. We omit ontology 14 since the results are similar to the ones reported for ontology 24.

In the first 16 queries we obtained comparable results (see Fig. 13). This is understandable since DOLCE is a relatively small ontology (with a very small ABox) and this ended up hiding the performance differences that would potentially appear with larger datasets. Moreover, it should be noted that PAGOdA is able to deal with the larger upper bound by performing a limited amount of calls to HermiT (up to 8). The number of calls increases to 19 in the last two queries; these are also the cases in which we can observe a greater gain in performance using ACQuA.

Fig. 13.

Execution time (T) on DOLCE queries in PAGOdA (red) vs ACQuA (orange), alongside quantitative results for the lower (L) and upper bounds (U) computed by the two tools.

Execution time (T) on DOLCE queries in PAGOdA (red) vs ACQuA (orange), alongside quantitative results for the lower (L) and upper bounds (U) computed by the two tools.

Finally, we found a set of 23 queries across multiple ontologies for which PAGOdA returned an unsound set of answers. This is the whole set of queries that PAGOdA was unable to process. This behaviour was not observed in ACQuA, which was able to compute the correct answers to the queries, without calling PAGOdA.

For the rest of the tested queries PAGOdA and ACQuA had comparable performance and were able to compute matching bounds.

To conclude this section, we provide a list of performance results and improvements highlighted by our evaluation:

  • RSAComb shows linear scalability for preprocessing and canonical model computation steps. Moreover, the filtering time is lower on average than the canonical model computation;

  • RSAComb handles huge amounts of data in a reasonable amount of time;

  • the additional logic introduced by ACQuA is able to outperform PAGOdA in a variety of test cases, improving both the lower and upper bounds;

  • ACQuA is able to fix some performance issues present in PAGOdA, by computing matching and hence further limiting the use of HermiT.

9.Discussion and conclusions

In this work, we presented a new hybrid query answering architecture that combines black-box services to provide a CQ answering system for OWL. Our system builds upon scalable CQ answering services for tractable restrictions of OWL, combining them with a CQ answering routine for a more expressive language. The technique is based on the computation of lower and upper bounds to the answers to a CQ and their progressive refinement to compute the full set of certain answers. We proposed two novel algorithms to compute lower and upper bounds to the answers to a query via approximation to RSA and RSA+, respectively, along with reference implementations, used in two new systems:

  • RSAComb, an efficient implementation of the combined approach for RSA [22], introducing a new design and the use of RDFox as the underlying Datalog reasoner. The system improves upon the theoretical contribution of the original work by introducing several heuristics, in order to render the algorithm more efficient in practice. The system accepts any OWL 2 KB and includes a customizable approximation step to languages compatible with the RSA combined approach. To this end, we included a reference implementation of the novel approximation algorithms for the computation of answer bounds mentioned above.

  • ACQuA, a reference implementation of the hybrid architecture combining RSAComb, PAGOdA [85], and HermiT [24] to provide a CQ answering service for OWL. By building ACQuA, we showed how the novel idea of chaining multiple services to refine answer approximations is feasible in practice. Ideally, the services it is built upon can be substituted or paired with more capable ones to improve the system performance–wise.

We provided an extensive evaluation of the systems, first testing scalability and performance of RSAComb as a standalone system and then, comparing ACQuA against PAGOdA. We showed how the additional computational cost introduced by reasoning over a more expressive language like RSA can still provide a significant improvement compared to relying on a fully-fledged reasoner. Additionally, we showed how ACQuA can reliably match PAGOdA’s performance, and further limit performance issues originally present in PAGOdA, especially when the tool has to extensively rely on HermiT.

We intend to further extend this work in a few different directions. The RSAComb-based algorithms for the computation of answer bounds depend on a cycle-detection procedure over a KB dependency graph. We think that altering the traversal of the graph and adopting (query dependent) heuristics in the cycle-detection algorithm could improve the quality of the computed bounds.

Moreover, ACQuA mostly focuses on ontology manipulation for computing bounds and further processing gap answers. While query independent processes can be cached or computed offline, a different, complementary, approach would be to study the problem of computing answer bounds from a query perspective. An example of such a technique for computing bounds to answers to SPARQL queries has been presented by Glimm et al. [27].

To conclude, this work led us to believe that relying on hybrid frameworks and leveraging existing systems for CQ answering is a winning strategy that can render the problem more viable in practice. Thanks to its modularity, this approach can benefit from the broader research in the area of knowledge representation, description logics, and CQ answering.

Notes

1 By “black-box” we mean that the details of the reasoning process are not relevant, and indeed any reasoner providing comparable reasoning services could be used. However, the semantics are completely transparent, and the results could be interpreted/explained using a wide variety of techniques from the extensive literature on this topic (e.g., [3]). Note that this differs from the definition of the term that is often used, e.g., in the context of Machine Learning (ML), where the semantics of the system is opaque to the user.

4 In order to achieve decidability of reasoning, SROIQ ontologies must satisfy certain additional requirements. These, however, do not affect the technical results reported in this paper.

5 Property chain and transitivity axioms are not taken into consideration here to keep definitions compatible with [22].

6 In the following we consider the input KB to be consistent and normalized. This is ensured by PAGOdA’s preprocessing step.

7 OWL 2 RL does not allow axioms (T5) and EL (which contains ELHOr) does not allow axioms (T4) or inverse roles (R1).

9 An oriented forest is a directed acyclic graph whose underlying undirected graph is a forest.

10 It is worth noting that, to the best of our knowledge, at the moment a similar bound for RSA+ is not know.

11 In this case, K falls in one of the profiles for which the lower bound computation in PAGOdA is sound and complete for CQ answering.

12 In this case, RSAComb provides a sound and complete algorithm for CQ answering over K.

13 While axioms of type (T4) do not use disjunction explicitly, their translation into definite rules involve disjunction in the head of the rule.

14 This is equivalent to rewriting the axiom as AR.{c},{c}B.

16 RDFox accepts both OWL 2 axioms encoded as RDF triples and Datalog rules; these are very different entities in the system and the semantics of special concepts/roles (like ⊤ and ≈) is applied to the former.

17 Rule 30 showcases how the SKOLEM predicate can be used in both directions: given a sequence of terms, we can pack them into a single fresh term; given a previously Skolemized term, we can unpack it to retrieve the corresponding sequence of terms.

20 Often expressed in the W3C standard R2RML language [15].

27 This is most likely due to a bug in the PAGOdA codebase.

28 These proofs are the result of several discussions with the authors of [22].

29 Here we are using an alternative, but equivalent, definition of certain answer. Given a query q(x) and a KB K, acert(q,K) iff K,Iq(a) for every model I of K.

Appendices

Appendix

AppendixProofs

This chapter provides proofs for lemmas and theorems used in Sections 45.2828

In the following we will consider either an RSA or an RSA+ KB K=O,A (and explicitly state when some result holds only for one of the two languages) and a CQ q(x)=y.ψ(x,y). For PK,q, EK and π(K),, we will refer to their LHMs as M, Mc (canonical) and Mu (universal), respectively. Note that, by definition of PK,q, it is the case that McM.

We start with the notations concerning terms and atoms. For terms s and t, we write st (s<t) iff s is (strictly) contained in t. The root of a term t is its non-functional part, i.e., root(f1(f2((fn(a)))))=a. We say that a term t has type (A,R,B) if t is either of the form vR,BA,i or of the form fR,BA(·).

The derivation level of a ground atom a=P(t)M[Π] with Π a stratified program, is denoted by level(a,M[Π]) and is a pair of natural numbers (k,l) where k denotes the stratum of P and l is the smallest number such that aTΠkl(U), where U=, if k=1, and U=TΠk1ω(Ui), otherwise. The derivation level of a ground term tterms(M[Π]), where Π is a stratified program, is denoted as level(t,M[Π]) and is a pair of natural numbers (k,l), such that t occurs in an atom aM[Π] s.t. level(a,M[Π])=(k,l) but t does not occur in any atom aM[Π] such that level(a,M[Π])=(k,l) and k<k, or k=k and l<l. When a program Π has only one stratum k, the stratum is dropped from the derivation level of the corresponding atom/term.

Theorem 4.1.

Let K=O,A be the ALCHOIQ restriction of the KB K=O,A. Moreover, let K=shift(O),A. Then cert(q,K)cert(q,K).

Proof.

Let M=M[π(K),]. We recall that, given a predicate P in the signature of K, we denote with P a fresh predicate, introduced by shift(·), intuitively representing the complement of P. In order to prove the theorem, we introduce the following claims:

  • (i) if M, then, K is inconsistent;

  • (ii) if P(c)M, then, KP(c), for any P introduced by shift and K consistent;

  • (iii) if P(c)M, then, KP(c), for some P in the signature of K and K consistent;

We can prove these claims by induction on the derivation level of atoms in M.
  • (i) If A then, K and hence K is inconsistent. Otherwise, there must be some rule r of the form i=1nAi(x) in K such that Ai(c)M, for some constant c and 1in. If rK, then, by definition of shift, rK. Moreover, by IH, we have KAi(c) for 1in, and hence K (i.e., K is inconsistent).

  • (ii) Let P(c)M, with P predicate introduced by shift for some predicate P in the signature of K. Since P(c)A, there must be a rule

    (32)ri=1nAi(x)i=1mBi(x)P(x)
    with Ai(c)M for 1in and Bi(c)M for 1im. By IH, KAi(c) for 1in and KBi(c) for 1im. Moreover, if rK then, there is a rule rK s.t. either

    • (a) r is of the form i=1nAi(x)P(x)i=1mBi(x). Since K is consistent, KP(c).

    • (b) m=0 and r is of the form i=1nAi(x)P(x). Assume KP(c); then, K is inconsistent – contradiction, and hence KP(c)

  • (iii) Let P(c)M, for some P in the signature of K. If P(c)A, then KP(c). Otherwise,

    • (a) there must be some rule ri=1nAi(x)i=1mBi(x)P(x) in K, Ai(c)M for 1in, Bi(c)M for 1im. By IH, KAi(c) for 1in and KBi(c) for 1im. Moreover, if rK, there must be a rule rK of the form

      (33)i=1nAi(x)i=1mBi(x)P(x)
      Since K is consistent, KP(c).

    • (b) For all other possible rules r that can derive P(c), it is the case that rK implies rK and, by IH, we have that KP(c).

If q=, then the theorem follows from claim (i). Otherwise, let q(x)=yφ(x,y) and let σ be a certain answer to q w.r.t. K. Then, by definition, there exists σ such that, for every αφ(x,y)σσ, αM and, by claim (iii), Kα. Finally, we have that Kφ(x,y)σσ, and hence Kyφ(x,y)σ, which, by definition of conjunctive query answer, implies σcert(q,K).  □

We next relate terms in Mc and Mu to terms in MRSA.

Lemma A.1.

Let ηc:terms(Mc)terms(MRSA) be the following function

(34)ηc(t)=tif tNIuR,BAif t is of type (A,R,B)
Then, for every t1,t2terms(Mc) it holds that
  • A(t1)Mc implies A(ηc(t1))MRSA;

  • R(t1,t2)Mc implies R(ηc(t1),ηc(t2))MRSA;

  • t1t2Mc implies ηc(t1)ηc(t2)MRSA.

Proof.

Trivial, by definition of MRSA and induction on the derivation level of atoms in Mc.  □

Lemma A.2.

Let ηu:terms(Mu)terms(MRSA) be the following function

(35)ηu(t)=tif tNIuR,BAif t is of type (A,R,B)
Then, for every t1,t2terms(Mu) it holds that
  • A(t1)Mu implies A(ηu(t1))MRSA

  • R(t1,t2)Mu implies R(ηu(t1),ηu(t2))MRSA

  • t1t2Mu implies ηu(t1)ηu(t2)MRSA

Proof.

Trivial, by definition of MRSA and induction on the derivation level of atoms in Mu.  □

Lemma A.3.

Let t1,t2terms(Mc). Then, βt1t2Mc implies at least one of the following holds:

  • 1. t1aMc, for some aNI,

  • 2. t1 is of the form f(u) and t2 is of the form g(v) with uvMc.

  • 3. t1 is of the form vR,BA,i and t1 and t2 are identical (i.e., the same term),

Proof.

We prove the lemma, together with the following additional claims, by induction on the derivation level of atoms in Mc:

  • (i) Let R(t1,t2)Mc with t2 of some type τ, RRS for some S occurring in an axiom (T4). Moreover, let t3terms(Mc) s.t. t2t3Mc with ηc(t2)ηc(t3). Then, t2 is of the form f(u) with ut1Mc.

  • (ii) Let R(t1,t2)Mc with t1 of some type τ, RRInv(S) for some S occurring in an axiom (T4) Moreover, let t3terms(Mc) s.t. t1t3Mc with ηc(t1)ηc(t3). Then, t1 is of the form f(u) with ut2Mc.

In the following, let R(t1,t2) be an atom in Mc.

  • (i) Moreover, let t2 be of some type τ, RRS for some S occurring in an axiom (T4) and t2t3Mc with t3terms(Mc) and ηc(t2)ηc(t3).

    Then, there must be at least one rule in EK of the form:

    • (a) C(x)R(x,fR,DC(x))D(fR,DC(x)) with C(t1)Mc and t2=fR,DC(t1).t1t1Mc and hence the claim holds.

    • (b) C(x)R(x,vR,DC)D(vR,DC) with C(t1)Mc. This is in contradiction with the fact that R is unsafe, i.e., R occurs in a (T5) axiom and RRS with S occurring in a (T4) axiom.

    • (c) T(x,y)R(x,y) with T(t1,t2)Mc and level(T(t1,t2),Mc)<level(R(t1,t2),Mc). Since TRRS, with S occurring in an axiom (T4), by IH, the claim holds for T(t1,t2). Then it trivially holds for R(t1,t2) as well.

    • (d) Inv(R)(y,x)R(x,y) with Inv(R)(t2,t1)Mc and level(Inv(R)(t2,t1),Mc)<level(R(t1,t2),Mc). It can be easily shown that Inv(R) fulfils all hypothesis of claim (ii), and, by IH, it follows that t2 is of the form f(u) with ut1Mc.

    • (e) R(x,y)yzR(x,z) and t term s.t. R(t1,t),tt2Mc. By IH, t is of the form f(u) with ut1Mc. Moreover, since t is of the form f(u), by the main claim of Lemma A.3, t2 must be of the form g(v) with uvMc. Then, the claim holds, since t2 is of the form g(v) with vt1, for transitivity of ≈.

    • (f) R(x,y)xzR(z,y) and t term s.t. R(t,t2),tt1Mc. Similar to case (i)e, using claim (ii).

  • (ii) Similarly, let t1 be of some type τ, RRInv(S) for some S occurring in an axiom (T4) and t1t3Mc with t3terms(Mc) and ηc(t1)ηc(t3).

    Then, there must be at least one rule in EK of the form:

    • (a) C(x)R(x,fR,DC(x))D(fR,DC(x)) with C(t1)Mc and t2=fR,DC(t1). Then, from Lemma A.1, it follows that R(ηc(t1),uD,RC)MRSA. But then, K is not equality-safe, since:

      • t1t3Mc with ηc(t1)ηc(t3). Then by definition of ηc in Lemma A.1, t1,t2 must be distinct.

      • R(ηc(t1),uD,RC)MRSA.

      • S s.t. RRInv(S) and S occurs in an axiom (T4).

      This contradicts our hypothesis that K is RSA.

    • (b) C(x)R(x,vR,DC)D(vR,DC) – in contradiction with the fact that R is unsafe.

    • (c) T(x,y)R(x,y) such that T(t1,t2)Mc, similar to case (i)c.

    • (d) Inv(R)(y,x)R(x,y) such that Inv(R)(t2,t1)Mc, similar to case (i)d and using claim (i).

    • (e) R(x,y)yzR(x,z) and a term t3 such that R(t1,t3),t3t2Mc, similar to case (i)e.

    • (f) R(x,y)xzR(z,y) and a term t3 such that R(t3,t2),t3t1Mc, similar to case (i)f.

Now, let βt1t2Mc; then, there must be some rule in EK of the form:

  • (a) (x)xx such that t1=t2=x. We can distinguish three different cases:

    • x=a, for some aNI. Then, t1a and condition 1 is satisfied.

    • x is of the form fR,BA(u) for some type (A,R,B). Then, t1=t2=fR,BA(u) with uu because of the semantics of ≈; condition 2 is satisfied.

    • x is of the form vR,BA,i for some type (A,R,B) and i{0,1,2}. Then, t1=t2=vR,BA,i and condition 3 is satisfied.

  • (b) A(x)xa, with A(t1)Mc Then, t2=a and t1aMc; condition 1 is fulfilled.

  • (c) A(x)S(x,y)B(y)S(x,z)B(z)yz and t3 term, s.t. A(t3),S(t3,t2),B(t2),S(t3,t1),B(t1)Mc. We distinguish between the following cases:

    • ηc(t1)=ηc(t2). If t1=t2 the claims of the lemma trivially hold. If t1t2, then t1 and t2 must have the same type (C,R,D). Then t1=fR,BA(u) and t2=fR,BA(v) for some type (A,R,B) and with uv. It can be shown that atoms S(t3,fR,BA(u)),S(t3,fR,BA(v)) cannot be introduced in an RSA ontology.

    • ηc(t1)ηc(t2). If either t1=a or t2=b, with a,bN1, then, condition 1 trivially holds for β. Otherwise, from claim (i) it follows that:

      • t1=f(u), with ut3Mc.

      • t2=g(v), with vt3Mc.

      Then, for transitivity of ≈, uvMc and condition 2 holds for β.

  • (d) xyyx with t2t1Mc. By IH, the lemma holds for t2t1 and, since all conditions are symmetric, it holds for β as well.

  • (e) xyyzxz and t3 term s.t. t1t3,t3t2Mc. By IH, the lemma holds for t1t3 and t3t2:

    • If condition 1 holds for t1t3, s.t. t1a for some aNI, then it holds for t3t2 (since t3t1a) and for β.

    • If condition 2 holds for t1t3, then t1 is of the form f(u) and t3 is of the form g(v), with uvMc. Since t3 is of the form g(v), condition 2 must hold for t3t2 as well, and hence t2 is of the form h(w), with vwMc. Then, for transitivity of ≈, uw and condition 2 holds for β.

    • If condition 3 holds for t1t3, then t1 and t3 are identical and of the form vR,BA,i for some type (A,R,B) and i{0,1,2}. Since t3 is of the form vR,BA,i, condition 3 must hold for t3t2 as well, and hence t2=vR,BA,i. Then, t1=t2=vR,BA,i and condition 3 holds for β.

 □

Lemma A.4.

Let t1,t2terms(Mu). Then t1t2Mu implies that either:

  • 1. t1aMu, for some aNI

  • 2. t1 is of the form f(u) and t2 is of the form g(v) with f,g function symbols in Mu and uvMu

Proof.

Similar to the proof for Lemma A.3, by induction on the derivation level of atoms in Mu.  □

Definition A.1.

Let Ψ be a conjunction of atoms of the form

(36)i=1mAi(ti)j=1nRj(u1j,u2j)
An adornment for Ψ is a vector a such that |a|=n and aj{f,b,} for every 1jn (where ␣ denotes the empty adorning, i.e., R is syntactically equivalent to R). We denote with Ψa the adorned formula:
(37)i=1mAi(ti)j=1nRjaj(u1j,u2j)
where Rjaj is a syntactic renaming of Rj for every 1jn.

Definition A.2.

Let Ψa be the adorned formula of the form

(38)i=1mAi(ti)j=1nRjaj(u1j,u2j)
Then, the normal form of Ψa, denoted with Ψna, is the formula
(39)i=1mAi(ti)j=1nLj
where
(40)Lj=R(u1j,u2j)if aj=Rf(u1j,u2j)if aj=fInv(R)f(u2j,u1j)if aj=b

Definition A.3.

Let q(x)=yψ(x,y) be a CQ, λ:terms(q)terms(M) be a homomorphism and a be an adornment for q=ψ(x,y). Then (λ,a) is said to be an adorned match for q over M iff the following conditions both holds:

  • 1. M(ψ(λ(x),λ(y)))a;

  • 2. R(t1,t2)(ψ(λ(x),λ(y)))a, we have Rf(t1,t2)M and Rb(t1,t2)M.

Definition A.4.

Let (λ,a) be an adorned match for q(x) over M. We say that (λ,a) is non-anonymous if named(λ(x))M for all xx.

Definition A.5.

Let (λ,a) be an adorned match for q(x) over M. We say that (λ,a) is fork-free iff for every two atoms of the form Rf(u,yi),Sf(v,yj)(ψ(x,y))na, such that yi,yjy and id(λ(x),λ(y),i,j)M, it is the case that λ(u)λ(v).

Definition A.6.

Let (λ,a) be an adorned match for q(x) over M. We say that (λ,a) is acyclic iff there is no sequence of atoms

(41)Ro1f(yl1,yl2),Ro2f(yl3,yl4),,Ropf(yl2p1,yl2p)(ψ(x,y))na
such that
  • id(λ(x),λ(y),l2i,l2i+1)M for every 1ip where l2p+1=l1;

  • NI(λ(ylj))M for every 1j2p.

Lemma A.5.

For a given substitution λ:xterms(M), it is the case that MAns(λ(x)) iff there exists an adorned match (λ,a) for q over M which is non-anonymous, fork-free and acyclic, where λ is a homomorphism that extends λ to terms(q).

Proof.

Trivial, from the definitions of π(K),, M (and in particular the filtering program in Table 5), and Definition A.3.  □

Lemma A.6.

For a given substitution λ:xterms(M), if λ(x)cert(q,K) then there exists a match λ for q(x) over Mu where λ is a homomorphism that extends λ to