On the relation between keys and link keys for data interlinking

Atencia, Manuel; David, Jérôme; Euzenat, Jérôme

doi:10.3233/SW-200414

On the relation between keys and link keys for data interlinking

Issue title: Advancing Agriculture through Semantic Data Management

Article type: Research Article

Authors: Atencia, Manuel^{; *} | David, Jérôme | Euzenat, Jérôme

Affiliations: Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG, F-38000 Grenoble, France. E-mails: [email protected], [email protected], [email protected]

Correspondence: [*] Corresponding author. E-mail: [email protected].

Keywords: Data interlinking, keys, ontology alignments, link keys

DOI: 10.3233/SW-200414

Journal: Semantic Web, vol. 12, no. 4, pp. 547-567, 2021

Published: 17 June 2021

Get PDF

Abstract

Both keys and their generalisation, link keys, may be used to perform data interlinking, i.e. finding identical resources in different RDF datasets. However, the precise relationship between keys and link keys has not been fully determined yet. A common formal framework encompassing both keys and link keys is necessary to ensure the correctness of data interlinking tools based on them, and to determine their scope and possible overlapping. In this paper, we provide a semantics for keys and link keys within description logics. We determine under which conditions they are legitimate to generate links. We provide conditions under which link keys are logically equivalent to keys. In particular, we show that data interlinking with keys and ontology alignments can be reduced to data interlinking with link keys, but not the other way around.

1.Introduction

There are large amounts of RDF data available on the Web, in the form of knowledge graphs or as part of linked open data. Interoperability between RDF datasets largely relies on links between resources from different RDF datasets and especially links asserting the identity of resources bearing different IRIs, specified using the owl:sameAs property [20]. Since RDF datasets tend to be large, the automatic discovery of owl:sameAs links between RDF datasets is an important and challenging task. This task is usually referred to as data interlinking and different algorithms and tools for data interlinking have been proposed [18,25].

Among the state-of-the-art approaches to data interlinking, some are based on finding keys [2,9,17,31] or link keys [7,8] across RDF datasets. Both keys and link keys are devices characterising what makes two resources to be identical. Hence, it is natural to exploit them for discovering links across datasets. Even though both techniques have been proven to be effective in data interlinking scenarios, their relationship has not been formally established yet.

The objective of this paper is to clarify the relationship between keys and link keys. For this, we first provide the semantics of (RDF) keys and link keys. More specifically, we formalise how a key, in its different versions, can be combined with an alignment between ontologies for data interlinking. Then, we define the semantics of six kinds of link keys – weak, plain and strong link keys, and their in- and eq-variants – and we logically ground the usage of link keys for data interlinking. Finally, we establish the conditions under which link keys are equivalent to keys and show that data interlinking with keys and ontology alignments can be reduced to data interlinking with link keys, but not the other way around.

The contribution of this paper focuses on the specific features of keys and link keys. It does it, to the extent possible, independently from the underlying ontological schema and the logical constructors used for describing or constraining class and property expressions. This is the reason why computational issues are left for further interesting research.

In the remainder, Section 2 presents the context and related work of the paper. Section 3 introduces the notations used throughout the paper. Section 4 recalls two different semantics of keys and Section 5 logically justifies their use for data interlinking. Section 6 defines link keys. Section 7 logically grounds the use of link keys for data interlinking. The relations between keys and link keys are established in Section 8, both with respect to their logical entailment and the links they produce. Section 9 concludes the paper and discusses future work.

All definitions are illustrated with concrete examples taken from real-world datasets.

2.Context and related work

Data interlinking refers to the process of finding pairs of IRIs of different RDF datasets representing the same entity [18,25]. The result of this process is a set of same-as links to be specified by the owl:sameAs property. To decide whether two IRIs represent the same entity or not is mainly based on comparing their values for selected properties. Data interlinking is reminiscent of the task of record linkage in databases [14], but it is applied to RDF data described with RDFS/OWL ontologies.

Link discovery platforms such as SILK [22,33] and LIMES [26] enable users to process link specifications to generate links. Link specifications express the properties to be used for generating owl:sameAs links between two RDF datasets. They also specify the similarity measures to be used for comparing datatype property values, the aggregation functions for combining similarity values, and the similarity thresholds beyond which two values are considered equal. Link specifications may be directly set by users or they may be built (semi-)automatically, for example, using machine learning techniques [27,29].

A key is a set of (datatype or object) properties that uniquely identify the instances of a class within a dataset. For example,

{creator,title}keyBook

states that, if two instances of the class Book match on values for the properties creator and title then the two instances are the same.

Key-based approaches to data interlinking first extract key candidates from RDF datasets and then select the most accurate candidates according to different quality measures [2,9,17,31]. When the data of two RDF datasets are described using the same ontology, then keys, if available, can be directly used for interlinking the datasets, but if the data are described using different ontologies, then they need to be combined with ontology alignments [16] relating the properties and classes of the data. For example, the previous key could be combined with the alignment correspondences creator≡auteur, title≡titre and Book≡Livre to interlink the books of English and French libraries.

Keys can be used to build link specifications or can be translated into logical rules to perform data interlinking. The latter allows to take advantage of logical reasoning [3,4,21]. Key extraction algorithms discover either S-keys [2,17,31] or F-keys [9,30]. There are two kinds of keys since RDF properties are multivalued, contrary to relational attributes, which are monovalued. If a set of properties form an S-key for a class, it is enough that two instances of the class share one value for each of the properties of the key to infer that they are the same (e.g. email property for the AssistantProfessor class). But if the properties form an F-key then the instances must share all values (e.g. hasPoem property for the PoemAnthology class because two different poem anthologies may have a poem in common but will unlikely contain exactly the same poems).

When datasets are described with different ontologies, alignments must be used, either during the key extraction process or later when performing data interlinking. For example, the approach proposed in [31] searches in a source dataset for S-keys over classes which are equivalent to classes in a target dataset and then selects among the discovered S-keys those composed of properties which are equivalent to properties of the target dataset.

Link keys generalise the combination of keys and ontology alignments for data interlinking [7,16]. A link key is a set of pairs of properties that uniquely identify the instances of two classes of two RDF datasets. For example,

{⟨creator,auteur⟩,⟨title,titre⟩}linkkey⟨Book,Livre⟩

states that, if two instances of the classes Book and Livre, respectively, match on values for the properties creator and auteur, and for the properties title and titre, then they are the same instance. Unlike the previous key, this link key could be directly used to interlink the books of English and French libraries without the need of any ontology alignment.

Unlike [31], the key-based approaches to data interlinking proposed in [2,17] aim to discover S-keys that hold not only in the source dataset, but in both source and target datasets. It is assumed that the datasets are described using the same vocabulary, possibly resulting from merging different ontologies with an alignment, again composed of equivalence correspondences only. Link keys do not require the properties that compose them to be equal or semantically aligned. In addition, as we will show in this paper, the kind of keys discovered in [2,17] correspond to strong link keys, although data interlinking may be possible with weak link keys (the kind of link keys considered in [7,16]) when strong link keys do not exist.

The formal semantics of S-keys and F-keys have been given in [6] using rules, but the combination of S-keys and F-keys with ontology alignments for data interlinking is not formally addressed. In this paper, we address it using description logics.

Different approaches to incorporate keys and functional dependencies to description logics have been proposed. Keys may be treated as a new concept constructor [11,32] or as global constraints in a separate key box (KBox) [12,13,23,24], which is the option that we follow in this paper. The goal of these approaches is to study the decidability of reasoning with keys or functional dependencies in specific description logics. Here, we do not address automated reasoning with link keys. Instead, we use the formalism of description logics to provide the semantics of keys and link keys. This allows us to ground their legitimacy in generating links across RDF datasets. In addition, it gives us the means to compare keys and link keys on the basis of their entailments and the links they generate.

3.Preliminaries

This section introduces minimal notions and notations used throughout the entire paper. We assume that the reader is familiar with the basics of description logics (DLs) [28].

In this paper, ontologies will be the combination of a schema and a dataset, and they will be modelled as DL knowledge bases.

Definition 1

Definition 1(Ontology).

An ontology is a knowledge base O=⟨S,D⟩ made up of a terminological box (TBox) S and an assertional box (ABox) D. S and D will be referred to as the schema and dataset of O.

Thus, a schema is modelled as a set of terminological axioms, i.e. a set of subsumption, equivalence and disjointness axioms between classes and properties: C1RC2 and p1Rp2 with R∈{⊑,⊒,≡,⊥}. A dataset is a set of assertions about individuals, i.e. a set of class and property assertions and equality statements: C(a), p(a1,a2) and a1≈a2.1 1 Classes, properties and individuals (C1,p1,a1,…) define the vocabulary of an ontology. Notice that we make no restriction on the language, i.e. the classes and properties of ontologies may be built with any DL constructor. The semantics of ontologies is inherited from the model-theoretic semantics of knowledge bases using DL interpretations I=(ΔI,·I).

Alignments relate entities – classes, properties, individuals – that belong to different ontologies [16]. Alignment relations between classes and properties are subsumption, equivalence and disjointness. In the case of individuals, they are related by equality. Alignment statements between classes and properties are referred to as correspondences, whereas equality statements in alignments will be called links.

We will also model alignments as knowledge bases. The difference with ontologies is that, in the case of an alignment, the TBox and ABox use two ontologies’ vocabularies. In addition, the ABox contains equality statements (links) only.

Definition 2

Definition 2(Alignment).

Let O=⟨S,D⟩ and O′=⟨S′,D′⟩ be two ontologies. An alignment between O and O′ is a knowledge base AO,O′=⟨CO,O′,LO,O′⟩ where CO,O′ is composed of class and property axioms CRD and pRq with R∈{⊑,⊒,≡,⊥}, C and p are class and property expressions in O’s vocabulary and D and q are class and property expressions in O′’s vocabulary, and LO,O′ is composed of equality statements a≈b where a is an individual name in O’s vocabulary and b an individual name in O′’s vocabulary. The axioms in CO,O′ will be referred to as correspondences and the axioms in LO,O′ as links. If no confusion arises, AO,O′, CO,O′ and LO,O′ will be replaced by A, C and L.

Different semantics for alignments may be found in the literature [10,34]. Here, though, we will consider the axioms of two ontologies and the correspondences and links of an alignment between them to be part of one single global ontology. Without loss of generality, we can assume that the vocabularies of O and O′ are disjoint.

In what follows, given an ontology O, we will use the letters C and p (possibly with subscripts or superscripts) to denote class and property expressions of O, respectively, and, in case another ontology O′ is considered, we will use D and q for O′. In this way, C1RC2 and p1Rp2 will be used as general axioms in O, while CRD and pRq as general correspondences in an alignment A between O and O′ (R∈{⊑,⊒,≡,⊥}).

4.Two kinds of keys in description logics

In order to compare keys and link keys, we start by reformulating the semantics of keys [6] as description logic axioms. We distinguish between several types of keys which apply in this context. Instead of S-keys and F-keys, we will speak of in-keys and eq-keys, respectively. The prefixes in- and eq- are shortened forms of intersection and equality. These notations are related to the conditions (1) and (2) in Definitions 3 and 4.

4.1.Semantics of keys

In what follows, given a DL interpretation I=(ΔI,·I), a property p, and a domain individual δ∈ΔI, pI(δ) will denote the set of individuals related to δ through p, i.e. pI(δ)={η∈ΔI:(δ,η)∈pI}.

Definition 3

Definition 3(in-key).

An in-key assertion, or simply an in-key, has the form

({p1,…,pk}keyinC)

where p1,…,pk are properties and C is a class.

An interpretation I satisfies ({p1,…,pk}keyinC) if, for any δ,δ′∈CI,

p1I(δ)∩p1I(δ′)≠∅,…,pkI(δ)∩pkI(δ′)≠∅(1)impliesδ=δ′.

Definition 4

Definition 4(eq-key).

An eq-key assertion, or simply an eq-key, has the form

({p1,…,pk}keyeqC)

where p1,…,pk are properties and C is a class.

An interpretation I satisfies ({p1,…,pk}keyeqC) if, for any δ,δ′∈CI,2 2

p1I(δ)=p1I(δ′)≠∅,…,pkI(δ)=pkI(δ′)≠∅(2)impliesδ=δ′.

According to Definition 3, if two instances of a class share at least one value for each of the properties of an in-key for the class, then we can infer that they are the same instance. This is formalised in Proposition 1.

Proposition 1.

The following holds:

(3)C(a),{pi(a,ci)}i=1k,C(b),{pi(b,di)}i=1k,({p1,…,pk}keyinC),{ci≈di}i=1k⊧a≈b

Proof.

This is a direct consequence of Definition 3: for any interpretation I satisfying all the antecedent axioms of the entailment, aI and bI will belong to CI and will share one value for each of the properties of the in-key, hence aI will be equal to bI. □

Similarly, according to Definition 4, given an eq-key for a class and two instances of the class, we can infer that they are the same instance if they share all values (and at least one) for each of the properties of the key. However, we need to be sure that all known values indeed are all values that the instances may have. This is stated in Proposition 2.

Proposition 2.

The following holds:

(4)C(a),{pi(a,ci1),…,pi(a,ciri)}i=1k,{{a}⊑∀pi.{ci1,…,ciri}}i=1k,C(b),{pi(b,di1),…,pi(b,diri)}i=1k,{{b}⊑∀pi.{di1,…,diri}}i=1k,({p1,…,pk}keyeqC),{ci1≈di1,…,ciri≈diri}i=1k⊧a≈b

Proof.

Let I be an interpretation that satisfies all the antecedent axioms of the above entailment. Let us prove that I satisfies a≈b too. Since I⊧pi(a,cil) then (cil)I∈piI(aI) for i=1,…,k and l=1,…,ri. Also, since I⊧{a}⊑∀pi.{ci1,…,ciri} then piI(aI)⊆{(cil)I}l=1ri. Therefore, piI(aI)={(cil)I}l=1ri. Similarly, qiI(bI)={(dil)I}l=1ri. Now, since I⊧cil≈dil then (cil)I=(dil)I, which implies that piI(aI)=qiI(bI). Furthermore, piI(aI)=qiI(bI)≠∅ since ri⩾1. Additionally, since I⊧C(a) and I⊧C(b) then aI,bI∈CI. Finally, since I⊧({p1,…,pk}keyeqC), and we have aI,bI∈CI and piI(aI)=qiI(bI)≠∅ for i=1…,k, then we can infer that aI=bI, i.e. I⊧a≈b. □

Thus, unlike in-keys, eq-keys require a local closed world assumption – represented in Proposition 2 by the axioms {a}⊑∀pi.{ci1,…,ciri} and {b}⊑∀qi.{di1,…,diri} (i=1,…,k) – which, even though it is generally advised to avoid in the context of the semantic web and linked open data, it is also expected to be made in certain controlled scenarios.

Notice that Proposition 2 requires the logic to be able to express nominals and value restrictions. All our results are agnostic of the used logical language but those referring to the use of eq-keys and eq-link keys for data interlinking.

The semantics of owl:hasKey in OWL2 corresponds to the semantics of in-keys but restricted to being applied to named instances only (thus excluding blank nodes).

Although in-keys and eq-keys have been introduced separately, it is also possible to consider a hybrid notion of key.

Definition 5

Definition 5(Hybrid key).

A key assertion, or simply a key, has the form

({p1,…,pk}{q1,…,ql}keyC)

where p1,…,pk and q1,…,ql are properties and C is a class.

An interpretation I satisfies the key ({p1,…,pk}{q1,…,ql}keyC) if, for any δ,δ′∈CI,

p1I(δ)∩p1I(δ′)≠∅,…,pkI(δ)∩pkI(δ′)≠∅,q1I(δ)=q1I(δ′)≠∅,…,qlI(δ)=qlI(δ′)≠∅impliesδ=δ′.

From here on, an ontology O will be a triple O=⟨S,D,K⟩ which, besides the schema S (TBox) and dataset D (ABox), has as a third component a set of keys K (KBox).

Example 1 below provides examples of in-keys and eq-keys in real RDF datasets.

Example 1

Example 1(Insee).

Insee is the French institution in charge of collecting and publishing information about French economy and society. Part of the Insee data is available in the form of RDF triples and can be downloaded as an RDF dump or queried through a SPARQL endpoint.3 3 Insee ontologies are available too. We only consider the Insee data related to administrative districts (COG dataset).

The Insee vocabulary comprises four class names for describing the main administrative divisions in France: Commune, Arrondissement, Département and Région. Among the properties of these classes, we find the datatype property nom (used to specify the name of an administrative division), the object property subdivisionDe (to specify that an administrative division is subdivision of another one, e.g. that the commune of Grenoble is a subdivision of the Isère department) and the datatype property codeINSEE (which is an identifier for territories, including administrative divisions, and can be thought of as the key in the Insee database). The property subdivisionDe is declared to be transitive in the Insee ontology. This fragment of the Insee ontology is depicted in Fig. 2.

No owl:hasKey axiom is declared in the Insee ontology. Nevertheless, we have checked the in-key and eq-key conditions for the properties and classes mentioned before. We have done so in the RDF graph of Insee extended with the transitivity of subdivisionDe. This generalises to the fully inferred graph as no other axiom of the Insee ontology may have an impact on the satisfiability of the examined key axioms.

As expected, the codeINSEE property is an in-key for Commune, Arrondissement, Département and Région. Formally:

IInsee∗⊧({codeINSEE}keyinCommune)IInsee∗⊧({codeINSEE}keyinArrondissement)IInsee∗⊧({codeINSEE}keyinDépartement)IInsee∗⊧({codeINSEE}keyinRégion)

where IInsee∗ is the natural DL interpretation of the inferred Insee graph.4 4

Concerning the property nom, it turns out to be an in-key for Département and Région, but neither for Commune nor Arrondissement. Indeed, there exist different communes (and arrondissements) sharing the same name. For instance, Bully may refer to three different communes: Bully in the department of Loire, Bully in Rhône and Bully in Seine-Maritime. However, there is no pair of communes of the same department sharing the same name. In fact, nom and subdivisionDe, when put together, form a key for the class Commune. The property subdivisionDe, though, must be treated in the sense of eq-keys. This is because, since subdivisionDe is a transitive property, all French communes share (at least) a value for subdivisionDe, namely, the Insee entity representing the country France. The same holds for the class Arrondissement. Formally (note that we use hybrid keys):

IInsee∗⊧({nom}keyinDépartement)IInsee∗⊧({nom}keyinRégion)IInsee∗⊧({nom}{subdivisionDe}keyArrondissement)IInsee∗⊧({nom}{subdivisionDe}keyCommune)

From here on, we will use the shortcuts Reg, Dep, Arr and Com for the corresponding Insee classes.

4.2.Relations between the different types of keys

Compared to the semantics of S-keys and F-keys defined in [6], the semantics of in-keys corresponds directly to the semantics of S-keys. This is not the case for eq-keys and F-keys. Every eq-key is an F-key but not the other way around. The equivalence would hold if condition (2) in Definition 4 were replaced by

p1I(δ)=p1I(δ′),…,pkI(δ)=pkI(δ′)impliesδ=δ′.

The prerequisite that the sets of property values must be non-empty enables to consider in-keys as a subset of eq-keys (which does not hold between S-keys and F-keys). This result is stated in Proposition 3.

Proposition 3.

The following holds:

({p1,…,pk}keyinC)⊧({p1,…,pk}keyeqC)

Proof.

Let I be an interpretation such that I⊧({p1,…,pk}keyinC). We have to prove that I⊧({p1,…,pk}keyeqC). Let δ,δ′∈CI such that piI(δ)=piI(δ′)≠∅ (i=1,…,k). We have to prove that δ=δ′. Since piI(δ) and piI(δ′) are equal and non-empty, then piI(δ)∩piI(δ′)≠∅ (i=1,…,k). So we have δ,δ′∈CI and piI(δ)∩piI(δ′)≠∅ (i=1,…,k). Since I⊧({p1,…,pk}keyinC), then δ=δ′. □

The converse of Proposition 3 is not true, i.e. there are eq-keys that are not in-keys. Indeed, consider the interpretation defined by (aI,cI),(aI,dI),(bI,cI)∈pI and aI≠bI, cI≠dI. Then I⊧(pkeyeq⊤) whereas I⊭(pkeyin⊤). The converse is true if the key is made up of functional properties, as stated in Proposition 4. Notice that it is possible to define a functional property as a property p such that for any interpretation I=(ΔI,·I) and for any δ∈ΔI then |pI(δ)|⩽1. Indeed, p is functional if and only for any interpretation I=(ΔI,·I) and for any δ∈ΔI, there exists one or no element related to δ via pI, which is equivalent to say that |pI(δ)|⩽1.

Proposition 4.

If p1,…,pk are functional, then

({p1,…,pk}keyeqC)⊧({p1,…,pk}keyinC)

Proof.

Let I be an interpretation such that I⊧({p1,…,pk}keyeqC). Let δ,δ′∈CI such that piI(δ)∩piI(δ′)≠∅ (i=1,…,k). Since pi is functional, then |piI(δ)|⩽1 and |piI(δ′)|⩽1, but, given that their intersection is not empty, then |piI(δ)|=1 and |piI(δ′)|=1. Thus, they are equal and not empty, i.e. piI(δ)=piI(δ′)≠∅ (i=1,…,k). Since I⊧({p1,…,pk}keyeqC) then we can infer that δ=δ′. This proves that I⊧({p1,…,pk}keyinC). □

Proposition 5 shows basic properties of in-keys and eq-keys that will be later used in the proofs of other theorems. In certain occasions, we will write ({pi}i=1kkeyxC) instead of ({p1,…,pk}keyxC) (x∈{in,eq}) to shorten too long expressions. Property (5) may be thought of as a version of Armstrong’s reflexivity axiom for functional dependencies in relational databases [5]. It is not surprising that it corresponds to one of these axioms as it deals with sets of properties. This is not the case of the other properties, as they deal with constructors not found in the relational model. Properties (6), (7) and (8) specify how keys behave with subsumption, intersection and union of classes, respectively. Properties (9) and (10) specify how keys behave with subsumption and equivalence of properties. Interestingly, (9) does not hold for eq-keys.

Proposition 5.

The following holds:

(5)({pi}i=1kkeyxC)⊧({pi}i=1k+1keyxC)(6)({pi}i=1kkeyxC),C⊒D⊧({pi}i=1kkeyxD)(7)({pi}i=1kkeyxC)⊧({pi}i=1kkeyxC⊓D)(8)({pi}i=1kkeyxC⊔D)⊧({pi}i=1kkeyxC)(9)({pi}i=1kkeyinC),{pi⊒qi}i=1k⊧({qi}i=1kkeyinC)(10)({pi}i=1kkeyxC),{pi≡qi}i=1k⊧({qi}i=1kkeyxC)

with x∈{in,eq}.

Proof.

Properties (5) and (6) follow directly from Definitions 3 and 4, and Properties (7) and (8) are direct consequences of property (6).

Let us prove (9). Let I be an arbitrary DL interpretation such that I⊧({p1,…,pk}keyinC) and I⊧pi⊒qi (i=1,…,k). We have to prove that I⊧({q1,…,qk}keyinC). Let δ,δ′∈CI such that qiI(δ)∩qiI(δ′)≠∅ (i=1,…,k). Since I⊧pi⊒qi then qiI(δ)⊆piI(δ) and qiI(δ′)⊆piI(δ′), and, since qiI(δ)∩qiI(δ′)≠∅, then piI(δ)∩piI(δ′)≠∅ (i=1,…,k). This together with I⊧({p1,…,pk}keyinC) implies δ=δ′. Therefore, I⊧({q1,…,qk}keyinC).

Property (10) can be proven analogously. □

In the following section, we establish when it is legitimate to combine in-keys and eq-keys with alignments for data interlinking.

5.Data interlinking with keys and alignments

So far, we have considered keys independently from their use for data interlinking. Keys are able to identify duplicate resources within the same dataset and links between resources from different datasets described using the same ontologies. But as soon as the datasets do not share the same schema, keys alone are not enough for performing data interlinking, and alignments are required.

In this section, we uncover the implicit role of alignments in the process of data interlinking with keys. We show that data interlinking can be expressed as a direct logical consequence of the semantics of keys and alignments. We also highlight the need for completion when interlinking data with eq-keys.

Data interlinking can be formulated as an inference problem: for two given ontologies O=⟨S,D,K⟩ and O′=⟨S′,D′,K′⟩ equipped with keys and an alignment A=⟨C,L⟩ between O and O′, the problem is to check, for any pair of individual names a and b of O and O′, respectively, if the following entailment holds:

O,O′,A⊧a≈b

This formulation also includes the particular case when O and O′ share the same schema, as in this case the set of correspondences is empty, i.e. A=⟨∅,L⟩.

In the following, we give conditions on the schemas S and S′, the datasets D and D′, the set of class and property correspondences C, and the set of (known) links L, that, in the presence of a key κ∈K, are sufficient for inferring a (new) link a≈b. These conditions change depending on whether κ is an in-key or an eq-key, as specified in Theorem 1 and Theorem 2 below. These two theorems provide the logical grounds of data interlinking with keys and alignments.

Theorem 1.

Let O=⟨S,D,K⟩ and O′=⟨S′,D′,K′⟩ be two ontologies and A=⟨C,L⟩ and alignment between O and O′ such that

– ({p1,…,pk}keyinC)∈K, and
– {C⊒D}∪{pi⊒qi}i=1k⊆C.

Then, for any pair of individual names a and b of O and O′, respectively, if

– {C(a)}∪{pi(a,ci)}i=1k⊆D,
– {D(b)}∪{qi(b,di)}i=1k⊆D′, and
– {ci≈di}i=1k⊆L,

then O, O′, A⊧a≈b.

Proof.

Notice that C⊒D and D(b) entail C(b), and that pi⊒qi and qi(b,di) entail pi(b,di). Then, the statement follows from Proposition 1. □

Theorem 1 logically grounds data interlinking with in-keys and ontology alignments: if we know that the properties p1,…,pk are an in-key for a class C in O, and that, according to an alignment A, C subsumes a class D of O′ and p1,…,pk pairwise subsume properties q1,…,qk of O′, then, for every pair of instances a of C and b of D, the key will generate a same-as link between a and b if, for all i∈{1,…,k}, a has for pi a value ci which is equal to a value di that b has for qi.

Theorem 2 provides the logical basis of data interlinking with eq-keys and alignments. Note that unlike Theorem 1, p1,…,pk are required to be pairwise equivalent to q1,…,qk. Moreover, in order to generate a same-as link between a and b, we need to know all the values that a and b may have for pi and qi, respectively, and that these values are the same. This local completeness is expressed as axioms in the ontology schemas S and S′.

Theorem 2.

Let O=⟨S,D,K⟩ and O′=⟨S′,D′,K′⟩ be two ontologies and A=⟨C,L⟩ an alignment between O and O′ such that

– ({p1,…,pk}keyeqC)∈K, and
– {C⊒D}∪{pi≡qi}i=1k⊆C.

Then, for any pair of individual names a and b of O and O′, respectively, if

– {C(a)}∪⋃i=1k{pi(a,cij)}j=1ri⊆D,
– {{a}⊑∀pi.{ci1,…,ciri}}i=1k⊆S,
– {D(b)}∪⋃i=1k{qi(b,dij)}j=1ri⊆D′,
– {{b}⊑∀qi.{di1,…,diri}}i=1k⊆S′, and
– ⋃i=1k{cij≈dij}j=1ri⊆L,

then O, O′, A⊧a≈b.

Proof.

Notice that C⊒D and D(b) entail C(b), and that pi≡qi entails pi⊒qi which, together with qi(b,dij), entails pi(b,dij). Also, pi≡qi entails pi⊑qi which, along with {b}⊑∀qi.{di1,…,diri}, entails {b}⊑∀pi.{di1,…,diri}. Then, the statement follows from Proposition 2. □

Notice that in both theorems we only address the case when property values are individuals, i.e. when keys are composed of object properties only. The case when property values are literals, i.e. keys with datatype properties, does not make a difference for our purpose (although, in this case, the comparison of property values is based on equality and not on an initial set of known same-as links L). Also, the case when instances are related to the same individual name – e.g. p(a,c), q(b,c) – is covered by the theorem too, as it is enough to add links of type c≈c to L.

Another remark on Theorems 1 and 2 is that only one key of O and no key of O′ is needed to infer links. Actually, under the assumptions of the theorems, by Proposition 5, {q}i=1k is guaranteed to be an in-key (in Theorem 1) or an eq-key (Theorem 2) for the class D.

Even though Theorems 1 and 2 are not difficult to prove, they highlight some peculiarities of data interlinking with keys and alignments that have not received attention in the literature: the fact that equivalence of properties is not required for interlinking with in-keys, and that local completeness is necessary for interlinking with eq-keys.

It is possible to provide semantic versions of Theorems 1 and 2 in which the antecedent axioms are not asserted in the ontologies and alignments but inferred from them (e.g. O,O′,A⊧({p1,…,pk}keyinC) instead of ({p1,…,pk}keyinC)∈K). We have decided to present the asserted versions to stress the nature of every axiom (mapping, data, schema knowledge or links).

We finish this section with the definition of the link set generated by a key.

Definition 6

Definition 6(Link set generated by a key).

Let O and O′ be two ontologies. Let A be an alignment between O and O′. Let κ be a key. The set of links between O and O′ generated by κ under A is defined as

LκO,O′,A={a≈b:ais an individual in O∧bis an individual in O′∧O,O′,A,κ⊧a≈b∧O,O′,A⊭a≈b}

In the following sections, we will introduce link keys and formalise data interlinking with link keys in the same manner. We will then show that data interlinking with link keys is more general than data interlinking with keys and alignments.

6.Link keys

We define three different types of link keys: weak, plain and strong link keys. They all allow to find links between two datasets, but they differ on whether they allow the existence of different resources (duplicates) satisfying the key conditions within each of the datasets: weak link keys allow them; plain link keys allow them only among the non-linked resources; strong link keys disallow them all.

This distinction provides us with the right framework in which to compare link keys and keys and alignments: keys and alignments correspond to strong link keys (Theorem 5), though data interlinking may be possible with weak link keys when strong link keys do not exist (Theorem 6). Plain link keys fill the gap between weak and strong link keys.

The distinction between weak, plain and strong link keys is important in practice too: weak link keys can be used for data interlinking; strong link keys can be used for both data interlinking and duplicate detection, i.e. for discovering same-as statements between individuals of the same dataset; plain link keys lie between weak and plain link keys, as they can be used for data interlinking and for finding duplicates among the linked individuals.

6.1.Semantics of link keys

The semantics of link keys considered in [19] is reproduced in Definition 7. It is natural to extend this semantics to eq-keys too, and we do so in Definition 8. These kinds of link keys will be referred to as weak link keys.

Definition 7

Definition 7(Weak in-link key).

A weak in-link key assertion, or simply a weak in-link key, has the form

({⟨p1,q1⟩,…,⟨pk,qk⟩}linkkeyinw⟨C,D⟩)

where p1,…,pk and q1,…,qk are properties and C and D are classes.

An interpretation I satisfies ({⟨p1,q1⟩,…,⟨pk,qk⟩}linkkeyinw⟨C,D⟩) if, for any δ∈CI and η∈DI,

p1I(δ)∩q1I(η)≠∅,…,pkI(δ)∩qkI(η)≠∅impliesδ=η.

Note that the above definition does not specify to which ontology vocabulary the classes and properties of a link key belong. In practice, though, the classes C and D, and the properties {pi}i=1k and {qi}i=1k belong to different ontology schemas, and the instances of C and D to different datasets. This will become explicit in Section 7 when we formalise data interlinking with link keys. In addition, note that the definition does not say that the properties and classes of a link key are semantically aligned, neither via an equivalence relation nor via a subsumption relation.

Weak eq-link keys are defined below.

Definition 8

Definition 8(Weak eq-link key).

A weak eq-link key assertion, or simply a weak eq-link key, has the form

({⟨p1,q1⟩,…,⟨pk,qk⟩}linkkeyeqw⟨C,D⟩)

where p1,…,pk and q1,…,qk are properties and C and D are classes.

An interpretation I satisfies ({⟨p1,q1⟩,…,⟨pk,qk⟩}linkkeyinw⟨C,D⟩) if, for any δ∈CI and η∈DI,

p1I(δ)=q1I(η)≠∅,…,pkI(δ)=qkI(η)≠∅impliesδ=η.

It is worth noting that every key can be expressed as a link key. Indeed, ({p1,…,pk}keyxC) is equivalent to ({⟨p1,p1⟩,…,⟨pk,pk⟩}linkkeyxw⟨C,C⟩), with x∈{in,eq}.

Weak link keys are called weak because they are not necessarily composed of keys. Instead, strong link keys, introduced below, always embed two keys. For this reason, they are closely related to keys and alignments, as we formally state in Theorem 5. We only give the definition of strong in-link keys, as strong eq-link keys can be defined analogously.

Definition 9

Definition 9(Strong in-link key).

A strong in-link key assertion, or simply a strong in-link key, has the form

({⟨p1,q1⟩,…,⟨pk,qk⟩}linkkeyins⟨C,D⟩)

where p1,…,pk and q1,…,qk are properties and C and D are classes.

An interpretation I satisfies ({⟨p1,q1⟩,…,⟨pk,qk⟩}linkkeyins⟨C,D⟩) if

1. I⊧({⟨p1,q1⟩,…,⟨pk,qk⟩}linkkeyinw⟨C,D⟩)
2. I⊧({p1,…,pk}keyinC)
3. I⊧({q1,…,qk}keyinD)

Both strong and weak link keys enable to find links between two different datasets, but strong link keys do more. Indeed, since the properties of a strong link key are keys for the classes separately then they can be used for finding same-as statements between individuals of the same dataset, i.e. for identifying duplicates.

Fig. 1.

Two datasets and links generated depending on the type of link keys (double = weak, dashed = plain, waved = strong).

Finally, we introduce plain link keys, which are intermediate between weak and strong link keys. Plain link keys allow to find links and to identify duplicates of the instances that are linked. As before, we only give the definition of a plain in-link key, since plain eq-link keys are defined analogously.

Definition 10

Definition 10(Plain in-link key).

A plain in-link key assertion, or simply a plain in-link key, has the form

({⟨p1,q1⟩,…,⟨pk,qk⟩}linkkeyinp⟨C,D⟩)

where p1,…,pk and q1,…,qk are properties and C and D are classes.

An interpretation I satisfies ({⟨p1,q1⟩,…,⟨pk,qk⟩}linkkeyinp⟨C,D⟩) if, for any δ∈CI and η∈DI,

p1I(δ)∩q1I(η)≠∅,…,pkI(δ)∩qkI(η)≠∅

implies:

1. δ=η,
2. for any δ′∈CI, p1I(δ)∩p1I(δ′)≠∅, …,pkI(δ)∩pkI(δ′)≠∅ implies δ=δ′,
3. for any η′∈DI,
q1I(η)∩q1I(η′)≠∅, …,qkI(η)∩qkI(η′)≠∅ implies η=η′.

Figure 1 shows the differences between weak, plain and strong link keys on two datasets D and D′:

D={p(a1,c1),p(a1,c2),C(a1),p(a2,c2),C(a2),C(a3),p(a3,c3),C(a4),p(a4,c3)}D′={q(b1,d1),q(b1,d2),D(b1),q(b2,d2),D(b2),D(b3),q(b3,d3),D(b4),q(b4,d3)}

with the initial set of links L={c1≈d1}. Consider the in-link key ({⟨p,q⟩}linkkeyiny⟨C,D⟩). Depending on the value of y, it will generate:

weak:	a1≈b1 (double-line arrow),
plain:	plus a1≈a2 and b1≈b2 (dashed arrows),
strong:	plus a3≈a4 and b3≈b4 (wave arrows).

The question may be raised whether it is justified to link resources by key-like conditions when these conditions are not considered as keys (as in weak and plain link keys). This is for the same reason that in some context it may be enough to call people by their first name (e.g. in a nuclear family), in some another to call them by their last name (e.g. in a student class), and yet in other contexts, first name and last name may not be sufficient and birth date and birth place have to be used too (e.g. in a country). Here, the context is provided by what belongs to both datasets. It is not necessary that such a link key exists (there may be two students with the same last name in a class), but sometimes it does. In such cases, there would be no reason to prevent using it.

Fig. 2.

Fragments of the Insee and IGN ontologies and their alignment.

As it was done for keys in Definition 5, it is possible to define hybrid weak, plain and strong link keys by bringing together the in- and eq-conditions:

({⟨pi,qi⟩}i=1k{⟨rj,sj⟩}j=1llinkkeyy⟨C,D⟩)

with y∈{w,p,s}.

Alignments may be naturally extended to include a set of link keys. From here on, given two ontologies O and O′ equipped with keys, an alignment A between O and O′ will be a triple A=⟨C,L,LK⟩ which, in addition to a set of class and property correspondences C and a link set L, has a set LK of link keys between the vocabularies of O and O′ as a third component.

Below we give examples of link keys in real datasets.

Example 2

Example 2(Insee-IGN).

The Insee dataset includes links to the IGN dataset (French National Geographic Institute).5 5 There exist owl:sameAs links between the resources representing the French communes, arrondissements, departments and regions, gathered together in the two datasets using the same class names. These links can be found by comparing the Insee codes, which are declared in both datasets – using the ins:codeINSEE property in the Insee dataset and ign:numInsee in the IGN dataset.6 6 The considered fragment of the IGN ontology is depicted in Fig. 2.

We have checked the different link key conditions for the property pair ⟨ins:codeINSEE,ign:numInsee⟩ on the union of the Insee and IGN datasets taking into account the existing owl:sameAs links. They are strong in-link keys for ⟨ins:Com,ign:Com⟩, ⟨ins:Arr,ign:Arr⟩, ⟨ins:Dép,ign:Dép⟩ and ⟨ins:Rég,ign:Rég⟩. Formally:

I∗⊧({⟨ins:codeINSEE,ign:numInsee⟩}linkkeyins⟨ins:Com,ign:Com⟩)I∗⊧({⟨ins:codeINSEE,ign:numInsee⟩}linkkeyins⟨ins:Arr,ign:Arr⟩)I∗⊧({⟨ins:codeINSEE,ign:numInsee⟩}linkkeyins⟨ins:Dép,ign:Dép⟩)I∗⊧({⟨ins:codeINSEE,ign:numInsee⟩}linkkeyins⟨ins:Rég,ign:Rég⟩)

where I∗ is a canonical interpretation of the RDF graph resulting from the union of the Insee and IGN datasets whose linked individuals are merged.

Let us consider the other properties of Example 1. The property rdfs:label is used in the IGN dataset in the same way as ins:nom is used in the Insee dataset. Instead of ins:subdivisionDe, however, IGN uses the three properties ign:arr, ign:dpt and ign:region to declare the arrondissement, department and region an administrative unit belongs to. We have checked the different link key conditions for the combinations of these properties in the scope of the class pairs ⟨ins:Com,ign:Com⟩, ⟨ins:Arr,ign:Arr⟩, ⟨ins:Dép,ign:Dép⟩ and ⟨ins:Rég,ign:Rég⟩. This has been performed in the graph resulting from the union of the Insee graph, extended by transitivity of subdivisionDe, and the IGN graph, and again considering the owl:sameAs links. This generalises to the fully inferred RDF graph, as no other axiom of neither the Insee ontology nor the IGN ontology may have an impact on the satisfiability of the examined link key axioms. As one would expect, the property pair ⟨ins:nom,rdfs:label⟩ is a strong in-link key for ⟨ins:Dép,ign:Dép⟩ and ⟨ins:Rég,ign:Rég⟩. The property pairs ⟨ins:subdivisionDe,ign:arr⟩ and ⟨ins:subdivisionDe,ign:dpt⟩ with ⟨ins:nom,rdfs:label⟩ constitute weak (and plain) in-link keys for the class pairs ⟨ins:Com,ign:Com⟩ and ⟨ins:Arr,ign:Arr⟩, respectively. They are not strong link keys because, as explained in Example 1, subdivisionDe must be used as an eq-key. They are not eq-link keys either because ign:arr (as well as ign:dpt) refers to a single administrative unit, though subdivisionDe refers to several administrative units due to transitivity. Formally:

I∗⊧({⟨ins:nom,rdfs:label⟩,⟨ins:subdivisionDe,ign:arr⟩}linkkeyinw⟨ins:Com,ign:Com⟩)I∗⊧({⟨ins:nom,rdfs:label⟩,⟨ins:subdivisionDe,ign:dpt⟩}linkkeyinw⟨ins:Arr,ign:Arr⟩)I∗⊧({⟨ins:nom,rdfs:label⟩}linkkeyins⟨ins:Dép,ign:Dép⟩)I∗⊧({⟨ins:nom,rdfs:label⟩}linkkeyins⟨ins:Rég,ign:Rég⟩)

where I∗ is a canonical interpretation of the aforesaid RDF graph whose linked individuals are merged.

Obviously, the above link keys could be used for rediscovering the links.

At present, there exist tools for discovering weak in-link keys [7] and hybrid weak link keys [8].

6.2.Relations between different link keys

Below, we provide theoretical results stating the relations between the different kinds of link keys. Propositions 6 and 7 are the counterparts of Propositions 3 and 4 for link keys and can be proven similarly.

Proposition 6.

The following holds:

({⟨pi,qi⟩}i=1klinkkeyiny⟨C,D⟩)⊧({⟨pi,qi⟩}i=1klinkkeyeqy⟨C,D⟩)

with y∈{w,p,s}.

Proposition 7.

If p1,…,pk and q1,…,qk are functional then

({⟨pi,qi⟩}i=1klinkkeyeqy⟨C,D⟩)⊧({⟨pi,qi⟩}i=1klinkkeyiny⟨C,D⟩)

with y∈{w,p,s}.

Proposition 8 shows the relations between weak link keys, plain link keys and strong link keys: a strong link key is always a plain link key, which is always a weak link key. Interestingly, there is no distinction between weak eq-link keys and plain eq-link keys. This is due to the transitivity of equality.

Proposition 8.

The following holds:

({⟨pi,qi⟩}i=1klinkkeyxs⟨C,D⟩)⊧({⟨pi,qi⟩}i=1klinkkeyxp⟨C,D⟩)({⟨pi,qi⟩}i=1klinkkeyxp⟨C,D⟩)⊧({⟨pi,qi⟩}i=1klinkkeyxw⟨C,D⟩)({⟨pi,qi⟩}i=1klinkkeyeqw⟨C,D⟩)⊧({⟨pi,qi⟩}i=1klinkkeyeqp⟨C,D⟩)

with x∈{in,eq}.

Proof.

The first two propositions follow directly from the definitions of link keys. We prove the validity of the third one. Let I be a DL interpretation such that I⊧({⟨pi,qi⟩}i=1klinkkeyeqw⟨C,D⟩), and let us prove that I⊧({⟨pi,qi⟩}i=1klinkkeyeqp⟨C,D⟩). Let δ∈CI and η∈DI be such that piI(δ)=qiI(η)≠∅ (i=1,…,k). Since I⊧({⟨pi,qi⟩}i=1klinkkeyeqw⟨C,D⟩), then δ=η. Now, let δ′∈CI with piI(δ)=piI(δ′)≠∅ (i=1,…,k). From piI(δ)=qiI(η)≠∅ and piI(δ)=piI(δ′)≠∅, we can infer that piI(δ′)=qiI(η)≠∅ (i=1,…,k). This together with δ′∈CI, η∈DI and I⊧({⟨pi,qi⟩}i=1klinkkeyeqw⟨C,D⟩) implies δ′=η, and, since δ=η, then δ=δ′. The last condition of plain eq-link keys can be proven analogously. □

In the following section, we establish when it is legitimate to use link keys for data interlinking.

7.Data interlinking with link keys

Theorems 3 and 4 give the logical foundations of data interlinking with weak in-link keys and eq-link keys, respectively. Their proofs follow the same ideas as the proofs of Theorems 1 and 2 and are omitted.

Theorem 3.

Let O=⟨S,D,K⟩ and O′=⟨S′,D′,K′⟩ be two ontologies. Let A=⟨C,L,LK⟩ be an alignment between O and O′ such that

– ({⟨p1,q1⟩,…,⟨pk,qk⟩}linkkeyinw⟨C,D⟩)∈LK.

Then, for any pair of individual names a and b of O and O′, respectively, if

– {C(a)}∪{pi(a,ci)}i=1k⊆D,
– {D(b)}∪{qi(b,di)}i=1k⊆D′, and
– {ci≈di}i=1k⊆L

then O, O′, A⊧a≈b.

Theorem 3 provides the logical basis of data interlinking with weak in-link keys: if we know that the property pairs ⟨p1,q1⟩, …, ⟨pk,qk⟩ are a weak in-link key for the class pair ⟨C,D⟩, then, for every pair of instances a of C and b of D, the link key will generate a same-as link between a and b if, for every i∈{1,…,k}, a has for pi a value ci which is equal to a value di that b has for qi.

The counterpart of Theorem 3 for weak eq-link keys is Theorem 4. In this case, to generate a same-as link between a and b, we need to know all the values that a and b have for pi and qi, respectively, and that these values are the same. This local completeness is expressed as axioms in the ontology schemas S and S′.

Theorem 4.

Let O=⟨S,D,K⟩ and O′=⟨S′,D′,K′⟩ be two ontologies. Let A=⟨C,L,LK⟩ be an alignment between O and O′ such that

– ({⟨p1,q1⟩,…,⟨pk,qk⟩}linkkeyeqw⟨C,D⟩)∈LK.

Then, for any pair of individual names a and b of O and O′, respectively, if

– {C(a)}∪⋃i=1k{pi(a,cij)}j=1ri⊆D,
– {{a}⊑∀pi.{ci1,…,ciri}}i=1k⊆S,
– {D(b)}∪⋃i=1k{qi(b,dij)}j=1ri⊆D′,
– {{b}⊑∀qi.{di1,…,diri}}i=1k⊆S′, and
– ⋃i=1k{cij≈dij}j=1ri⊆L

then O, O′, A⊧a≈b.

Theorems 3 and 4 show that, unlike keys, weak link keys do not need mappings between classes and properties to perform data interlinking. In addition, since, by Proposition 8, any plain or strong link key is a weak link key, Theorems 3 and 4 hold for strong and plain link keys too.

Below we give the definition of the link set generated by a link key. It applies to all types of link keys.

Definition 11

Definition 11(Link set generated by a link key).

Let O and O′ be two ontologies. Let A be an alignment between O and O′. Let λ be a link key. The set of links between O and O′ generated by λ under A is

LλO,O′,A={a≈b:ais an individual in O∧bis an individual in O′∧O,O′,A,λ⊧a≈b∧O,O′,A⊭a≈b}

Strong link keys generate more equality statements than plain link keys, which generate more than weak link keys. Logically speaking, it is justified by the fact that the more constraining a link key is, the less models it has, and, thus, the more logical consequences follow from it. Dually, when searching for link keys, it will be easier to search for weak link keys than plain link keys, which will be easier than searching for strong link keys. This is because, in each case, more constraints need to be satisfied.

Therefore, the manipulation of link keys is delicate: the stronger a link key is, the more difficult to extract it, but the more equality statements it will generate. Furthermore, weak link keys may exist when plain and strong link keys do not. In such cases, data interlinking will only be possible with weak link keys. In contrast, duplicate detection inside datasets is only possible with plain or strong link keys.

The generation of links from a link key based on Theorems 3 and 4 is reasonably easy. It may be achieved by using specific tools such as Linkex [1], by transforming link keys into SPARQL queries (available from the Alignment API [15]) or by expressing them as (boolean) linkage rules to be executed in specific platforms such as SILK [22,33] or LIMES [26].

However, these latter platforms do not seem to support the comparison of sets of property values, thus the direct translation of eq-link keys is not possible. The natural extension of this approach, taking into account full reasoning, will require developing specific provers.

Now that we have formally defined how to interlink data with keys and link keys independently of each other, we are in position to compare them.

8.Relation between keys and link keys

Keys and link keys are data interlinking devices that we have developed so far in a parallel manner. One then may expect that their application always results in the generation of the same links. We are now able to formally establish the relation between keys and link keys, and to show that, although there may be data interlinking scenarios in which they will return the same links, this will not always be the case.

This section starts by studying the relation between keys and link keys as description logic axioms (Section 8.1). Theorem 5 states the correspondence between strong link keys and keys and alignments. This correspondence no longer holds for weak link keys (Theorem 6). We also study the impact of these results on the generation of links (Section 8.2): Theorems 7 and 8 show that the links generated by a strong link key are the same as the links generated by its corresponding keys and proper alignments. There are cases, though, in which it is possible to generate links with weak link keys while it is not possible with keys and alignments.

8.1.Logical relations between keys and link keys

The theorems presented here are consequences of stronger results included in the Appendix. We have decided to not include the latter in this section because the former are more directly related to data interlinking with keys and link keys.

Theorem 5 states the correspondence between strong link keys and keys and alignments: (11) says that strong link keys entail keys; (12) and (13) express conditions under which the converse of (11) holds.

Theorem 5.

The following holds:

(11)({⟨pi,qi⟩}i=1klinkkeyxs⟨C,D⟩)⊧({pi}i=1kkeyxC)(12)({pi}i=1kkeyinC),C⊒D,{pi⊒qi}i=1k⊧({⟨pi,qi⟩}i=1klinkkeyins⟨C,D⟩)(13)({pi}i=1kkeyeqC),C⊒D,{pi≡qi}i=1k⊧({⟨pi,qi⟩}i=1klinkkeyeqs⟨C,D⟩)

with x∈{in,eq}.

Proof.

(11) is a direct consequence of the definition of strong link keys (Definition 9). (12) and (13) are consequences of Proposition 12 in the Appendix. □

Given the symmetry of the link key definitions, (11), (12) and (13) hold for the right-hand side of the link key too (with reversed subsumption relations).

Theorem 5 states that it is possible to infer keys from strong link keys. This is not surprising because strong link keys are composed of keys by definition. We call these keys the side keys associated with a strong link key. More interestingly, Theorem 5 also states that strong link keys can be inferred from keys and proper alignments. Note that one key is enough to entail the strong link key as long as the alignment holds (these alignments are different depending on whether in-link keys or eq-link keys are considered).

The converses of (12) and (13) are only partly true: strong link keys entail keys, but strong link keys (nor plain or weak link keys) do not necessarily entail an alignment between their properties and classes. This rejects the idea that link keys embed alignments. Link keys do not assert alignments, but express conditions for identifying individuals. A link key between two classes C and D does not assert that C and D are equivalent, nor that one of the classes subsumes the other, it just specifies how to link individuals that are described as instances of C and D, but there may be individuals in both classes that do not belong to the other class. For example, there may exist a link key between the classes AdministrativeCentre and Town, although no equivalence, nor subsumption holds between them (some administrative centres are towns, others are cities; some towns are administrative centres, others not).

Is Theorem 5 still valid for weak and plain link keys? (12) and (13) do hold, but (11) does not. In other words: keys and proper alignments entail weak and plain link keys (Corollary 5.1); however, none of the side components of weak or plain link keys are necessarily keys (Theorem 6).

Fig. 3.

Two ontologies O1 and O2 such that O1∪O2∪{({⟨p,q⟩,⟨p′,q′⟩}linkkeyinw⟨C,D⟩)} is consistent, whereas O1∪{({p,p′}keyinC)} and O2∪{({q,q′}keyinD)} are inconsistent.

Corollary 5.1.

The following holds:

(14)({pi}i=1kkeyinC),C⊒D,{pi⊒qi}i=1k⊧({⟨pi,qi⟩}i=1klinkkeyiny⟨C,D⟩)(15)({pi}i=1kkeyeqC),C⊒D,{pi≡qi}i=1k⊧({⟨pi,qi⟩}i=1klinkkeyeqy⟨C,D⟩)

with y∈{w,p}.

Proof.

This is a direct consequence of Theorem 5 since, by Proposition 8, any strong link key is also a plain and a weak link key. □

Unlike strong link keys, none of the side components of weak or plain link keys are necessarily keys. The proof of Theorem 6 provides two ontologies that are consistent with a weak link key but are inconsistent with any of its side components.

Theorem 6.

There exist ontologies that are consistent with a weak link key but inconsistent with each of its side components.

Proof.

Consider the two ontologies O1 and O2 depicted in Fig. 3. It can be checked that

λ=({⟨p,q⟩,⟨p′,q′⟩}linkkeyinw⟨C,D⟩)

is consistent with O1∪O2. Notice that λ together with O1 and O2 entails the link a1≈b1.

On the contrary, the side components of λ, i.e.

κ1=({p,p′}keyinC)κ2=({q,q′}keyinD)

are inconsistent with O1 and O2, respectively. Indeed, O1∪{κ1}⊧a2≈a4 because a2 and a4 share the value v2 for p and the value w1 for p′. However, O1∪{κ1}⊧a2≉a4 because a2≉a4 belongs to O1. This means that O1∪{κ1} is inconsistent. Similarly, it can be shown that O2∪{κ2} is inconsistent. □

It is noteworthy that not a single useful key (i.e. a key that can be used to generate links) can be found in the ontologies of the proof of Theorem 6: ({p}keyinC) and ({p′}keyinC) are both inconsistent with O1, and ({q}keyinD) and ({q′}keyinD) with O2. As a consequence, in this example, data interlinking is possible with link keys (λ allows to find a1≈b1) but not with keys. Moreover, data interlinking is possible with weak link keys but not with strong link keys.

Example 3 makes it clear in the context of a real data interlinking scenario that (11) in Theorem 5 does not hold for weak link keys.

Example 3

Example 3(Insee-IGN (cont.)).

The following statement of Example 2:

I∗⊧({⟨ins:nom,rdfs:label⟩,⟨ins:subdivisionDe,ign:arr⟩}linkkeyinw⟨ins:Com,ign:Com⟩)

expresses a weak in-link key satisfied by I∗, the canonical interpretation of the RDF graphs of Example 2 whose linked individuals are merged.

Let us consider the side components of the above weak link key: If (11) of Theorem 5 were true for weak link keys, then the following two keys would be satisfied by I∗:

({ins:nom,ins:subdivisionDe}keyinins:Com)({rdfs:label,ign:arr}keyinign:Com)

However, as explained in Example 2, the key axiom ({ins:nom,ins:subdivisionDe}keyinins:Com) is not satisfied by I∗ due to the transitivity of the property ins:subdivisionDe.

One may think that data interlinking is still possible with ({rdfs:label,ign:arr}keyinign:Com), which is indeed satisfied by I∗. This would require the following property correspondences to hold

ins:nom⊒rdfs:labelins:subdivisionDe⊒ign:arr

However, I∗ does not satisfy ins:nom⊒rdfs:label but the reversed subsumption ins:nom⊑rdfs:label (see Fig. 2).

Even though the side components of a weak link key are not necessarily keys for the ontologies separately, every weak link key entails one key in the vocabulary of the ontologies together, as stated by Proposition 9 below. Unfortunately, this link key is of very limited use in practice because the inferred key holds for the intersection of the classes that we actually want to interlink (it is not known before linking which individuals belong to both classes).

Proposition 9.

The following holds:

({⟨pi,qi⟩}i=1klinkkeyxw⟨C,D⟩)⊧({pi⊓qi}i=1kkeyxwC⊓D)

with x∈{in,eq}.

Proof.

This is a consequence of Proposition 10 in the Appendix. □

8.2.Relations between generated link sets

The difference between using link keys for data interlinking instead of keys and ontology alignments becomes evident when comparing Theorem 1 with Theorem 3 and Theorem 2 with Theorem 4. In both cases, knowledge about keys and alignments is replaced by knowledge about link keys. Theorem 7 shows that the generated link sets are exactly the same.

Theorem 7.

Let O and O′ be ontologies. Let A=⟨C,L,LK⟩ be an alignment between O and O′ with {C⊒D},{pi⊑qi}i=1k⊆C. Let κ=({pi}i=1kkeyinC) and λ=({⟨pi,qi⟩}i=1klinkkeyins⟨C,D⟩). Then, it holds that LκO,O′,A=LλO,O′,A.

Proof.

The result follows from Definitions 6 and 11 and the fact that, since {C⊒D}∪{pi⊑qi}i=1k⊆C, by clause (12) of Theorem 5, we have O, O′, A, κ⊧λ and also O, O′, A, λ⊧κ. □

The same holds for eq-keys and eq-link keys.

Theorem 8.

Let O and O′ be ontologies. Let A=⟨C,L,LK⟩ be an alignment between O and O′ with {C⊒D},{pi≡qi}i=1k⊆C. Let κ=({pi}i=1kkeyeqC) and λ=({⟨pi,qi⟩}i=1klinkkeyeqs⟨C,D⟩). Then, it holds that LκO,O′,A=LλO,O′,A.

Proof.

The result follows from Definitions 6 and 11 and the fact that, since {C⊒D}∪{pi≡qi}i=1k⊆C, by clause (13) of Theorem 5, we have O, O′, A, κ⊧λ and also O, O′, A, λ⊧κ. □

The lesson from Theorems 7 and 8 is that, for interlinking two datasets, if there is a key for one dataset and a proper alignment from the key to the vocabulary of the other dataset, then using the key or the strong link key entailed by the key and the alignment is strictly equivalent.

However, as explained in the previous section, weak link keys may exist even when keys and proper alignments do not. As a conclusion, in general, link keys are more suitable than keys for data interlinking. Thus, data interlinking algorithms are justified in discovering link keys rather than keys and alignments. Below we provide a real data-interlinking scenario in which keys and alignments are not useful, but link keys are.

Example 4

Example 4(Insee-GeoNames).

GeoNames is a world-wide geographical database publicly available in RDF.7 7 Imagine that we are given the task of finding links between the URIs of Insee and GeoNames that represent French communes. Below we show that, for this particular task, keys and alignments are useless as they will generate no link, while link keys will generate almost all of them.

Insee’s ontology is very different from GeoNames’ ontology.8 8 It is not surprising as Insee’s scope is France and GeoNames’ is world-wide. GeoNames’ ontology basically contains only one class, gn:Feature, of which all geographical features (countries, cities, mountains, lakes, etc.) are direct instances. There is no named class equivalent to ins:Com, ins:Arr, ins:Dép or ins:Rég, but the following complex alignment holds:

ins:Com≡gn:Feature⊓∃gn:countryCode.{FR}⊓∃gn:featureCode.{A.ADM4}ins:Arr≡gn:Feature⊓∃gn:countryCode.{FR}⊓∃gn:featureCode.{A.ADM3}ins:Dép≡gn:Feature⊓∃gn:countryCode.{FR}⊓∃gn:featureCode.{A.ADM2}ins:Rég≡gn:Feature⊓∃gn:countryCode.{FR}⊓∃gn:featureCode.{A.ADM1}

From here on, the complex classes of the right-hand sides of the above equivalences will be denoted by gn:Com, gn:Arr, gn:Dep and gn:Reg.

Fig. 4.

Fragments of the Insee and GeoNames ontologies and their alignment.

In Insee, apart from rdf:type and owl:sameAs, communes only have the following properties: ins:nom, ins:subdivisionDe, ins:codeCommune and ins:codeInsee. Neither ins:codeCommune nor ins:codeInsee has any counterpart in GeoNames’ ontology, but ins:nom and ins:subdivisionDe are aligned in the following way:

(16)ins:nom≡gn:name(17)ins:subdivisionDe⊑gn:parentFeature

Certainly, gn:parentFeature is a multivalued property that relates features with their parents, in either administrative or physical subdivision. Therefore, ins:subdivisionDe, which only relates administrative divisions, is subsumed by gn:parentFeature but not equivalent to it.

Besides ins:subdivisionDe, one could also consider the GeoNames properties gn:parentCountry and gn:parentADMN – where gn:parentADMN refers to a level N administrative parent, N=1,2,3,4 – and the alignment

(18)ins:subdivisionDe⊒gn:parentCountry(19)ins:subdivisionDe⊒gn:parentADMN

However, (18) and (19) are not true because the properties gn:parentCountry and gn:parentADMN may be applied to non-administrative features (e.g. a lake), which is not the case of ins:subdivisionDe. The considered fragments of the Insee and GeoNames ontologies and their alignment are depicted in Fig. 4.

The following are the minimal keys that can be formed with the properties of the alignment made up of (16) and (17):

(20)({ins:nom}{ins:subdivisionDe}keyins:Com)(21)({gn:name}{gn:parentFeature}keygn:Com)

(20) cannot be used for interlinking because this requires ins:subdivisionDe and ins:parentFeature to be equivalent (Theorem 2), which is not true. This becomes apparent when we compare the values of French communes for both properties, as there are values that the communes have for ins:parentFeature but not for ins:subdivisionDe (e.g. the URI representing the European Union). Likewise, (21) is not useful.

From the above paragraph, we can conclude that keys and alignments are not useful for interlinking communes of Insee and GeoNames. Nevertheless, link keys are. More specifically, the following link keys can be used:

(22)({⟨ins:nom,gn:name⟩}{⟨ins:subdivisionDe,gn:parentADM3⟩}linkkeys⟨ins:Com,gn:Com⟩)(23)({⟨ins:nom,gn:name⟩}{⟨ins:subdivisionDe,gn:parentADM2⟩}linkkeys⟨ins:Arr,gn:Arr⟩)(24)({⟨ins:nom,gn:name⟩}linkkeyins⟨ins:Dép,gn:Dep⟩)

Indeed, the links between departments can be found using (24) by comparing their names, and, once these links are found, they can be used to find links between arrondissements using (23) by comparing their names and the departments they belong to. Finally, the found links between arrondissements can be used to find links between communes using (22) by comparing their names and the arrondissements they belong to. We did so and compared the results with a reference link set. We obtained 100% precision and 97% recall. The latter was due to the similarity function used to compare name strings.

To conclude, let us stress that, even though (19) is not true, (22) and (23) hold and are useful for interlinking. This confirms once again that the properties of a link key are not necessarily semantically related via subsumption or equivalence.

9.Conclusions and further work

The relation between keys and link keys is much more subtle than may be thought of at first sight, and one may not be replaced by the other without care. In particular, we have shown that data interlinking with keys requires (a) a proper alignment (Theorems 1 and 2), and (b) completion in the case of eq-keys (Theorem 2). Data interlinking with link keys, in turn, does not need alignments (Theorems 3 and 4) but still needs completion in the case of eq-link keys (Theorem 4).

Strong link keys entail keys by definition, and keys with proper alignments entail strong link keys (Theorem 5). In this case, the links generated by a strong link key are the same as those generated by their associated side keys and alignments (Theorems 7 and 8).

Nonetheless, in addition to not needing an alignment, weak link keys may exist independently from the existence of any key of the individual ontologies (Theorem 6; if they are, then they are strong link keys), and yet they may be useful for interlinking datasets.

These results provide a clear picture of the relationships between key-inspired devices available for data interlinking. They can be easily transferred to the hybrid keys and link keys.

The work presented in this paper contributes grounding data interlinking methods based on keys and link keys. In particular, it justifies the work for directly extracting weak link keys [7] instead of searching for keys with matching alignments. Link key extraction directly focuses on what may be used for data interlinking instead of generating keys and alignments that may not be possible to exploit. Also, when no strong link key exists, link key extraction may find a suitable weak link key, though key extraction will not return any useful key.

The clarification of the semantics of link keys tackled in this paper should lead to complement data interlinking methods with inference methods. One approach consists in designing rules, inspired by the statements found in propositions of this paper, to infer (link) keys from (link) keys in the same way as Armstrong’s axioms [5] allow to derive functional dependencies. However, this approach is highly dependent on the actual schema language used. Another approach extends description logic reasoners to include keys [23] and link keys as axioms. In both cases, entailed link keys could be exploited by extended versions of reasoning-based data interlinking tools. This should also enable breaking the extraction + interlinking process by reasoning on link keys before interlinking in order to provide more accurate links, eventually more efficiently.

Notes

1 Notice that “≈” is a symbol of the language, which is interpreted as equality. More specifically, for any DL interpretation I, I⊧a≈b iff aI=bI.

2 By piI(δ)=piI(δ′)≠∅ we mean piI(δ)=piI(δ′) and piI(δ′)≠∅, which implies piI(δ)≠∅ (i=1,…,k).

3 http://rdf.insee.fr

4 More specifically, this is the interpretation whose domain is made up of all IRIs and literals of the Insee graph (there are no blank nodes), it interprets domain individuals as themselves, and classes and properties as their extensions in the graph.

5 http://data.ign.fr

6 ign is bound to the namespace http://data.ign.fr/def/geofla#.

7 https://download.geonames.org/export/dump/allCountries.zip

8 http://www.geonames.org/ontology/ontology_v3.2.rdf

Acknowledgements

This work has been partially supported by the ANR project Elker (ANR-17-CE23-0007).

The authors acknowledge the numerous discussions they had with Michel Chein, Madalina Croitoru, Chan Le Duc, Michel Leclère, Nathalie Pernelle, Fatiha Saïs, François Scharffe and Danai Symeonidou on the various definitions of keys that may apply in logical formalisms.

Appendices

Appendix

AppendixProofs of Section 8.1

This appendix describes the relations between keys and link keys in a more precise way than it was done in Section 8.1. Some of the results of Section 8.1 are synthetic consequences of the ones presented here.

Proposition 10.

The following holds:

({⟨pi,qi⟩}i=1klinkkeyinw⟨C,D⟩),{pi⊑qi}i=1k⊧({pi}i=1kkeyinC⊓D)({⟨pi,qi⟩}i=1klinkkeyinw⟨C,D⟩),{pi⊒qi}i=1k⊧({qi}i=1kkeyinC⊓D)({⟨pi,qi⟩}i=1klinkkeyeqw⟨C,D⟩),{pi≡qi}i=1k⊧({pi}i=1kkeyeqC⊓D)

Proof.

Let us prove the first entailment. Let I be such that I⊧({⟨pi,qi⟩}i=1klinkkeyinw⟨C,D⟩) and I⊧pi⊑qi (i=1,…,k), and let us prove that I⊧({pi}i=1kkeyinC⊓D). Let δ,δ′∈(C⊓D)I such that piI(δ)∩piI(δ′)≠∅ (i=1,…,k). Since δ,δ′∈(C⊓D)I=CI∩DI then δ,δ′∈CI and δ,δ′∈DI. In particular, δ∈CI and δ′∈DI. Now, since I⊧pi⊑qi, then, piI(δ′)⊆qiI(δ′) (i=1,…,k). From this and the fact that piI(δ)∩piI(δ′)≠∅, we can infer that piI(δ)∩qiI(δ′)≠∅ (i=1,…,k). Since I⊧({⟨pi,qi⟩}i=1klinkkeyinw⟨C,D⟩) and δ∈CI and δ′∈DI, then δ=δ′. The second entailment can be proven analogously.

Let us prove the third entailment. Let I be such that I⊧({⟨pi,qi⟩}i=1klinkkeyeqw⟨C,D⟩) and I⊧pi≡qi (i=1,…,k), and let us prove that I satisfies the eq-key ({pi}i=1kkeyeqC⊓D). Let δ,δ′∈(C⊓D)I such that piI(δ)=piI(δ′)≠∅ (i=1,…,k). Since δ,δ′∈(C⊓D)I then δ∈CI and δ′∈DI. Now, since I⊧pi≡qi, then, we have piI(δ′)=qiI(δ′) (i=1,…,k). From this and the fact that piI(δ)=piI(δ′)≠∅, we can infer that piI(δ)=qiI(δ′)≠∅ (i=1,…,k). Finally, since δ∈CI and δ′∈DI and I⊧({⟨pi,qi⟩}i=1klinkkeyeqw⟨C,D⟩) then δ=δ′. □

Proposition 11 is the counterpart of Proposition 10 for strong link keys. Notice that this time the consequent is a key in the union of classes, and not only in the intersection.

Proposition 11.

The following holds:

({⟨pi,qi⟩}i=1klinkkeyins⟨C,D⟩),{pi⊑qi}i=1k⊧({pi}i=1kkeyinC⊔D)({⟨pi,qi⟩}i=1klinkkeyins⟨C,D⟩),{pi⊒qi}i=1k⊧({qi}i=1kkeyinC⊔D)({⟨pi,qi⟩}i=1klinkkeyeqs⟨C,D⟩),{pi≡qi}i=1k⊧({pi}i=1kkeyeqC⊔D)

Proof.

We only prove the first entailment. Let I such that I⊧({⟨pi,qi⟩}i=1klinkkeyins⟨C,D⟩) and I⊧pi⊑qi (i=1,…,k), and let us prove that I⊧({pi}i=1kkeyinC⊔D). Let δ,δ′∈(C⊔D)I such that piI(δ)∩piI(δ′)≠∅ (i=1,…,k). We have δ,δ′∈(C⊔D)I=CI∪DI. Let us consider three cases: (1) δ,δ′∈CI, (2) δ,δ′∈DI and (3) δ∈CI and δ′∈DI (the case δ′∈CI and δ∈DI is equivalent to this last one).

(1) Assume that δ,δ′∈CI. Since I satisfies the strong in-link key ({⟨pi,qi⟩}i=1klinkkeyins⟨C,D⟩) then I⊧({pi}i=1kkeyinC). From this and the fact that δ,δ′∈CI and piI(δ)∩piI(δ′)≠∅ (i=1,…,k), we can conclude that δ=δ′.

(2) Assume that δ,δ′∈DI. Since I satisfies the strong in-link key ({⟨pi,qi⟩}i=1klinkkeyins⟨C,D⟩) then I⊧({qi}i=1kkeyinD). Now, we also have that I⊧pi⊑qi. Thus, piI(δ)⊆qiI(δ) and piI(δ′)⊆qiI(δ′) (i=1,…,k). From this, and piI(δ)∩piI(δ′)≠∅, we can infer that qiI(δ)∩qiI(δ′)≠∅ (i=1,…,k). This along with the fact that δ,δ′∈DI and I⊧({qi}i=1kkeyinD) implies δ=δ′.

(3) Finally, assume that δ∈CI, δ′∈DI. Since I⊧({⟨pi,qi⟩}i=1klinkkeyins⟨C,D⟩) then I⊧({⟨pi,qi⟩}i=1klinkkeyinw⟨C,D⟩). It is possible to proceed like in the proof of the first statement of Proposition 10 to conclude that δ=δ′.

The other two statements can be proven similarly. □

Proposition 12 is the converse of Proposition 11. Notice, however, that, in the case of in-link keys, the subsumptions are inverted, i.e. they are the subsuming and not the subsumed properties the ones that must form an in-key in the union of classes.

Proposition 12.

The following holds:

({pi}i=1kkeyinC⊔D),{pi⊒qi}i=1k⊧({⟨pi,qi⟩}i=1klinkkeyins⟨C,D⟩)({qi}i=1kkeyinC⊔D),{pi⊑qi}i=1k⊧({⟨pi,qi⟩}i=1klinkkeyins⟨C,D⟩)({pi}i=1kkeyeqC⊔D),{pi≡qi}i=1k⊧({⟨pi,qi⟩}i=1klinkkeyeqs⟨C,D⟩)

Proof.

We only prove the first entailment. Let I be an interpretation such that I⊧({pi}i=1kkeyinC⊔D) and I⊧pi⊒qi (i=1,…k).

Since I⊧({pi}i=1kkeyinC⊔D), by (8) of Proposition 5, we have that I⊧({pi}i=1kkeyinC). Let us prove that I satisfies the key ({qi}i=1kkeyinD). Since I⊧({pi}i=1kkeyinC⊔D), by (8) of Proposition 5, we have I⊧({pi}i=1kkeyinD), and, since I⊧pi⊒qi, by (9) of Proposition 5, we also have that I⊧({qi}i=1kkeyinD).

Now, we prove I⊧({⟨pi,qi⟩}i=1klinkkeyinw⟨C,D⟩). Let δ∈CI and δ′∈DI with piI(δ)∩qiI(δ′)≠∅ (i=1,…,k). From δ∈CI and δ′∈DI we have δ,δ′∈CI∪DI=(C⊔D)I. Since I⊧pi⊒qi, we have qiI(δ′)⊆piI(δ′) (i=1,…,k). From this and piI(δ)∩qiI(δ′)≠∅ we infer piI(δ)∩piI(δ′)≠∅ (i=1,…,k). This together with δ,δ′∈(C⊔D)I and I⊧({pi}i=1kkeyinC⊔D) implies δ=δ′.

The second entailment can be proven analogously. The third entailment can be proven analogously too, but will use (10) of Proposition 5. □

References

[1]	N. Abbas, J. David and A. Napoli, Linkex: A tool for link key discovery based on pattern structures, in: Supplementary Proceedings of ICFCA 2019 Conference and Workshops, Frankfurt, Germany, June 25–28, 2019, D. Cristea, F.L. Ber, R. Missaoui, L. Kwuida and B. Sertkaya, eds, CEUR Workshop Proceedings, Vol. 2378: , CEUR-WS.org, (2019) , pp. 33–38.
[2]	M. Achichi, M.B. Ellefi, D. Symeonidou and K. Todorov, Automatic key selection for data linking, in: Knowledge Engineering and Knowledge Management – 20th International Conference, EKAW 2016, Bologna, Italy, November 19–23, 2016, Proceedings, E. Blomqvist, P. Ciancarini, F. Poggi and F. Vitali, eds, Lecture Notes in Computer Science, Vol. 10024: , Springer, (2016) , pp. 3–18. doi:10.1007/978-3-319-49004-5_1.
[3]	M. Al-Bakri, M. Atencia, J. David, S. Lalande and M. Rousset, Uncertainty-sensitive reasoning for inferring sameAs facts in linked data, in: ECAI 2016 – 22nd European Conference on Artificial Intelligence – Including Prestigious Applications of Artificial Intelligence (PAIS 2016), The Hague, the Netherlands, 29 August–2 September 2016, G.A. Kaminka, M. Fox, P. Bouquet, E. Hüllermeier, V. Dignum, F. Dignum and F. van Harmelen, eds, Frontiers in Artificial Intelligence and Applications, Vol. 285: , IOS Press, (2016) , pp. 698–706. doi:10.3233/978-1-61499-672-9-698.
[4]	M. Al-Bakri, M. Atencia, S. Lalande and M. Rousset, Inferring same-as facts from linked data: An iterative import-by-query approach, in: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, Texas, USA, January 25–30, 2015, B. Bonet and S. Koenig, eds, AAAI Press, (2015) , pp. 9–15.
[5]	W.W. Armstrong, Dependency structures of data base relationships, in: Information Processing, Proceedings of the 6th IFIP Congress 1974, Stockholm, Sweden, August 5–10, 1974, J.L. Rosenfeld, ed., North-Holland, (1974) , pp. 580–583.
[6]	M. Atencia, M. Chein, M. Croitoru, J. David, M. Leclère, N. Pernelle, F. Saïs, F. Scharffe and D. Symeonidou, Defining key semantics for the RDF datasets: Experiments and evaluations, in: Graph-Based Representation and Reasoning – 21st International Conference on Conceptual Structures, ICCS 2014, Iaşi, Romania, July 27–30, 2014, Proceedings, N. Hernandez, R. Jäschke and M. Croitoru, eds, Lecture Notes in Computer Science, Vol. 8577: , Springer, (2014) , pp. 65–78. doi:10.1007/978-3-319-08389-6_7.
[7]	M. Atencia, J. David and J. Euzenat, Data interlinking through robust linkkey extraction, in: ECAI 2014 – 21st European Conference on Artificial Intelligence – Including Prestigious Applications of Intelligent Systems (PAIS 2014), Prague, Czech Republic, 18–22 August 2014, T. Schaub, G. Friedrich and B. O’Sullivan, eds, Frontiers in Artificial Intelligence and Applications, Vol. 263: , IOS Press, (2014) , pp. 15–20. doi:10.3233/978-1-61499-419-0-15.
[8]	M. Atencia, J. David, J. Euzenat, A. Napoli and J. Vizzini, Link key candidate extraction with relational concept analysis, Discrete Applied Mathematics 273: ((2020) ), 2–20. doi:10.1016/j.dam.2019.02.012.
[9]	M. Atencia, J. David and F. Scharffe, Keys and pseudo-keys detection for web datasets cleansing and interlinking, in: Knowledge Engineering and Knowledge Management – 18th International Conference, EKAW 2012, Galway City, Ireland, October 8–12, 2012, Proceedings, A. ten Teije, J. Völker, S. Handschuh, H. Stuckenschmidt, M. d’Aquin, A. Nikolov, N. Aussenac-Gilles and N. Hernandez, eds, Lecture Notes in Computer Science, Vol. 7603: , Springer, (2012) , pp. 144–153. doi:10.1007/978-3-642-33876-2_14.
[10]	A. Borgida and L. Serafini, Distributed description logics: Assimilating information from peer sources, Journal on Data Semantics 1: ((2003) ), 153–184. doi:10.1007/978-3-540-39733-5_7.
[11]	A. Borgida and G. Weddell, Adding uniqueness constraints to description logics (preliminary report), in: Deductive and Object-Oriented Databases, 5th International Conference, DOOD ’97, Montreux, Switzerland, December 8–12, 1997, Proceedings, F. Bry, R. Ramakrishnan and K. Ramamohanarao, eds, Lecture Notes in Computer Science, Vol. 1341: , Springer, (1997) , pp. 85–102. doi:10.1007/3-540-63792-3_10.
[12]	D. Calvanese, G. De Giacomo and M. Lenzerini, Keys for free in description logics, in: Proceedings of the 2000 International Workshop on Description Logics (DL2000), Aachen, Germany, August 17–19, 2000, F. Baader and U. Sattler, eds, CEUR Workshop Proceedings, Vol. 33: , CEUR-WS.org, (2000) , pp. 79–88.
[13]	D. Calvanese, G. De Giacomo and M. Lenzerini, Identification constraints and functional dependencies in description logics, in: Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, IJCAI 2001, Seattle, Washington, USA, August 4–10, 2001, B. Nebel, ed., Morgan Kaufmann, (2001) , pp. 155–160.
[14]	P. Christen, Data Matching – Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection, Data-Centric Systems and Applications, Springer, (2012) . doi:10.1007/978-3-642-31164-2.
[15]	J. David, J. Euzenat, F. Scharffe and C. Trojahn dos Santos, The alignment API 4.0, Semantic Web Journal 2: (1) ((2011) ), 3–10. doi:10.3233/SW-2011-0028.
[16]	J. Euzenat and P. Shvaiko, Ontology Matching, 2nd edn, Springer, (2013) .
[17]	H. Farah, D. Symeonidou and K. Todorov, KeyRanker: Automatic RDF key ranking for data linking, in: Proceedings of the Knowledge Capture Conference, K-CAP 2017, Austin, TX, USA, December 4–6, 2017, Ó. Corcho, K. Janowicz, G. Rizzo, I. Tiddi and D. Garijo, eds, ACM, New York, NY, USA, (2017) , pp. 7–178. doi:10.1145/3148011.3148023.
[18]	A. Ferrara, A. Nikolov and F. Scharffe, Data linking for the semantic web, International Journal of Semantic Web and Information Systems 7: (3) ((2011) ), 46–76. doi:10.4018/jswis.2011070103.
[19]	M. Gmati, M. Atencia and J. Euzenat, Tableau extensions for reasoning with link keys, in: Proceedings of the 11th International Workshop on Ontology Matching Co-Located with the 15th International Semantic Web Conference (ISWC 2016), Kobe, Japan, October 18, 2016, P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, M. Cheatham, O. Hassanzadeh and R. Ichise, eds, CEUR Workshop Proceedings, Vol. 1766: , CEUR-WS.org, (2016) , pp. 37–48.
[20]	T. Heath and C. Bizer, Linked Data: Evolving the Web into a Global Data Space, Morgan and Claypool, (2011) . doi:10.2200/S00334ED1V01Y201102WBE001.
[21]	A. Hogan, A. Zimmermann, J. Umbrich, A. Polleres and S. Decker, Scalable and distributed methods for entity matching, consolidation and disambiguation over linked data corpora, Journal of Web Semantics 10: ((2012) ), 76–110. doi:10.1016/j.websem.2011.11.002.
[22]	R. Isele, A. Jentzsch and C. Bizer, Efficient multidimensional blocking for link discovery without losing recall, in: Proceedings of the 14th International Workshop on the Web and Databases 2011, WebDB 2011, Athens, Greece, June 12, 2011, A. Marian and V. Vassalos, eds, (2011) .
[23]	C. Lutz, C. Areces, I. Horrocks and U. Sattler, Keys, nominals, and concrete domains, Journal of Artificial Intelligence Research 23: ((2005) ), 667–726. doi:10.1613/jair.1542.
[24]	C. Lutz and M. Milicic, Description logics with concrete domains and functional dependencies, in: Proceedings of the 16th Eureopean Conference on Artificial Intelligence, ECAI ’2004, Including Prestigious Applicants of Intelligent Systems, PAIS 2004, Valencia, Spain, August 22–27, 2004, R.L. de Mántaras and L. Saitta, eds, IOS Press, (2004) , pp. 378–382.
[25]	M. Nentwig, M. Hartung, A.-C. Ngonga Ngomo and E. Rahm, A survey of current link discovery frameworks, Semantic Web 8: (3) ((2017) ), 419–436. doi:10.3233/SW-150210.
[26]	A.-C. Ngonga Ngomo and S. Auer, LIMES – A time-efficient approach for large-scale link discovery on the web of data, in: IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16–22, 2011, T. Walsh, ed., IJCAI/AAAI, (2011) , pp. 2312–2317. doi:10.5591/978-1-57735-516-8/IJCAI11-385.
[27]	A.-C. Ngonga Ngomo and K. Lyko, EAGLE: Efficient active learning of link specifications using genetic programming, in: The Semantic Web: Research and Applications – 9th Extended Semantic Web Conference, ESWC 2012, Heraklion, Crete, Greece, May 27–31, 2012, Proceedings, E. Simperl, P. Cimiano, A. Polleres, Ó. Corcho and V. Presutti, eds, Lecture Notes in Computer Science, Vol. 7295: , Springer, (2012) , pp. 149–163. doi:10.1007/978-3-642-30284-8_17.
[28]	S. Rudolph, Foundations of description logics, in: Reasoning Web. Semantic Technologies for the Web of Data – 7th International Summer School 2011, Galway, Ireland, August 23–27, 2011, Tutorial Lectures, A. Polleres, C. d’Amato, M. Arenas, S. Handschuh, P. Kroner, S. Ossowski and P.F. Patel-Schneider, eds, Lecture Notes in Computer Science, Vol. 6848: , Springer, (2011) , pp. 76–136. doi:10.1007/978-3-642-23032-5_2.
[29]	M.A. Sherif, A.-C. Ngonga Ngomo and J. Lehmann, Wombat – A generalization approach for automatic link discovery, in: The Semantic Web – 14th International Conference, ESWC 2017, Portorož, Slovenia, May 28–June 1, 2017, Proceedings, Part I, E. Blomqvist, D. Maynard, A. Gangemi, R. Hoekstra, P. Hitzler and O. Hartig, eds, Lecture Notes in Computer Science, Vol. 10249: , Springer, (2017) , pp. 103–119. doi:10.1007/978-3-319-58068-5_7.
[30]	T. Soru, E. Marx and A.-C. Ngonga Ngomo, ROCKER – A refinement operator for key discovery, in: Proceedings of the 24th International Conference on World Wide Web, WWW 2015, Florence, Italy, May 18–22, 2015, A. Gangemi, S. Leonardi and A. Panconesi, eds, ACM, New York, NY, USA, (2015) , pp. 1025–1033. doi:10.1145/2736277.2741642.
[31]	D. Symeonidou, V. Armant, N. Pernelle and F. Saïs, SAKey: Scalable almost key discovery in RDF data, in: The Semantic Web – ISWC 2014 – 13th International Semantic Web Conference, P. Mika, T. Tudorache, A. Bernstein, C. Welty, C.A. Knoblock, D. Vrandecic, P. Groth, N.F. Noy, K. Janowicz and C.A. Goble, eds, Lecture Notes in Computer Science, Vol. 8796: , Springer, (2014) , pp. 33–49. doi:10.1007/978-3-319-11964-9_3.
[32]	D. Toman and G. Weddell, On keys and functional dependencies as first-class citizens in description logics, Journal of Automated Reasoning 40: (2–3) ((2008) ), 117–132. doi:10.1007/s10817-007-9092-z.
[33]	J. Volz, C. Bizer, M. Gaedke and G. Kobilarov, Discovering and maintaining links on the web of data, in: The Semantic Web – ISWC 2009, 8th International Semantic Web Conference, ISWC 2009, Chantilly, VA, USA, October 25–29, 2009, Proceedings, A. Bernstein, D.R. Karger, T. Heath, L. Feigenbaum, D. Maynard, E. Motta and K. Thirunarayan, eds, Lecture Notes in Computer Science, Vol. 5823: , Springer, (2009) , pp. 650–665. doi:10.1007/978-3-642-04930-9_41.
[34]	A. Zimmermann and J. Euzenat, Three semantics for distributed systems and their relations with alignment composition, in: The Semantic Web – ISWC 2006, 5th International Semantic Web Conference, ISWC 2006, Athens, GA, USA, November 5–9, 2006, Proceedings, I.F. Cruz, S. Decker, D. Allemang, C. Preist, D. Schwabe, P. Mika, M. Uschold and L. Aroyo, eds, Lecture Notes in Computer Science, Vol. 4273: , Springer, (2006) , pp. 16–29. doi:10.1007/11926078_2.