You are viewing a javascript disabled version of the site. Please enable Javascript for this site to function properly.

Go to header Go to navigation Go to search Go to contents Go to footer

In content section. Select this link to jump to navigation

Semantic Web - Volume Pre-press, issue Pre-press

Purchase individual online access for 1 year to this journal.

Price: EUR 170.00

ISSN 1570-0844 (P)
ISSN 2210-4968 (E)

Impact Factor 2023: 3

The journal Semantic Web – Interoperability, Usability, Applicability is an international and interdisciplinary journal bringing together researchers from various fields which share the vision and need for more effective and meaningful ways to share information across agents and services on the future Internet and elsewhere.

As such, Semantic Web technologies shall support the seamless integration of data, on-the-fly composition and interoperation of Web services, as well as more intuitive search engines. The semantics – or meaning – of information, however, cannot be defined without a context, which makes personalization, trust and provenance core topics for Semantic Web research.

New retrieval paradigms, user interfaces and visualization techniques have to unleash the power of the Semantic Web and at the same time hide its complexity from the user. Based on this vision, the journal welcomes contributions ranging from theoretical and foundational research over methods and tools to descriptions of concrete ontologies and applications in all areas. Papers which add a social, spatial and temporal dimension to Semantic Web research, as well as application-oriented papers making use of formal semantics, are especially welcome.

The journal is co-published by the Akademische Verlagsgesellschaft AKA.

Show:

results per page

Neural axiom network for knowledge graph reasoning

Authors: Li, Juan | Chen, Xiangnan | Yu, Hongtao | Chen, Jiaoyan | Zhang, Wen

Article Type: Research Article

Abstract: Knowledge graph reasoning (KGR) aims to infer new knowledge or detect noises, which is essential for improving the quality of knowledge graphs. Recently, various KGR techniques, such as symbolic- and embedding-based methods, have been proposed and shown strong reasoning ability. Symbolic-based reasoning methods infer missing triples according to predefined rules or ontologies. Although rules and axioms have proven effective, it is difficult to obtain them. Embedding-based reasoning methods represent entities and relations as vectors, and complete KGs via vector computation. However, they mainly rely on structural information and ignore implicit axiom information not predefined in KGs but can be reflected …in data. That is, each correct triple is also a logically consistent triple and satisfies all axioms. In this paper, we propose a novel NeuR al A xiom N etwork (NeuRAN ) framework that combines explicit structural and implicit axiom information without introducing additional ontologies. Specifically, the framework consists of a KG embedding module that preserves the semantics of triples and five axiom modules that encode five kinds of implicit axioms. These axioms correspond to five typical object property expression axioms defined in OWL2, including ObjectPropertyDomain , ObjectPropertyRange , DisjointObjectProperties , IrreflexiveObjectProperty and AsymmetricObjectProperty . The KG embedding module and axiom modules compute the scores that the triple conforms to the semantics and the corresponding axioms, respectively. Compared with KG embedding models and CKRL, our method achieves comparable performance on noise detection and triple classification and achieves significant performance on link prediction. Compared with TransE and TransH, our method improves the link prediction performance on the Hits@1 metric by 22.0% and 20.8% on WN18RR-10% dataset, respectively. Show more

Keywords: Knowledge graph reasoning, knowledge graph embedding, noise detection, triple classification, link prediction

DOI: 10.3233/SW-233276

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-16, 2023

Explanation Ontology: A general-purpose, semantic representation for supporting user-centered explanations

Authors: Chari, Shruthi | Seneviratne, Oshani | Ghalwash, Mohamed | Shirai, Sola | Gruen, Daniel M. | Meyer, Pablo | Chakraborty, Prithwish | McGuinness, Deborah L.

Article Type: Research Article

Abstract: In the past decade, trustworthy Artificial Intelligence (AI) has emerged as a focus for the AI community to ensure better adoption of AI models, and explainable AI is a cornerstone in this area. Over the years, the focus has shifted from building transparent AI methods to making recommendations on how to make black-box or opaque machine learning models and their results more understandable by experts and non-expert users. In our previous work, to address the goal of supporting user-centered explanations that make model recommendations more explainable, we developed an Explanation Ontology (EO). The EO is a general-purpose representation that was …designed to help system designers connect explanations to their underlying data and knowledge. This paper addresses the apparent need for improved interoperability to support a wider range of use cases. We expand the EO, mainly in the system attributes contributing to explanations, by introducing new classes and properties to support a broader range of state-of-the-art explainer models. We present the expanded ontology model, highlighting the classes and properties that are important to model a larger set of fifteen literature-backed explanation types that are supported within the expanded EO. We build on these explanation type descriptions to show how to utilize the EO model to represent explanations in five use cases spanning the domains of finance, food, and healthcare. We include competency questions that evaluate the EO’s capabilities to provide guidance for system designers on how to apply our ontology to their own use cases. This guidance includes allowing system designers to query the EO directly and providing them exemplar queries to explore content in the EO represented use cases. We have released this significantly expanded version of the Explanation Ontology at https://purl.org/heals/eo and updated our resource website, https://tetherless-world.github.io/explanation-ontology , with supporting documentation. Overall, through the EO model, we aim to help system designers be better informed about explanations and support these explanations that can be composed, given their systems’ outputs from various AI models, including a mix of machine learning, logical and explainer models, and different types of data and knowledge available to their systems. Show more

Keywords: Explainable AI, semantic representation of explanations, Explanation Ontology, modeling explanation types – AI method outputs and knowledge, supporting patterns for explanation types

DOI: 10.3233/SW-233282

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-31, 2023

Interpretable ontology extension in chemistry

Authors: Glauer, Martin | Memariani, Adel | Neuhaus, Fabian | Mossakowski, Till | Hastings, Janna

Article Type: Research Article

Abstract: Reference ontologies provide a shared vocabulary and knowledge resource for their domain. Manual construction and annotation enables them to maintain high quality, allowing them to be widely accepted across their community. However, the manual ontology development process does not scale for large domains. We present a new methodology for automatic ontology extension for domains in which the ontology classes have associated graph-structured annotations, and apply it to the ChEBI ontology, a prominent reference ontology for life sciences chemistry. We train Transformer-based deep learning models on the leaf node structures from the ChEBI ontology and the classes to which they …belong. The models are then able to automatically classify previously unseen chemical structures, resulting in automated ontology extension. The proposed models achieved an overall F1 scores of 0.80 and above, improvements of at least 6 percentage points over our previous results on the same dataset. In addition, the models are interpretable: we illustrate that visualizing the model’s attention weights can help to explain the results by providing insight into how the model made its decisions. We also analyse the performance for molecules that have not been part of the ontology and evaluate the logical correctness of the resulting extension. Show more

Keywords: Ontology extension, ontology learning, chemical ontology, Transformers, automated classification, transfer learning, multi-label classification

DOI: 10.3233/SW-233183

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-22, 2023

Data journeys: Explaining AI workflows through abstraction

Authors: Daga, Enrico | Groth, Paul

Article Type: Research Article

Abstract: Artificial intelligence systems are not simply built on a single dataset or trained model. Instead, they are made by complex data science workflows involving multiple datasets, models, preparation scripts, and algorithms. Given this complexity, in order to understand these AI systems, we need to provide explanations of their functioning at higher levels of abstraction. To tackle this problem, we focus on the extraction and representation of data journeys from these workflows. A data journey is a multi-layered semantic representation of data processing activity linked to data science code and assets. We propose an ontology to capture the essential elements …of a data journey and an approach to extract such data journeys. Using a corpus of Python notebooks from Kaggle, we show that we are able to capture high-level semantic data flow that is more compact than using the code structure itself. Furthermore, we show that introducing an intermediate knowledge graph representation outperforms models that rely only on the code itself. Finally, we report on a user survey to reflect on the challenges and opportunities presented by computational data journeys for explainable AI. Show more

Keywords: Data science analysis, XAI, transparency, explainability, data provenance, workflows

DOI: 10.3233/SW-233407

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-27, 2023

LOD4Culture: Easy exploration of cultural heritage linked open data

Authors: Vega-Gorgojo, Guillermo

Article Type: Research Article

Abstract: LOD4Culture is a web application that exploits Cultural Heritage Linked Open Data for tourism and education purposes. Since target users are not fluid on Semantic Web technologies, the user interface is designed to hide the intricacies of RDF or SPARQL. An interactive map is provided for exploring world-wide Cultural Heritage sites that can be filtered by type and that uses cluster markers to adapt the view to different zoom levels. LOD4Culture also includes a Cultural Heritage entity browser that builds comprehensive visualizations of sites, artists, and artworks. All data exchanges are facilitated through the use of a generator of REST …APIs over Linked Open Data that translates API calls into SPARQL queries across multiple sources, including Wikidata and DBpedia. Since March 2022, more than 1.7K users have employed LOD4Culture. The application has been mentioned many times in social media and has been featured in the DBpedia Newsletter, in the list of Wikidata tools for visualizing data, and in the open data applications list of datos.gob.es . Show more

Keywords: Cultural Heritage, Linked Open Data, data access, REST API, map visualizations, user interfaces

DOI: 10.3233/SW-233358

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-30, 2023

INK: Knowledge graph representation for efficient and performant rule mining

Authors: Steenwinckel, Bram | De Turck, Filip | Ongenae, Femke

Article Type: Research Article

Abstract: Semantic rule mining can be used for both deriving task-agnostic or task-specific information within a Knowledge Graph (KG). Underlying logical inferences to summarise the KG or fully interpretable binary classifiers predicting future events are common results of such a rule mining process. The current methods to perform task-agnostic or task-specific semantic rule mining operate, however, a completely different KG representation, making them less suitable to perform both tasks or incorporate each other’s optimizations. This also results in the need to master multiple techniques for both exploring and mining rules within KGs, as well losing time and resources when converting one …KG format into another. In this paper, we use INK, a KG representation based on neighbourhood nodes of interest to mine rules for improved decision support. By selecting one or two sets of nodes of interest, the rule miner created on top of the INK representation will either mine task-agnostic or task-specific rules. In both subfields, the INK miner is competitive to the currently state-of-the-art semantic rule miners on 14 different benchmark datasets within multiple domains. Show more

Keywords: Knowledge representation, semantic rule mining

DOI: 10.3233/SW-233495

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-22, 2023

The materials design ontology

Authors: Lambrix, Patrick | Armiento, Rickard | Li, Huanyu | Hartig, Olaf | Abd Nikooie Pour, Mina | Li, Ying

Article Type: Research Article

Abstract: In the materials design domain, much of the data from materials calculations is stored in different heterogeneous databases with different data and access models. Therefore, accessing and integrating data from different sources is challenging. As ontology-based access and integration alleviates these issues, in this paper we address data access and interoperability for computational materials databases by developing the Materials Design Ontology. This ontology is inspired by and guided by the OPTIMADE effort that aims to make materials databases interoperable and includes many of the data providers in computational materials science. In this paper, first, we describe the development and the …content of the Materials Design Ontology. Then, we use a topic model-based approach to propose additional candidate concepts for the ontology. Finally, we show the use of the Materials Design Ontology by a proof-of-concept implementation of a data access and integration system for materials databases based on the ontology.1 1 This paper is an extension of (In The Semantic Web – ISWC 2020 – 19th International Semantic Web Conference, Proceedings, Part II (2000 ) 212–227 Springer) with results from (In ESWC Workshop on Domain Ontologies for Research Data Management in Industry Commons of Materials and Manufacturing 2021 1–11) and currently unpublished results regarding an application using the ontology. This paper is an extension of (In The Semantic Web – ISWC 2020 – 19th International Semantic Web Conference, Proceedings, Part II (2000 ) 212–227 Springer) with results from (In ESWC Workshop on Domain Ontologies for Research Data Management in Industry Commons of Materials and Manufacturing 2021 1–11) and currently unpublished results regarding an application using the ontology. Show more

Keywords: Ontology, ontology development, data access, data integration, materials science, Materials Design Ontology

DOI: 10.3233/SW-233340

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-35, 2023

What is in your cookie box? Explaining ingredients of web cookies with knowledge graphs

Authors: Bushati, Geni | Rasmusen, Sven Carsten | Kurteva, Anelia | Vats, Anurag | Nako, Petraq | Fensel, Anna

Article Type: Research Article

Abstract: The General Data Protection Regulation (GDPR) has imposed strict requirements for data sharing, one of which is informed consent. A common way to request consent online is via cookies. However, commonly, users accept online cookies being unaware of the meaning of the given consent and the following implications. Once consent is given, the cookie “disappears”, and one forgets that consent was given in the first place. Retrieving cookies and consent logs becomes challenging, as most information is stored in the specific Internet browser’s logs. To make users aware of the data sharing implied by cookie consent and to support transparency and …traceability within systems, we present a knowledge graph (KG) based tool for personalised cookie consent information visualisation. The KG is based on the OntoCookie ontology, which models cookies in a machine-readable format and supports data interpretability across domains. Evaluation results confirm that the users’ comprehension of the data shared through cookies is vague and insufficient. Furthermore, our work has resulted in an increase of 47.5% in the users’ willingness to be cautious when viewing cookie banners before giving consent. These and other evaluation results confirm that our cookie data visualisation approach and tool help to increase users’ awareness of cookies and data sharing. Show more

Keywords: Cookies, consent, GDPR, ontology, knowledge graph, data sharing, comprehension

DOI: 10.3233/SW-233435

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-17, 2023

A semantic framework for condition monitoring in Industry 4.0 based on evolving knowledge bases

Authors: Giustozzi, Franco | Saunier, Julien | Zanni-Merk, Cecilia

Article Type: Research Article

Abstract: In Industry 4.0, factory assets and machines are equipped with sensors that collect data for effective condition monitoring. This is a difficult task since it requires the integration and processing of heterogeneous data from different sources, with different temporal resolutions and underlying meanings. Ontologies have emerged as a pertinent method to deal with data integration and to represent manufacturing knowledge in a machine-interpretable way through the construction of semantic models. Ontologies are used to structure knowledge in knowledge bases, which also contain instances and information about these data. Thus, a knowledge base provides a sort of virtual representation of the …different elements involved in a manufacturing process. Moreover, the monitoring of industrial processes depends on the dynamic context of their execution. Under these circumstances, the semantic model must provide a way to represent this evolution in order to represent in which situation(s) a resource is in during the execution of its tasks to support decision making. This paper proposes a semantic framework to address the evolution of knowledge bases for condition monitoring in Industry 4.0. To this end, firstly we propose a semantic model (the COInd4 ontology) for the manufacturing domain that represents the resources and processes that are part of a factory, with special emphasis on the context of these resources and processes. Relevant situations that combine sensor observations with domain knowledge are also represented in the model. Secondly, an approach that uses stream reasoning to detect these situations that lead to potential failures is introduced. This approach enriches data collected from sensors with contextual information using the proposed semantic model. The use of stream reasoning facilitates the integration of data from different data sources, different temporal resolutions as well as the processing of these data in real time. This allows to derive high-level situations from lower-level context and sensor information. Detecting situations can trigger actions to adapt the process behavior, and in turn, this change in behavior can lead to the generation of new contexts leading to new situations. These situations can have different levels of severity, and can be nested in different ways. Dealing with the rich relations among situations requires an efficient approach to organize them. Therefore, we propose a method to build a lattice, ordering those situations depending on the constraints they rely on. This lattice represents a road-map of all the situations that can be reached from a given one, normal or abnormal. This helps in decision support, by allowing the identification of the actions that can be taken to correct the abnormality avoiding in this way the interruption of the manufacturing processes. Finally, an industrial application scenario for the proposed approach is described. Show more

Keywords: Semantic technologies, ontology, context modeling, stream reasoning, condition monitoring, Industry 4.0

DOI: 10.3233/SW-233481

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-29, 2023

Creating occupant-centered digital twins using the Occupant Feedback Ontology implemented in a smartwatch app

Authors: Donkers, Alex | de Vries, Bauke | Yang, Dujuan

Article Type: Research Article

Abstract: Occupant feedback enables building managers to improve occupants’ health, comfort, and satisfaction. However, acquiring continuous occupant feedback and integrating this feedback with other building information is challenging. This paper presents a scalable method to acquire continuous occupant feedback and directly integrate this with other building information. Semantic web technologies were applied to solve data interoperability issues. The Occupant Feedback Ontology was developed to describe feedback semantically. Next to this, a smartwatch app – Mintal – was developed to acquire continuous feedback on indoor environmental quality. The app gathers location, medical information, and answers on short micro surveys. Mintal applied the …Occupant Feedback Ontology to directly integrate the feedback with linked building data. A case study was performed to evaluate this method. A semantic digital twin was created by integrating linked building data, sensor data, and occupant feedback. Results from SPARQL queries gave more insight into an occupant’s perceived comfort levels in the Open Flat. The case study shows how integrating feedback with building information allows for more occupant-centric decision support tools. The approach presented in this paper can be used in a wide range of use cases, both within and without the architecture, building, and construction domain. Show more

Keywords: Digital twin, Occupant Feedback Ontology, smartwatch, semantic web, linked building data

DOI: 10.3233/SW-223254

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-26, 2022

ImageSchemaNet: A framester graph for embodied commonsense knowledge

Authors: De Giorgis, Stefano | Gangemi, Aldo | Gromann, Dagmar

Article Type: Research Article

Abstract: Commonsense knowledge is a broad and challenging area of research which investigates our understanding of the world as well as human assumptions about reality. Deriving directly from the subjective perception of the external world, it is intrinsically intertwined with embodied cognition. Commonsense reasoning is linked to human sense-making, pattern recognition and knowledge framing abilities. This work presents a new resource that formalizes the cognitive theory of image schemas. Image schemas are dynamic conceptual building blocks originating from our sensorimotor interactions with the physical world, and enable our sense-making cognitive activity to assign coherence and structure to entities, events and situations …we experience everyday. ImageSchemaNet is an ontology that aligns pre-existing resources, such as FrameNet, VerbNet, WordNet and MetaNet from the Framester hub, to image schema theory. This article describes an empirical application of ImageSchemaNet, combined with semantic parsers, on the task of annotating natural language sentences with image schemas. Show more

Keywords: Image schemas, cognitive semantics, frame semantics, commonsense reasoning

DOI: 10.3233/SW-223084

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-25, 2022

Food process ontology requirements

Authors: Dooley, Damion | Weber, Magalie | Ibanescu, Liliana | Lange, Matthew | Chan, Lauren | Soldatova, Larisa | Yang, Chen | Warren, Robert | Shimizu, Cogan | McGinty, Hande K. | Hsiao, William

Article Type: Research Article

Abstract: People often value the sensual, celebratory, and health aspects of food, but behind this experience exists many other value-laden agricultural production, distribution, manufacturing, and physiological processes that support or undermine a healthy population and a sustainable future. The complexity of such processes is evident in both every-day food preparation of recipes and in industrial food manufacturing, packaging and storage, each of which depends critically on human or machine agents, chemical or organismal ingredient references, and the explicit instructions and implicit procedures held in formulations or recipes. An integrated ontology landscape does not yet exist to cover all the entities at …work in this farm to fork journey. It seems necessary to construct such a vision by reusing expert-curated fit-to-purpose ontology subdomains and their relationship, material, and more abstract organization and role entities. The challenge is to make this merger be, by analogy, one language, rather than nouns and verbs from a dozen or more dialects which cannot be used directly in statements about some aspect of the farm to fork journey without expensive translation or substantial dialect education in order to understand a particular text or domain of knowledge. This work focuses on the ontology components – object and data properties and annotations – needed to model food processes or more general process modelling within the context of the Open Biological and Biomedical Ontology Foundry and congruent ontologies. Ideally these components can be brought together in a general process ontology that can be specialized not only for the food domain but for carrying out other protocols as well. Many operations involved in food identification, preparation, transportation and storage – shaking, boiling, mixing, freezing, labeling, shipping – are actually common to activities from manufacturing and laboratory work to local or home food preparation. Show more

Keywords: Ontology, food processing, recipe, process modelling, OBO Foundry

DOI: 10.3233/SW-223096

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-32, 2022

Understanding the structure of knowledge graphs with ABSTAT profiles

Authors: Spahiu, Blerina | Palmonari, Matteo | Alva Principe, Renzo Arturo | Rula, Anisa

Article Type: Research Article

Abstract: While there has been a trend in the last decades for publishing large-scale and highly-interconnected Knowledge Graphs (KGs), their users often get overwhelmed by the task of understanding their content as a result of their size and complexity. Data profiling approaches have been proposed to summarize large KGs into concise and meaningful representations, so that they can be better explored, processed, and managed. Profiles based on schema patterns represent each triple in a KG with its schema-level counterpart, thus covering the entire KG with profiles of considerable size. In this paper, we provide empirical evidence that profiles based on schema …patterns, if explored with suitable mechanisms, can be useful to help users understand the content of big and complex KGs. ABSTAT provides concise pattern-based profiles and comes with faceted interfaces for profile exploration. Using this tool we present a user study based on query completion tasks. We demonstrate that users who look at ABSTAT profiles formulate their queries better and faster than users browsing the ontology of the KGs. The latter is a pretty strong baseline considering that many KGs do not even come with a specific ontology to be explored by the users. To the best of our knowledge, this is the first attempt to investigate the impact of profiling techniques on tasks related to knowledge graph understanding with a user study. Show more

Keywords: Data understanding, data profiling, summarization, rdf, knowledge graph

DOI: 10.3233/SW-223181

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-27, 2023

Towards a formal ontology of engineering functions, behaviours, and capabilities

Authors: Compagno, Francesco | Borgo, Stefano

Article Type: Research Article

Abstract: In both applied ontology and engineering, functionality is a well-researched topic, since it is through teleological causal reasoning that domain experts build mental models of engineering systems, giving birth to functions. These mental models are important throughout the whole lifecycle of any product, being used from the design phase up to diagnosis activities. Though a vast amount of work to model functions has already been carried out, the literature has not settled on a shared and well-defined approach due to the variety of concepts involved and the modeling tasks that functional descriptions should satisfy. The work in this paper posits …the basis and makes some crucial steps towards a rich ontological description of functions and related concepts, such as behaviour, capability, and capacity. A conceptual analysis of such notions is carried out using the top-level ontology DOLCE as a framework, and the ensuing logical theory is formally described in first-order logic and OWL, showing how ontological concepts can model major aspects of engineering products in applications. In particular, it is shown how functions can be distinguished from the implementation methods to realize them, how one can differentiate between capabilities and capacities of a product, and how these are related to engineering functions. Show more

Keywords: Ontology, function, behaviour, capability, DOLCE

DOI: 10.3233/SW-223188

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-34, 2023

Background knowledge in ontology matching: A survey

Authors: Portisch, Jan | Hladik, Michael | Paulheim, Heiko

Article Type: Research Article

Abstract: Ontology matching is an integral part for establishing semantic interoperability. One of the main challenges within the ontology matching operation is semantic heterogeneity, i.e. modeling differences between the two ontologies that are to be integrated. The semantics within most ontologies or schemas are, however, typically incomplete because they are designed within a certain context which is not explicitly modeled. Therefore, external background knowledge plays a major role in the task of (semi-) automated ontology and schema matching. In this survey, we introduce the reader to the general ontology matching problem. We review the background knowledge sources as well as …the approaches applied to make use of external knowledge. Our survey covers all ontology matching systems that have been presented within the years 2004–2021 at a well-known ontology matching competition together with systematically selected publications in the research field. We present a classification system for external background knowledge, concept linking strategies, as well as for background knowledge exploitation approaches. We provide extensive examples and classify all ontology matching systems under review in a resource/strategy matrix obtained by coalescing the two classification systems. Lastly, we outline interesting and yet underexplored research directions of applying external knowledge within the ontology matching process. Show more

Keywords: Ontology matching, schema matching, background knowledge, data integration, semantic integration, knowledge graphs, ontologies

DOI: 10.3233/SW-223085

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-55, 2022

MTab4D: Semantic annotation of tabular data with DBpedia

Authors: Nguyen, Phuc | Kertkeidkachorn, Natthawut | Ichise, Ryutaro | Takeda, Hideaki

Article Type: Research Article

Abstract: Semantic annotation of tabular data is the process of matching table elements with knowledge graphs. As a result, the table contents could be interpreted or inferred using knowledge graph concepts, enabling them to be useful in downstream applications such as data analytics and management. Nevertheless, semantic annotation tasks are challenging due to insufficient tabular data descriptions, heterogeneous schema, and vocabulary issues. This paper presents an automatic semantic annotation system for tabular data, called MTab4D, to generate annotations with DBpedia in three annotation tasks: 1) matching table cells to entities, 2) matching columns to entity types, and 3) matching pairs of …columns to properties. In particular, we propose an annotation pipeline that combines multiple matching signals from different table elements to address schema heterogeneity, data ambiguity, and noisiness. Additionally, this paper provides insightful analysis and extra resources on benchmarking semantic annotation with knowledge graphs. Experimental results on the original and adapted datasets of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab 2019) show that our system achieves an impressive performance for the three annotation tasks. MTab4D’s repository is publicly available at https://github.com/phucty/mtab4dbpedia . Show more

Keywords: Table annotation, knowledge graph, DBpedia, semantic table interpretation

DOI: 10.3233/SW-223098

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-25, 2022

Is neuro-symbolic AI meeting its promises in natural language processing? A structured review

Authors: Hamilton, Kyle | Nayak, Aparna | Božić, Bojan | Longo, Luca

Article Type: Research Article

Abstract: Advocates for Neuro-Symbolic Artificial Intelligence (NeSy) assert that combining deep learning with symbolic reasoning will lead to stronger AI than either paradigm on its own. As successful as deep learning has been, it is generally accepted that even our best deep learning systems are not very good at abstract reasoning. And since reasoning is inextricably linked to language, it makes intuitive sense that Natural Language Processing (NLP), would be a particularly well-suited candidate for NeSy. We conduct a structured review of studies implementing NeSy for NLP, with the aim of answering the question of whether NeSy is indeed meeting its …promises: reasoning, out-of-distribution generalization, interpretability, learning and reasoning from small data, and transferability to new domains. We examine the impact of knowledge representation, such as rules and semantic networks, language structure and relational structure, and whether implicit or explicit reasoning contributes to higher promise scores. We find that systems where logic is compiled into the neural network lead to the most NeSy goals being satisfied, while other factors such as knowledge representation, or type of neural architecture do not exhibit a clear correlation with goals being met. We find many discrepancies in how reasoning is defined, specifically in relation to human level reasoning, which impact decisions about model architectures and drive conclusions which are not always consistent across studies. Hence we advocate for a more methodical approach to the application of theories of human reasoning as well as the development of appropriate benchmarks, which we hope can lead to a better understanding of progress in the field. We make our data and code available on github for further analysis.1 1 https://github.com/kyleiwaniec/neuro-symbolic-ai-systematic-review https://github.com/kyleiwaniec/neuro-symbolic-ai-systematic-review Show more

Keywords: Neuro-symbolic artificial intelligence, natural language processing, deep learning, knowledge representation & reasoning, structured review

DOI: 10.3233/SW-223228

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-42, 2022

Enhancing Data Use Ontology (DUO) for health-data sharing by extending it with ODRL and DPV

Authors: Pandit, Harshvardhan J. | Esteves, Beatriz

Article Type: Research Article

Abstract: The Global Alliance for Genomics and Health is an international consortium that is developing the Data Use Ontology (DUO) as a standard providing machine-readable codes for automation in data discovery and responsible sharing of genomics data. DUO concepts, which are encoded using OWL, only contain the textual descriptions of the conditions for data use they represent, and do not specify the intended permissions, prohibitions, and obligations explicitly – which limits their usefulness. We present an exploration of how the Open Digital Rights Language (ODRL) can be used to explicitly represent the information inherent in DUO concepts to create policies that …are then used to represent conditions under which datasets are available for use, conditions in requests to use them, and to generate agreements based on a compatibility matching between the two. We also address a current limitation of DUO regarding specifying information relevant to privacy and data protection law by using the Data Privacy Vocabulary (DPV) which supports expressing legal concepts in a jurisdiction-agnostic manner as well as for specific laws like the GDPR. Our work supports the existing socio-technical governance processes involving use of DUO by providing a complementary rather than replacement approach. To support this and improve DUO, we provide a description of how our system can be deployed with a proof of concept demonstration that uses ODRL rules for all DUO concepts, and uses them to generate agreements through matching of requests to data offers. All resources described in this article are available at: https://w3id.org/duodrl/repo . Show more

Keywords: Health data, biomedical ontologies, policy, regulatory compliance, GDPR

DOI: 10.3233/SW-243583

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-26, 2024

Similarity joins and clustering for SPARQL

Authors: Ferrada, Sebastián | Bustos, Benjamin | Hogan, Aidan

Article Type: Research Article

Abstract: The SPARQL standard provides operators to retrieve exact matches on data, such as graph patterns, filters and grouping. This work proposes and evaluates two new algebraic operators for SPARQL 1.1 that return similarity-based results instead of exact results. First, a similarity join operator is presented, which brings together similar mappings from two sets of solution mappings. Second, a clustering solution modifier is introduced, which instead of grouping solution mappings according to exact values, brings them together by using similarity criteria. For both cases, a variety of algorithms are proposed and analysed, and use-case queries that showcase the relevance and usefulness …of the novel operators are presented. For similarity joins, experimental results are provided by comparing different physical operators over a set of real world queries, as well as comparing our implementation to the closest work found in the literature, DBSimJoin, a PostgreSQL extension that supports similarity joins. For clustering, synthetic queries are designed in order to measure the performance of the different algorithms implemented. Show more

Keywords: Similarity joins, clustering, SPARQL

DOI: 10.3233/SW-243540

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-32, 2024

Can you trust Wikidata?

Authors: Santos, Veronica | Schwabe, Daniel | Lifschitz, Sérgio

Article Type: Research Article

Abstract: In order to use a value retrieved from a Knowledge Graph (KG) for some computation, the user should, in principle, ensure that s/he trusts the veracity of the claim, i.e., considers the statement as a fact. Crowd-sourced KGs, or KGs constructed by integrating several different information sources of varying quality, must be used via a trust layer. The veracity of each claim in the underlying KG should be evaluated, considering what is relevant to carrying out some action that motivates the information seeking. The present work aims to assess how well Wikidata (WD) supports the trust decision process implied when …using its data. WD provides several mechanisms that can support this trust decision, and our KG Profiling, based on WD claims and schema, elaborates an analysis of how multiple points of view, controversies, and potentially incomplete or incongruent content are presented and represented. Show more

Keywords: Trust, contextual, KG Profiling

DOI: 10.3233/SW-243577

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-22, 2024

Towards counterfactual explanations for ontologies

Authors: Bellucci, Matthieu | Delestre, Nicolas | Malandain, Nicolas | Zanni-Merk, Cecilia

Article Type: Research Article

Abstract: Debugging and repairing Web Ontology Language (OWL) ontologies has been a key field of research since OWL became a W3C recommendation. One way to understand errors and fix them is through explanations. These explanations are usually extracted from the reasoner and displayed to the ontology authors as is. In the meantime, there has been a recent call in the eXplainable AI (XAI) field to use expert knowledge in the form of knowledge graphs and ontologies. In this paper, a parallel between explanations for machine learning and for ontologies is drawn. This link enables the adaptation of XAI methods to explain …ontologies and their entailments. Counterfactual explanations have been identified as a good candidate to solve the explainability problem in machine learning. The CEO (Counterfactual Explanations for Ontologies) method is thus proposed to explain inconsistent ontologies using counterfactual explanations. A preliminary user study is conducted to ensure that using XAI methods for ontologies is relevant and worth pursuing. Show more

Keywords: Counterfactual explanations, explainability, ontology, knowledge graph, artificial intelligence

DOI: 10.3233/SW-243566

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-26, 2024

Evidence of large-scale conceptual disarray in multi-level taxonomies in Wikidata

Authors: Dadalto, Atílio A. | Almeida, João Paulo A. | Fonseca, Claudenir M. | Guizzardi, Giancarlo

Article Type: Research Article

Abstract: The distinction between types and individuals is key to most conceptual modeling techniques and knowledge representation languages. Despite that, there are a number of situations in which modelers navigate this distinction inadequately, leading to problematic models. We show evidence of a large number of representation mistakes associated with the failure to employ this distinction in the Wikidata knowledge graph, which can be identified with the incorrect use of instantiation , which is a relation between an instance and a type, and specialization (or subtyping ), which is a relation between two types. The prevalence of the problems in Wikidata’s …taxonomies suggests that methodological and computational tools are required to mitigate the issues identified, which occur in many settings when individuals, types, and their metatypes are included in the domain of interest. We conduct a conceptual analysis of entities involved in recurrent erroneous cases identified in this empirical data, and present a tool that supports users in identifying some of these mistakes. Show more

Keywords: Wikidata, multi-level taxonomies, quality assessment

DOI: 10.3233/SW-243562

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-18, 2024

DIAERESIS: RDF data partitioning and query processing on SPARK

Authors: Troullinou, Georgia | Agathangelos, Giannis | Kondylakis, Haridimos | Stefanidis, Kostas | Plexousakis, Dimitris

Article Type: Research Article

Abstract: The explosion of the web and the abundance of linked data demand effective and efficient methods for storage, management, and querying. Apache Spark is one of the most widely used engines for big data processing, with more and more systems adopting it for efficient query answering. Existing approaches exploiting Spark for querying RDF data, adopt partitioning techniques for reducing the data that need to be accessed in order to improve efficiency. However, simplistic data partitioning fails, on one hand, to minimize data access and on the other hand to group data usually queried together. This is translated into limited improvement …in terms of efficiency in query answering. In this paper, we present DIAERESIS, a novel platform that accepts as input an RDF dataset and effectively partitions it, minimizing data access and improving query answering efficiency. To achieve this, DIAERESIS first identifies the top-k most important schema nodes, i.e., the most important classes, as centroids and distributes the other schema nodes to the centroid they mostly depend on. Then, it allocates the corresponding instance nodes to the schema nodes they are instantiated under. Our algorithm enables fine-tuning of data distribution, significantly reducing data access for query answering. We experimentally evaluate our approach using both synthetic and real workloads, strictly dominating existing state-of-the-art, showing that we improve query answering in several cases by orders of magnitude. Show more

Keywords: RDF, data partitioning, Spark, query answering

DOI: 10.3233/SW-243554

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-27, 2024

How to create and use a national cross-domain ontology and data infrastructure on the Semantic Web

Authors: Hyvönen, Eero

Article Type: Research Article

Abstract: This paper presents a model and lessons learned for creating a cross-domain national ontology and Linked (Open) Data (LOD) infrastructure. The idea is to extend the global, domain agnostic “layer cake model” underlying the Semantic Web with domain specific and local features needed in applications. To test and demonstrate the infrastructure, a series of LOD services and portals in use have been created in 2002–2023 that cover a wide range of application domains. They have attracted millions of users in total suggesting feasibility of the proposed model. This line of research and development is unique due to its systematic national …level nature and long time span of over twenty years. Show more

Keywords: Semantic Web, Linked Data, ontologies, web services, infrastructures, portals, Digital Humanities

DOI: 10.3233/SW-243468

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-15, 2024

The role of ontologies and knowledge in Explainable AI

Authors: Confalonieri, Roberto | Kutz, Oliver | Calvanese, Diego | Alonso-Moral, Jose Maria | Zhou, Shang-Ming

Article Type: Editorial

Keywords: Explainable AI, symbolic knowledge, applied ontology

DOI: 10.3233/SW-243529

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-4, 2024

A behaviouristic semantic approach to blockchain-based e-commerce

Authors: Bella, Giampaolo | Cantone, Domenico | Castiglione, Gianpietro | Nicolosi Asmundo, Marianna | Santamaria, Daniele Francesco

Article Type: Research Article

Abstract: Electronic commerce and finance are progressively supporting and including decentralized, shared and public ledgers such as the blockchain. This is reshaping traditional commercial activities by advancing them towards Decentralized Finance (DeFi) and Commerce 3.0, thereby supporting the latter’s potential to outpace the hurdles of central authority controllers and lawgivers. The quantity and entropy of the information that must be sought and managed to become active participants in such a relentlessly evolving scenario are increasing at a steady pace. For example, that information comprises asset or service description, general rules of the game, and specific technologies involved for decentralization. Moreover, …the relevant information ought to be shared among innumerable and heterogeneous stakeholders, such as producers, buyers, digital identity providers, valuation services, and shipment services, to just name a few. A clear semantic representation of such a complex and multifaceted blockchain-based e-Commerce ecosystem would contribute dramatically to make it more usable, namely more automatically accessible to virtually anyone wanting to play the role of a stakeholder, thereby reducing programmers’ effort. However, we feel that reaching that goal still requires substantial effort in the tailoring of Semantic Web technologies, hence this article sets out on such a route and advances a stack of OWL 2 ontologies for the semantic description of decentralized e-commerce. The stack includes a number of relevant features, ranging from the applicable stakeholders through the supply chain of the offerings for an asset, up to the Ethereum blockchain, its tokens and smart contracts. Ontologies are defined by taking a behaviouristic approach to represent the various participants as agents in terms of their actions, inspired by the Theory of Agents and the related mentalistic notions. The stack is validated through appropriate metrics and SPARQL queries implementing suitable competency questions, then demonstrated through the representation of a real world use case, namely, the iExec marketplace. Show more

Keywords: Ontology, OWL, Semantic Web, DeFi, agent, blockchain, Ethereum, e-commerce, supply chain, ONTOCHAIN, iExec

DOI: 10.3233/SW-243543

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-52, 2024

When one logic is not enough: Integrating first-order annotations in OWL ontologies

Authors: Flügel, Simon | Glauer, Martin | Neuhaus, Fabian | Hastings, Janna

Article Type: Research Article

Abstract: In ontology development, there is a gap between domain ontologies which mostly use the Web Ontology Language, OWL, and foundational ontologies written in first-order logic, FOL. To bridge this gap, we present Gavel, a tool that supports the development of heterogeneous ‘FOWL’ ontologies that extend OWL with FOL annotations, and is able to reason over the combined set of axioms. Since FOL annotations are stored in OWL annotations, FOWL ontologies remain compatible with the existing OWL infrastructure. We show that for the OWL domain ontology OBI, the stronger integration with its FOL top-level ontology BFO via our approach enables us …to detect several inconsistencies. Furthermore, existing OWL ontologies can benefit from FOL annotations. We illustrate this with FOWL ontologies containing mereotopological axioms that enable additional, useful inferences. Finally, we show that even for large domain ontologies such as ChEBI, automatic reasoning with FOL annotations can be used to detect previously unnoticed errors in the classification. Show more

Keywords: Ontology, heterogeneous ontology, first-order

DOI: 10.3233/SW-243440

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-16, 2024

TermIt: Managing normative thesauri

Authors: Křemen, Petr | Med, Michal | Blaško, Miroslav | Saeeda, Lama | Ledvinka, Martin | Buzek, Alan

Article Type: Research Article

Abstract: Thesauri are popular, as they represent a manageable compromise – they are well-understood by domain experts, yet formal enough to boost use cases like semantic search. Still, as the thesauri size and complexity grow in a domain, proper tracking of the concept references to their definitions in normative documents, interlinking concepts defined in different documents, and keeping all the concepts semantically consistent and ready for subsequent conceptual modeling, is difficult and requires adequate tool support. We present TermIt, a web-based thesauri manager aimed at supporting the creation of thesauri based on decrees, directives, standards, and other normative documents. In addition to …common editing capabilities, TermIt offers term extraction from documents, including a web document annotation browser plug-in, tracking term definitions in documents, term quality and ontological correctness checking, community discussions over term meanings, and seamless interlinking of concepts across different thesauri. We also show that TermIt features better fit the E-government scenarios in the Czech Republic than other tools. Additionally, we present the feasibility of TermIt for these scenarios by preliminary user experience evaluation. Show more

Keywords: Thesaurus, ontology, SKOS, UFO

DOI: 10.3233/SW-243547

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-11, 2024

Ontology of active and passive environmental exposure

Authors: Vámos, Csilla | Scheider, Simon | Sonnenschein, Tabea | Vermeulen, Roel

Article Type: Research Article

Abstract: Exposure is a central concept of the health and behavioural sciences needed to study the influence of the environment on the health and behaviour of people within a spatial context. While an increasing number of studies measure different forms of exposure, including the influence of air quality, noise, and crime, the influence of land cover on physical activity, or of the urban environment on food intake, we lack a common conceptual model of environmental exposure that captures its main structure across all this variety. Against the background of such a model, it becomes possible not only to systematically compare …different methodological approaches but also to better link and align the content of the vast amount of scientific publications on this topic in a systematic way. For example, an important methodical distinction is between studies that model exposure as an exclusive outcome of some activity versus ones where the environment acts as a direct independent cause (active vs. passive exposure ). Here, we propose an information ontology design pattern that can be used to define exposure and to model its variants. It is built around causal relations between concepts including persons, activities, concentrations, exposures, environments and health risks. We formally define environmental stressors and variants of exposure using Description Logic (DL), which allows automatic inference from the RDF-encoded content of a paper. Furthermore, concepts can be linked with data models and modelling methods used in a study. To test the pattern, we translated competency questions into SPARQL queries and ran them over RDF-encoded content. Results show how study characteristics can be classified and summarized in a manner that reflects important methodical differences. Show more

Keywords: Ontology, epidemiology, RDF, GIS, computer science

DOI: 10.3233/SW-243546

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-29, 2024

NeuSyRE: Neuro-symbolic visual understanding and reasoning framework based on scene graph enrichment

Authors: Khan, M. Jaleed | G. Breslin, John | Curry, Edward

Article Type: Research Article

Abstract: Exploring the potential of neuro-symbolic hybrid approaches offers promising avenues for seamless high-level understanding and reasoning about visual scenes. Scene Graph Generation (SGG) is a symbolic image representation approach based on deep neural networks (DNN) that involves predicting objects, their attributes, and pairwise visual relationships in images to create scene graphs, which are utilized in downstream visual reasoning. The crowdsourced training datasets used in SGG are highly imbalanced, which results in biased SGG results. The vast number of possible triplets makes it challenging to collect sufficient training samples for every visual concept or relationship. To address these challenges, we propose …augmenting the typical data-driven SGG approach with common sense knowledge to enhance the expressiveness and autonomy of visual understanding and reasoning. We present a loosely-coupled neuro-symbolic visual understanding and reasoning framework that employs a DNN-based pipeline for object detection and multi-modal pairwise relationship prediction for scene graph generation and leverages common sense knowledge in heterogenous knowledge graphs to enrich scene graphs for improved downstream reasoning. A comprehensive evaluation is performed on multiple standard datasets, including Visual Genome and Microsoft COCO, in which the proposed approach outperformed the state-of-the-art SGG methods in terms of relationship recall scores, i.e. Recall@K and mean Recall@K, as well as the state-of-the-art scene graph-based image captioning methods in terms of SPICE and CIDEr scores with comparable BLEU, ROGUE and METEOR scores. As a result of enrichment, the qualitative results showed improved expressiveness of scene graphs, resulting in more intuitive and meaningful caption generation using scene graphs. Our results validate the effectiveness of enriching scene graphs with common sense knowledge using heterogeneous knowledge graphs. This work provides a baseline for future research in knowledge-enhanced visual understanding and reasoning. The source code is available at https://github.com/jaleedkhan/neusire . Show more

Keywords: Scene graph, image representation, common sense knowledge, knowledge enrichment, visual reasoning, image captioning

DOI: 10.3233/SW-233510

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-25, 2023

A study of concept similarity in Wikidata

Authors: Ilievski, Filip | Shenoy, Kartik | Chalupsky, Hans | Klein, Nicholas | Szekely, Pedro

Article Type: Research Article

Abstract: Robust estimation of concept similarity is crucial for applications of AI in the commercial, biomedical, and publishing domains, among others. While the related task of word similarity has been extensively studied, resulting in a wide range of methods, estimating concept similarity between nodes in Wikidata has not been considered so far. In light of the adoption of Wikidata for increasingly complex tasks that rely on similarity, and its unique size, breadth, and crowdsourcing nature, we propose that conceptual similarity should be revisited for the case of Wikidata. In this paper, we study a wide range of representative similarity methods for …Wikidata, organized into three categories, and leverage background information for knowledge injection via retrofitting. We measure the impact of retrofitting with different weighted subsets from Wikidata and ProBase. Experiments on three benchmarks show that the best performance is achieved by pairing language models with rich information, whereas the impact of injecting knowledge is most positive on methods that originally do not consider comprehensive information. The performance of retrofitting is conditioned on the selection of high-quality similarity knowledge. A key limitation of this study, similar to prior work lies in the limited size and scope of the similarity benchmarks. While Wikidata provides an unprecedented possibility for a representative evaluation of concept similarity, effectively doing so remains a key challenge. Show more

Keywords: Similarity, Wikidata, retrofitting, knowledge graphs, embeddings

DOI: 10.3233/SW-233520

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-20, 2024

Ontology-based GraphQL server generation for data access and data integration

Authors: Li, Huanyu | Hartig, Olaf | Armiento, Rickard | Lambrix, Patrick

Article Type: Research Article

Abstract: In a GraphQL Web API, a so-called GraphQL schema defines the types of data objects that can be queried, and so-called resolver functions are responsible for fetching the relevant data from underlying data sources. Thus, we can expect to use GraphQL not only for data access but also for data integration, if the GraphQL schema reflects the semantics of data from multiple data sources, and the resolver functions can obtain data from these data sources and structure the data according to the schema. However, there does not exist a semantics-aware approach to employ GraphQL for data integration. Furthermore, there are …no formal methods for defining a GraphQL API based on an ontology. In this work, we introduce a framework for using GraphQL in which a global domain ontology informs the generation of a GraphQL server that answers requests by querying heterogeneous data sources. The core of this framework consists of an algorithm to generate a GraphQL schema based on an ontology and a generic resolver function based on semantic mappings. We provide a prototype, OBG-gen, of this framework, and we evaluate our approach over a real-world data integration scenario in the materials design domain and two synthetic benchmark scenarios (Linköping GraphQL Benchmark and GTFS-Madrid-Bench). The experimental results of our evaluation indicate that: (i) our approach is feasible to generate GraphQL servers for data access and integration over heterogeneous data sources, thus avoiding a manual construction of GraphQL servers, and (ii) our data access and integration approach is general and applicable to different domains where data is shared or queried via different ways. Show more

Keywords: Data integration, ontology, GraphQL

DOI: 10.3233/SW-233550

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-37, 2024

Context-aware composition of agent policies by Markov decision process entity embeddings and agent ensembles

Authors: Merkle, Nicole | Mikut, Ralf

Article Type: Research Article

Abstract: Computational agents support humans in many areas of life and are therefore found in heterogeneous contexts. This means that agents operate in rapidly changing environments and can be confronted with huge state and action spaces. In order to perform services and carry out activities satisfactorily, i.e. in a goal-oriented manner, agents require prior knowledge and therefore have to develop and pursue context-dependent policies. The problem here is that prescribing policies in advance is limited and inflexible, especially in dynamically changing environments. Moreover, the context (i.e. the external and internal state) of an agent determines its choice of actions. Since the …environments in which agents operate can be stochastic and complex in terms of the number of states and feasible actions, activities are usually modelled in a simplified way by Markov decision processes so that, for example, agents with reinforcement learning are able to learn policies, i.e. state-action pairs, that help to capture the context and act accordingly to optimally perform activities. However, training policies for all possible contexts using reinforcement learning is time-consuming. A requirement and challenge for agents is to learn strategies quickly and respond immediately in cross-context environments and applications, e.g., the Internet, service robotics, cyber-physical systems. In this work, we propose a novel simulation-based approach that enables a) the representation of heterogeneous contexts through knowledge graphs and entity embeddings and b) the context-aware composition of policies on demand by ensembles of agents running in parallel. The evaluation we conducted with the “Virtual Home” dataset indicates that agents with a need to switch seamlessly between different contexts, e.g. in a home environment, can request on-demand composed policies that lead to the successful completion of context-appropriate activities without having to learn these policies in lengthy training steps and episodes, in contrast to agents that use reinforcement learning. The presented approach enables both context-aware and cross-context applicability of untrained computational agents. Furthermore, the source code of the approach as well as the generated data, i.e. the trained embeddings and the semantic representation of domestic activities, is open source and openly accessible on Github and Figshare. Show more

Keywords: Knowledge graphs, word embeddings, web platform, reinforcement learning, computational agents

DOI: 10.3233/SW-233531

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-29, 2024

An ontology of 3D environment where a simulated manipulation task takes place (ENVON)

Authors: Zhao, Yingshen | Sarkar, Arkopaul | Elmhadhbi, Linda | Karray, Mohamed Hedi | Fillatreau, Philippe | Archimède, Bernard

Article Type: Research Article

Abstract: Thanks to the advent of robotics in shopfloor and warehouse environments, control rooms need to seamlessly exchange information regarding the dynamically changing 3D environment to facilitate tasks and path planning for the robots. Adding to the complexity, this type of environment is heterogeneous as it includes both free space and various types of rigid bodies (equipment, materials, humans etc.). At the same time, 3D environment-related information is also required by the virtual applications (e.g., VR techniques) for the behavioral study of CAD-based product models or simulation of CNC operations. In past research, information models for such heterogeneous 3D environments are …often built without ensuring connection among different levels of abstractions required for different applications. For addressing such multiple points of view and modelling requirements for 3D objects and environments, this paper proposes an ontology model that integrates the contextual, topologic, and geometric information of both the rigid bodies and the free space. The ontology provides an evolvable knowledge model that can support simulated task-related information in general. This ontology aims to greatly improve interoperability as a path planning system (e.g., robot) and will be able to deal with different applications by simply updating the contextual semantics related to some targeted application while keeping the geometric and topological models intact by leveraging the semantic link among the models. Show more

Keywords: Path planning, joint task and path planning, ontology, simulated task-related knowledge

DOI: 10.3233/SW-233460

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-28, 2023

Wikidata subsetting: Approaches, tools, and evaluation

Authors: Hosseini Beghaeiraveri, Seyed Amir | Labra Gayo, Jose Emilio | Waagmeester, Andra | Ammar, Ammar | Gonzalez, Carolina | Slenter, Denise | Ul-Hasan, Sabah | Willighagen, Egon | McNeill, Fiona | Gray, Alasdair J.G.

Article Type: Research Article

Abstract: Wikidata is a massive Knowledge Graph (KG), including more than 100 million data items and nearly 1.5 billion statements, covering a wide range of topics such as geography, history, scholarly articles, and life science data. The large volume of Wikidata is difficult to handle for research purposes; many researchers cannot afford the costs of hosting 100 GB of data. While Wikidata provides a public SPARQL endpoint, it can only be used for short-running queries. Often, researchers only require a limited range of data from Wikidata focusing on a particular topic for their use case. Subsetting is the process of defining and …extracting the required data range from the KG; this process has received increasing attention in recent years. Specific tools and several approaches have been developed for subsetting, which have not been evaluated yet. In this paper, we survey the available subsetting approaches, introducing their general strengths and weaknesses, and evaluate four practical tools specific for Wikidata subsetting – WDSub, KGTK, WDumper, and WDF – in terms of execution performance, extraction accuracy, and flexibility in defining the subsets. Results show that all four tools have a minimum of 99.96% accuracy in extracting defined items and 99.25% in extracting statements. The fastest tool in extraction is WDF, while the most flexible tool is WDSub. During the experiments, multiple subset use cases have been defined and the extracted subsets have been analyzed, obtaining valuable information about the variety and quality of Wikidata, which would otherwise not be possible through the public Wikidata SPARQL endpoint. Show more

Keywords: Knowledge Graph, Wikidata, Subsetting, Big Data, Accuracy, Performance

DOI: 10.3233/SW-233491

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-27, 2023

OBO Foundry food ontology interconnectivity

Authors: Dooley, Damion | Andrés-Hernández, Liliana | Bordea, Georgeta | Carmody, Leigh | Cavalieri, Duccio | Chan, Lauren | Castellano-Escuder, Pol | Lachat, Carl | Mougin, Fleur | Vitali, Francesco | Yang, Chen | Weber, Magalie | Kucuk McGinty, Hande | Lange, Matthew

Article Type: Research Article

Abstract: Since its creation in 2016, the FoodOn food ontology has become an interconnected partner in various academic and government projects that span agricultural and public health domains. This paper examines recent data interoperability capabilities arising from food-related ontologies belonging to, or compatible with, the encyclopedic Open Biological and Biomedical Ontology Foundry (OBO) ontology platform, and how research organizations and industry might utilize them for their own projects or for data exchange. Projects are seeking standardized vocabulary across many food supply activities ranging from agricultural production, harvesting, preparation, food processing, marketing, distribution and consumption, as well as more indirect health, economic, …food security and sustainability analysis and reporting tools. To satisfy this demand for controlled vocabulary requires establishing domain specific ontologies whose curators coordinate closely to produce recommended patterns for food system vocabulary. Show more

Keywords: Ontology, data harmonization, OBO Foundry, food systems, public health, epidemiology, multi-ontology framework, One Health

DOI: 10.3233/SW-233458

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-20, 2024

ConSolid: A federated ecosystem for heterogeneous multi-stakeholder projects

Authors: Werbrouck, Jeroen | Pauwels, Pieter | Beetz, Jakob | Verborgh, Ruben | Mannens, Erik

Article Type: Research Article

Abstract: In many industries, multiple parties collaborate on a larger project. At the same time, each of those stakeholders participates in multiple independent projects simultaneously. A double patchwork can thus be identified, with a many-to-many relationship between actors and collaborative projects. One key example is the construction industry, where every project is unique, involving specialists for many subdomains, ranging from the architectural design over technical installations to geospatial information, governmental regulation and sometimes even historical research. A digital representation of this process and its outcomes requires semantic interoperability between these subdomains, which however often work with heterogeneous and unstructured data. In …this paper we propose to address this double patchwork via a decentralized ecosystem for multi-stakeholder, multi-industry collaborations dealing with heterogeneous information snippets. At its core, this ecosystem, called ConSolid, builds upon the Solid specifications for Web decentralization, but extends these both on a (meta)data pattern level and on microservice level. To increase the robustness of data allocation and filtering, we identify the need to go beyond Solid’s current LDP-inspired interfaces to a Solid Pod and introduce the concept of metadata-generated ‘virtual views’, to be generated using an access-controlled SPARQL interface to a Pod. A recursive, scalable way to discover multi-vault aggregations is proposed, along with data patterns for connecting and aligning heterogeneous (RDF and non-RDF) resources across vaults in a mediatype-agnostic fashion. We demonstrate the use and benefits of the ecosystem using minimal running examples, concluding with the setup of an example use case from the Architecture, Engineering, Construction and Operations (AECO) industry. Show more

Keywords: Solid, DCAT, interdisciplinary collaboration, Common Data Environment, semantic enrichment

DOI: 10.3233/SW-233396

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-32, 2024

QALD-10 – The 10th challenge on question answering over linked data

Authors: Usbeck, Ricardo | Yan, Xi | Perevalov, Aleksandr | Jiang, Longquan | Schulz, Julius | Kraft, Angelie | Möller, Cedric | Huang, Junbo | Reineke, Jan | Ngonga Ngomo, Axel-Cyrille | Saleem, Muhammad | Both, Andreas

Article Type: Research Article

Abstract: Knowledge Graph Question Answering (KGQA) has gained attention from both industry and academia over the past decade. Researchers proposed a substantial amount of benchmarking datasets with different properties, pushing the development in this field forward. Many of these benchmarks depend on Freebase, DBpedia, or Wikidata. However, KGQA benchmarks that depend on Freebase and DBpedia are gradually less studied and used, because Freebase is defunct and DBpedia lacks the structural validity of Wikidata. Therefore, research is gravitating toward Wikidata-based benchmarks. That is, new KGQA benchmarks are created on the basis of Wikidata and existing ones are migrated. We present a new, …multilingual, complex KGQA benchmarking dataset as the 10th part of the Question Answering over Linked Data (QALD) benchmark series. This corpus formerly depended on DBpedia. Since QALD serves as a base for many machine-generated benchmarks, we increased the size and adjusted the benchmark to Wikidata and its ranking mechanism of properties. These measures foster novel KGQA developments by more demanding benchmarks. Creating a benchmark from scratch or migrating it from DBpedia to Wikidata is non-trivial due to the complexity of the Wikidata knowledge graph, mapping issues between different languages, and the ranking mechanism of properties using qualifiers. We present our creation strategy and the challenges we faced that will assist other researchers in their future work. Our case study, in the form of a conference challenge, is accompanied by an in-depth analysis of the created benchmark. Show more

Keywords: Knowledge graph question answering, benchmark, challenge, query analysis

DOI: 10.3233/SW-233471

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-15, 2023

Differential privacy and SPARQL

Authors: Buil-Aranda, Carlos | Lobo, Jorge | Olmedo, Federico

Article Type: Research Article

Abstract: Differential privacy is a framework that provides formal tools to develop algorithms to access databases and answer statistical queries with quantifiable accuracy and privacy guarantees. The notions of differential privacy are defined independently of the data model and the query language at steak. Most differential privacy results have been obtained on aggregation queries such as counting or finding maximum or average values, and on grouping queries over aggregations such as the creation of histograms. So far, the data model used by the framework research has typically been the relational model and the query language SQL. However, effective realizations of …differential privacy for SQL queries that required joins had been limited. This has imposed severe restrictions on applying differential privacy in RDF knowledge graphs and SPARQL queries. By the simple nature of RDF data, most useful queries accessing RDF graphs will require intensive use of joins. Recently, new differential privacy techniques have been developed that can be applied to many types of joins in SQL with reasonable results. This opened the question of whether these new results carry over to RDF and SPARQL. In this paper we provide a positive answer to this question by presenting an algorithm that can answer counting queries over a large class of SPARQL queries that guarantees differential privacy, if the RDF graph is accompanied with semantic information about its structure. We have implemented our algorithm and conducted several experiments, showing the feasibility of our approach for large graph databases. Our aim has been to present an approach that can be used as a stepping stone towards extensions and other realizations of differential privacy for SPARQL and RDF. Show more

Keywords: Differential privacy, SPARQL

DOI: 10.3233/SW-233474

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-29, 2023

The RDF2vec family of knowledge graph embedding methods

Authors: Portisch, Jan | Paulheim, Heiko

Article Type: Research Article

Abstract: Knowledge graph embeddings represent a group of machine learning techniques which project entities and relations of a knowledge graph to continuous vector spaces. RDF2vec is a scalable embedding approach rooted in the combination of random walks with a language model. It has been successfully used in various applications. Recently, multiple variants to the RDF2vec approach have been proposed, introducing variations both on the walk generation and on the language modeling side. The combination of those different approaches has lead to an increasing family of RDF2vec variants. In this paper, we evaluate a total of twelve RDF2vec variants on a …comprehensive set of benchmark models, and compare them to seven existing knowledge graph embedding methods from the family of link prediction approaches. Besides the established GEval benchmark introducing various downstream machine learning tasks on the DBpedia knowledge graph, we also use the new DLCC (Description Logic Class Constructors) benchmark consisting of two gold standards, one based on DBpedia, and one based on synthetically generated graphs. The latter allows for analyzing which ontological patterns in a knowledge graph can actually be learned by different embedding. With this evaluation, we observe that certain tailored RDF2vec variants can lead to improved performance on different downstream tasks, given the nature of the underlying problem, and that they, in particular, have a different behavior in modeling similarity and relatedness. The findings can be used to provide guidance in selecting a particular RDF2vec method for a given task. Show more

Keywords: RDF2vec, knowledge graph embedding, representation learning, embedding evaluation

DOI: 10.3233/SW-233514

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-32, 2024

CANARD: An approach for generating expressive correspondences based on competency questions for alignment

Authors: Thiéblin, Elodie | Sousa, Guilherme | Haemmerlé, Ollivier | Trojahn, Cássia

Article Type: Research Article

Abstract: Ontology matching aims at making ontologies interoperable. While the field has fully developed in the last years, most approaches are still limited to the generation of simple correspondences. More expressiveness is, however, required to better address the different kinds of ontology heterogeneities. This paper presents CANARD (C omplex A lignment N eed and A -box based R elation D iscovery), an approach for generating expressive correspondences that rely on the notion of competency questions for alignment (CQA). A CQA expresses the user knowledge needs in terms of alignment and aims at reducing the alignment space. The approach takes as input …a set of CQAs as SPARQL queries over the source ontology. The generation of correspondences is performed by matching the subgraph from the source CQA to the similar surroundings of the instances from the target ontology. Evaluation is carried out on both synthetic and real-world datasets. The impact of several approach parameters is discussed. Experiments have showed that CANARD performs, overall, better on CQA coverage than precision and that using existing same:As links, between the instances of the source and target ontologies, gives better results than exact label matches of their labels. The use of CQA improved also both CQA coverage and precision with respect to using automatically generated queries. The reassessment of the counter-example increased significantly the precision, to the detriment of runtime. Finally, experiments on large datasets showed that CANARD is one of the few systems that can perform on large knowledge bases, but depends on regularly populated knowledge bases and the quality of instance links. Show more

Keywords: Ontology matching, complex alignment, competency question for alignment, user needs

DOI: 10.3233/SW-233521

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-33, 2024

d2kg: An integrated ontology for knowledge graph-based representation of government decisions and acts

Authors: Serderidis, Konstantinos | Konstantinidis, Ioannis | Meditskos, Georgios | Peristeras, Vassilios | Bassiliades, Nick

Article Type: Research Article

Abstract: To implement Open Governance a crucial element is the efficient use of the big amounts of open data produced in the public domain. Public administration is a rich source of data and potentially new knowledge. It is a data intensive sector producing vast amounts of information encoded in government decisions and acts, published nowadays on the World Wide Web. The knowledge shared on the Web is mostly made available via semi-structured documents written in natural language. To exploit this knowledge, technologies such as Natural Language Processing, Information Extraction, Data mining and the Semantic Web could be used, embedding into documents …explicit semantics based on formal knowledge representations such as ontologies. Knowledge representation can be made possible by the deployment of Knowledge Graphs, collections of interlinked representations of entities, events or concepts, based on underlying ontologies. This can assist data analysts to achieve a higher level of situational awareness, facilitating automated reasoning towards different objectives, such as for knowledge management, data maintenance, transparency and cybersecurity. This paper presents a new ontology d2kg [d(iavgeia) 2(to) k(nowledge) g(raph)] integrating in a unique way standard EU ontologies, core and controlled vocabularies to enable exploitation of publicly available data from government decisions and acts published on the Greek platform Diavgeia with the aim to facilitate data sharing, re-usability and interoperability. It demonstrates a characteristic example of a Knowledge Graph based representation of government decisions and acts, highlighting its added value to respond to real practical use cases for the promotion of transparency, accountability and public awareness. The developed d2kg ontology in owl is accessible at: http://w3id.org/d2kg , as well as documented at: http://w3id.org/d2kg/documentation . Show more

Keywords: Semantic Web, Linked Open Data, ontologies, Knowledge Graphs, government decisions and acts, Diavgeia

DOI: 10.3233/SW-243535

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-23, 2024

Dura-Europos Stories: Developing interactive storytelling applications using knowledge graphs for cultural heritage exploration

Authors: Thornton, Katherine | Seals-Nutt, Kenneth | Chen, Anne

Article Type: Research Article

Abstract: We introduce Dura-Europos Stories, a multimedia application for viewing artifacts and places related to the Dura-Europos archaeological excavation. We describe the process of mapping data to the Wikidata data model as well as the process of contributing data to Wikidata. We provide an overview of the functionality of an interactive application for viewing images of the artifacts in the context of their metadata. We contextualize this project as an example of using knowledge graphs in research projects in order to leverage technologies of the Semantic Web in such a way that data related to the project can be easily combined …with other data on the web. Presenting artifacts in this story-based application allows users to explore these objects visually, and provides pathways for further exploration of related information. Show more

Keywords: Wikidata, art history, archaeology, cultural heritage, digital humanities

DOI: 10.3233/SW-243552

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-15, 2024

InterpretME: A tool for interpretations of machine learning models over knowledge graphs

Authors: Chudasama, Yashrajsinh | Purohit, Disha | Rohde, Philipp D. | Gercke, Julian | Vidal, Maria-Esther

Article Type: Research Article

Abstract: In recent years, knowledge graphs (KGs) have been considered pyramids of interconnected data enriched with semantics for complex decision-making. The potential of KGs and the demand for interpretability of machine learning (ML) models in diverse domains (e.g., healthcare) have gained more attention. The lack of model transparency negatively impacts the understanding and, in consequence, interpretability of the predictions made by a model. Data-driven models should be empowered with the knowledge required to trace down their decisions and the transformations made to the input data to increase model transparency. In this paper, we propose InterpretME, a tool that using KGs, provides …fine-grained representations of trained ML models. An ML model description includes data – (e.g., features’ definition and SHACL validation) and model-based characteristics (e.g., relevant features and interpretations of prediction probabilities and model decisions). InterpretME allows for defining a model’s features over data collected in various formats, e.g., RDF KGs, CSV, and JSON. InterpretME relies on the SHACL schema to validate integrity constraints over the input data. InterpretME traces the steps of data collection, curation, integration, and prediction; it documents the collected metadata in the InterpretME KG. InterpretME is published in GitHub1 1 https://github.com/SDM-TIB/InterpretME and Zenodo2 2 https://doi.org/10.5281/zenodo.8112628 . The InterpretME framework includes a pipeline for enhancing the interpretability of ML models, the InterpretME KG, and an ontology to describe the main characteristics of trained ML models; a PyPI library of InterpretME is also provided3 3 https://pypi.org/project/InterpretME/ . Additionally, a live code4 4 https://github.com/SDM-TIB/InterpretME_Demo , and a video5 5 https://www.youtube.com/watch?v=Bu4lROnY4xg demonstrating InterpretME in several use cases are also available. https://github.com/SDM-TIB/InterpretME https://doi.org/10.5281/zenodo.8112628 https://pypi.org/project/InterpretME/ https://github.com/SDM-TIB/InterpretME_Demo https://www.youtube.com/watch?v=Bu4lROnY4xg Show more

Keywords: Interpretability, knowledge graphs, machine learning models, shacl, ontologies

DOI: 10.3233/SW-233511

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-21, 2024

Empowering the SDM-RDFizer tool for scaling up to complex knowledge graph creation pipelines ¹

Authors: Iglesias, Enrique | Vidal, Maria-Esther | Collarana, Diego | Chaves-Fraga, David

Article Type: Research Article

Abstract: The significant increase in data volume in recent years has prompted the adoption of knowledge graphs as valuable data structures for integrating diverse data and metadata. However, this surge in data availability has brought to light challenges related to standardization, interoperability, and data quality. Knowledge graph creation faces complexities from large data volumes, data heterogeneity, and high duplicate rates. This work addresses these challenges and proposes data management techniques to scale up the creation of knowledge graphs specified using the RDF Mapping Language (RML). These techniques are integrated into SDM-RDFizer, transforming it into a two-fold solution designed to address the …complexities of generating knowledge graphs. Firstly, we introduce a reordering approach for RML triples maps, prioritizing the evaluation of the most selective maps first to reduce memory usage. Secondly, we employ an RDF compression strategy, along with optimized data structures and novel operators, to prevent the generation of duplicate RDF triples and optimize the execution of RML operators. We assess the performance of SDM-RDFizer through established benchmarks. The evaluation showcases the effectiveness of SDM-RDFizer compared to state-of-the-art RML engines, emphasizing the benefits of our techniques. Furthermore, the paper presents real-world projects where SDM-RDFizer has been utilized, providing insights into the advantages of declaratively defining knowledge graphs and efficiently executing these specifications using this engine. Show more

Keywords: Data integration systems, knowledge graphs, RDF mapping languages

DOI: 10.3233/SW-243580

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-28, 2024

PAPAYA: A library for performance analysis of SQL-based RDF processing systems

Authors: Ragab, Mohamed | Adidarma, Adam Satria | Tommasini, Riccardo

Article Type: Research Article

Abstract: Prescriptive Performance Analysis (PPA) has shown to be more useful than traditional descriptive and diagnostic analyses for making sense of Big Data (BD) frameworks’ performance. In practice, when processing large (RDF) graphs on top of relational BD systems, several design decisions emerge and cannot be decided automatically, e.g., the choice of the schema, the partitioning technique, and the storage formats. PPA, and in particular ranking functions, helps enable actionable insights on performance data, leading practitioners to an easier choice of the best way to deploy BD frameworks, especially for graph processing. However, the amount of experimental work …required to implement PPA is still huge. In this paper, we present PAPAYA,1 1 https://github.com/DataSystemsGroupUT/PAPyA a library for implementing PPA that allows (1) preparing RDF graphs data for a processing pipeline over relational BD systems, (2) enables automatic ranking of the performance in a user-defined solution space of experimental dimensions; (3) allows user-defined flexible extensions in terms of systems to test and ranking methods. We showcase PAPAYA on a set of experiments based on the SparkSQL framework. PAPAYA simplifies the performance analytics of BD systems for processing large (RDF) graphs. We provide PAPAYA as a public open-source library under an MIT license that will be a catalyst for designing new research prescriptive analytical techniques for BD applications. https://github.com/DataSystemsGroupUT/PAPyA Show more

Keywords: Benchmarking, RDF systems, big data, analytics, apache spark

DOI: 10.3233/SW-243582

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-19, 2024

Declarative generation of RDF-star graphs from heterogeneous data

Authors: Arenas-Guerrero, Julián | Iglesias-Molina, Ana | Chaves-Fraga, David | Garijo, Daniel | Corcho, Oscar | Dimou, Anastasia

Article Type: Research Article

Abstract: RDF-star has been proposed as an extension of RDF to make statements about statements. Libraries and graph stores have started adopting RDF-star, but the generation of RDF-star data remains largely unexplored. To allow generating RDF-star from heterogeneous data, RML-star was proposed as an extension of RML. However, no system has been developed so far that implements the RML-star specification. In this work, we present Morph-KGCstar , which extends the Morph-KGC materialization engine to generate RDF-star datasets. We validate Morph-KGCstar by running test cases derived from the N-Triples-star syntax tests and we apply it to two real-world use …cases from the biomedical and open science domains. We compare the performance of our approach against other RDF-star generation methods (SPARQL-Anything), showing that Morph-KGCstar scales better for large input datasets, but it is slower when processing multiple smaller files. Show more

Keywords: Knowledge graphs, RDF-star, RML-star, data integration

DOI: 10.3233/SW-243602

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-19, 2024

smart-KG: Partition-Based Linked Data Fragments for querying knowledge graphs

Authors: Azzam, Amr | Polleres, Axel | D. Fernández, Javier | Acosta, Maribel

Article Type: Research Article

Abstract: RDF and SPARQL provide a uniform way to publish and query billions of triples in open knowledge graphs (KGs) on the Web. Yet, provisioning of a fast, reliable, and responsive live querying solution for open KGs is still hardly possible through SPARQL endpoints alone: while such endpoints provide a remarkable performance for single queries, they typically can not cope with highly concurrent query workloads by multiple clients. To mitigate this, the Linked Data Fragments (LDF) framework sparked the design of different alternative low-cost interfaces such as Triple Pattern Fragments (TPF), that partially offload the query processing workload to the client …side. On the downside, such interfaces still come with the expense of unnecessarily high network load due to the necessary transfer of intermediate results to the client, leading to query performance degradation compared with endpoints. To address this problem, in the present work, we investigate alternative interfaces, refining and extending the original TPF idea, which also aims at reducing server-resource consumption, by shipping query-relevant partitions of KGs from the server to the client. To this end, first, we align formal definitions and notations of the original LDF framework to uniformly present existing LDF implements and such “partition-based” LDF approaches. These novel LDF interfaces retrieve, instead of the exact triples matching a particular query pattern, a subset of pre-materialized, compressed, partitions of the original graph, containing all answers to a query pattern, to be further evaluated on the client side. As a concrete representative of partition-based LDF, we present smart-KG + , extending and refining our prior work (In WWW ’20: The Web Conference 2020 (2020 ) 984–994 ACM / IW3C2) in several respects. Our proposed approach is a step forward towards a better-balanced share of the query processing load between clients and servers by shipping graph partitions driven by the structure of RDF graphs to group entities described with the same sets of properties and classes, resulting in significant data transfer reduction. Our experiments demonstrate that the smart-KG + significantly outperforms existing Web SPARQL interfaces on both pre-existing benchmarks for highly concurrent query execution as well as an accustomed query workload inspired by query logs of existing SPARQL endpoints. Show more

Keywords: Knowledge graph, SPARQL, Linked Data Fragments, graph partitioning, availability

DOI: 10.3233/SW-243571

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-45, 2024

Empirical ontology design patterns and shapes from Wikidata

Authors: Carriero, Valentina Anita | Groth, Paul | Presutti, Valentina

Article Type: Research Article

Abstract: The ontology underlying the Wikidata knowledge graph (KG) has not been formalized. Instead, its semantics emerges bottom-up from the use of its classes and properties. Flexible guidelines and rules have been defined by the Wikidata project for the use of its ontology, however, it is still often difficult to reuse the ontology’s constructs. Based on the assumption that identifying ontology design patterns from a knowledge graph contributes to making its (possibly) implicit ontology emerge, in this paper we present a method for extracting what we term empirical ontology design patterns (EODPs) from a knowledge graph. This method takes as …input a knowledge graph and extracts EODPs as sets of axioms/constraints involving the classes instantiated in the KG. These EODPs include data about the probability of such axioms/constraints happening . We apply our method on two domain-specific portions of Wikidata, addressing the music and art, architecture, and archaeology domains, and we compare the empirical ontology design patterns we extract with the current support present in Wikidata. We show how these patterns can provide guidance for the use of the Wikidata ontology and its potential improvement, and can give insight into the content of (domain-specific portions of) the Wikidata knowledge graph. Show more

Keywords: Ontology design patterns, shapes, knowledge graphs, Wikidata, empirical knowledge engineering

DOI: 10.3233/SW-243613

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-25, 2024

Special Issue on Semantic Web for Industrial Engineering: Research and Applications

Authors: Aameri, Bahar | Poveda-Villalón, María | Sanfilippo, Emilio M. | Terkaj, Walter

Article Type: Editorial

DOI: 10.3233/SW-243623

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-7, 2024

Multilinguality and LLOD: A survey across linguistic description levels

Authors: Gromann, Dagmar | Apostol, Elena-Simona | Chiarcos, Christian | Cremaschi, Marco | Gracia, Jorge | Gkirtzou, Katerina | Liebeskind, Chaya | Mockiene, Liudmila | Rosner, Michael | Schuurman, Ineke | Sérasset, Gilles | Silvano, Purificação | Spahiu, Blerina | Truică, Ciprian-Octavian | Utka, Andrius | Valunaite Oleskeviciene, Giedre

Article Type: Research Article

Abstract: Limited accessibility to language resources and technologies represents a challenge for the analysis, preservation, and documentation of natural languages other than English. Linguistic Linked (Open) Data (LLOD) holds the promise to ease the creation, linking, and reuse of multilingual linguistic data across distributed and heterogeneous resources. However, individual language resources and technologies accommodate or target different linguistic description levels, e.g., morphology, syntax, phonology, and pragmatics. In this comprehensive survey, the state-of-the-art of multilinguality and LLOD is being represented with a particular focus on linguistic description levels, identifying open challenges and gaps as well as proposing an ideal ecosystem for multilingual …LLOD across description levels. This survey seeks to contribute an introductory text for newcomers to the field of multilingual LLOD, uncover gaps and challenges to be tackled by the LLOD community in reference to linguistic description levels, and present a solid basis for a future best practice of multilingual LLOD across description levels. Show more

Keywords: Multilinguality, linguistic linked data, linguistic description levels, systematic survey

DOI: 10.3233/SW-243591

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-44, 2024

Ontology of autonomous driving based on the SAE J3016 standard

Authors: Trypuz, Robert | Kulicki, Piotr | Sopek, Mirek

Article Type: Research Article

Abstract: Autonomous driving is a recently developed area in which technology seems to be ahead of its understanding within society. That causes some fears concerning the reliability of autonomous vehicles and controversies over liability in case of accidents. Specifying levels of driving autonomy within the SAE-J3016 standard is widely recognized as a significant step towards comprehending the essence of the achievements. However, the standard provides even more valuable insights into the process of driving automation. In the paper, we develop the ideas using the methods of formal ontology that allow us to make the conceptual system more precise and formalize it. …To increase inseparability, we ground our system on a top-level BFO ontology. We present a formal account of several areas covered by the SAE-J3016 standard, including motor vehicles and their systems, driving tasks and subtasks, roles of persons in road communication, and autonomy levels. Show more

Keywords: Autonomous driving, autonomous vehicle, self-driving car, BFO

DOI: 10.3233/SW-243578

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-26, 2024

Analysis of ontologies and policy languages to represent information flows in GDPR

Authors: Esteves, Beatriz | Rodríguez-Doncel, Víctor

Article Type: Research Article

Abstract: This article surveys existing vocabularies, ontologies and policy languages that can be used to represent informational items referenced in GDPR rights and obligations, such as the ‘notification of a data breach’, the ‘controller’s identity’ or a ‘DPIA’. Rights and obligations in GDPR are analyzed in terms of information flows between different stakeholders, and a complete collection of 57 different informational items that are mentioned by GDPR is described. 13 privacy-related policy languages and 9 data protection vocabularies and ontologies are studied in relation to this list of informational items. ODRL and LegalRuleML emerge as the languages that can respond positively …to a greater number of the defined comparison criteria if complemented with DPV and GDPRtEXT, since 39 out of the 57 informational items can be modelled. Online supplementary material is provided, including a simple search application and a taxonomy of the identified entities. Show more

Keywords: Privacy policy languages, data protection ontologies, GDPR, rights, obligations

DOI: 10.3233/SW-223009

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-35, 2022

Semantic-enabled architecture for auditable privacy-preserving data analysis

Authors: Ekaputra, Fajar J. | Ekelhart, Andreas | Mayer, Rudolf | Miksa, Tomasz | Šarčević, Tanja | Tsepelakis, Sotirios | Waltersdorfer, Laura

Article Type: Research Article

Abstract: Small and medium-sized organisations face challenges in acquiring, storing and analysing personal data, particularly sensitive data (e.g., data of medical nature), due to data protection regulations, such as the GDPR in the EU, which stipulates high standards in data protection. Consequently, these organisations often refrain from collecting data centrally, which means losing the potential of data analytics and learning from aggregated user data. To enable organisations to leverage the full-potential of the collected personal data, two main technical challenges need to be addressed: (i) organisations must preserve the privacy of individual users and honour their consent, while (ii) being …able to provide data and algorithmic governance, e.g., in the form of audit trails, to increase trust in the result and support reproducibility of the data analysis tasks performed on the collected data. Such an auditable, privacy-preserving data analysis is currently challenging to achieve, as existing methods and tools only offer partial solutions to this problem, e.g., data representation of audit trails and user consent, automatic checking of usage policies or data anonymisation. To the best of our knowledge, there exists no approach providing an integrated architecture for auditable, privacy-preserving data analysis. To address these gaps, as the main contribution of this paper, we propose the WellFort approach, a semantic-enabled architecture for auditable, privacy-preserving data analysis which provides secure storage for users’ sensitive data with explicit consent, and delivers a trusted, auditable analysis environment for executing data analytic processes in a privacy-preserving manner. Additional contributions include the adaptation of Semantic Web technologies as an integral part of the WellFort architecture, and the demonstration of the approach through a feasibility study with a prototype supporting use cases from the medical domain. Our evaluation shows that WellFort enables privacy preserving analysis of data, and collects sufficient information in an automated way to support its auditability at the same time. Show more

Keywords: Provenance, semantic web, privacy-preserving data analysis, auditability, dpv, consent management

DOI: 10.3233/SW-212883

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-34, 2021

Consent through the lens of semantics: State of the art survey and best practices

Authors: Kurteva, Anelia | Chhetri, Tek Raj | Pandit, Harshvardhan J. | Fensel, Anna

Article Type: Research Article

Abstract: The acceptance of the GDPR legislation in 2018 started a new technological shift towards achieving transparency. GDPR put focus on the concept of informed consent applicable for data processing, which led to an increase of the responsibilities regarding data sharing for both end users and companies. This paper presents a literature survey of existing solutions that use semantic technology for implementing consent. The main focus is on ontologies, how they are used for consent representation and for consent management in combination with other technologies such as blockchain. We also focus on visualisation solutions aimed at improving individuals’ consent comprehension. Finally, …based on the overviewed state of the art we propose best practices for consent implementation. Show more

Keywords: Consent, GDPR, semantic web technology, ontology

DOI: 10.3233/SW-210438

Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-27, 2021

Go to header Go to navigation Go to search Go to contents Go to footer

North America

IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
USA

Tel: +1 703 830 6300
Fax: +1 703 830 2300
[email protected]

For editorial issues, like the status of your submitted paper or proposals, write to [email protected]

Europe

IOS Press
Nieuwe Hemweg 6B
1013 BG Amsterdam
The Netherlands

Tel: +31 20 688 3355
Fax: +31 20 687 0091
[email protected]

For editorial issues, permissions, book requests, submissions and proceedings, contact the Amsterdam office [email protected]

Asia

Inspirees International (China Office)
Ciyunsi Beili 207(CapitaLand), Bld 1, 7-901
100025, Beijing
China

Free service line: 400 661 8717
Fax: +86 10 8446 7947
[email protected]

For editorial issues, like the status of your submitted paper or proposals, write to [email protected]

如果您在出版方面需要帮助或有任何建, 件至: [email protected]