Impact Factor 2023: 3
The journal Semantic Web – Interoperability, Usability, Applicability is an international and interdisciplinary journal bringing together researchers from various fields that share the vision of, and need for, more effective and meaningful ways to share information across agents and services on the future Internet and elsewhere.
As such, Semantic Web technologies shall support the seamless integration of data, on-the-fly composition and interoperation of Web services, as well as more intuitive search engines. The semantics – or meaning – of information, however, cannot be defined without a context, which makes personalization, trust and provenance core topics for Semantic Web research.
New retrieval paradigms, user interfaces and visualization techniques have to unleash the power of the Semantic Web and at the same time hide its complexity from the user. Based on this vision, the journal welcomes contributions ranging from theoretical and foundational research, through methods and tools, to descriptions of concrete ontologies and applications in all areas. Papers which add a social, spatial or temporal dimension to Semantic Web research, as well as application-oriented papers making use of formal semantics, are especially welcome.
The journal is co-published by the Akademische Verlagsgesellschaft AKA.
Authors: De Giorgis, Stefano | Gangemi, Aldo | Gromann, Dagmar
Article Type: Research Article
Abstract: Commonsense knowledge is a broad and challenging area of research which investigates our understanding of the world as well as human assumptions about reality. Deriving directly from the subjective perception of the external world, it is intrinsically intertwined with embodied cognition. Commonsense reasoning is linked to human sense-making, pattern recognition and knowledge framing abilities. This work presents a new resource that formalizes the cognitive theory of image schemas. Image schemas are dynamic conceptual building blocks originating from our sensorimotor interactions with the physical world, and they enable our sense-making cognitive activity to assign coherence and structure to the entities, events and situations we experience every day. ImageSchemaNet is an ontology that aligns pre-existing resources, such as FrameNet, VerbNet, WordNet and MetaNet from the Framester hub, to image schema theory. This article describes an empirical application of ImageSchemaNet, combined with semantic parsers, on the task of annotating natural language sentences with image schemas.
Keywords: Image schemas, cognitive semantics, frame semantics, commonsense reasoning
DOI: 10.3233/SW-223084
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-25, 2022
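The annotation task described in the abstract above, tagging natural language sentences with image schemas, can be illustrated with a deliberately naive lexical-trigger sketch. The trigger lexicon and schema names below are invented for illustration and are not taken from ImageSchemaNet or its semantic parsers:

```python
# Hypothetical sketch: annotating sentences with image schemas via a trigger lexicon.
# The trigger words and schema names are illustrative, not from ImageSchemaNet.
TRIGGERS = {
    "CONTAINMENT": {"in", "inside", "into", "out"},
    "SOURCE_PATH_GOAL": {"from", "to", "towards", "through"},
    "VERTICALITY": {"up", "down", "above", "below"},
}

def annotate(sentence: str) -> set:
    """Return the set of image schemas whose lexical triggers occur in the sentence."""
    tokens = {t.strip(".,").lower() for t in sentence.split()}
    return {schema for schema, triggers in TRIGGERS.items() if tokens & triggers}

print(annotate("She walked from the house into the garden."))
```

A real system would of course disambiguate triggers in context (frame semantics rather than bag-of-words), which is exactly where the FrameNet/VerbNet alignments come in.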
Authors: Dooley, Damion | Weber, Magalie | Ibanescu, Liliana | Lange, Matthew | Chan, Lauren | Soldatova, Larisa | Yang, Chen | Warren, Robert | Shimizu, Cogan | McGinty, Hande K. | Hsiao, William
Article Type: Research Article
Abstract: People often value the sensual, celebratory, and health aspects of food, but behind this experience exist many other value-laden agricultural production, distribution, manufacturing, and physiological processes that support or undermine a healthy population and a sustainable future. The complexity of such processes is evident both in everyday food preparation of recipes and in industrial food manufacturing, packaging and storage, each of which depends critically on human or machine agents, chemical or organismal ingredient references, and the explicit instructions and implicit procedures held in formulations or recipes. An integrated ontology landscape does not yet exist to cover all the entities at work in this farm to fork journey. It seems necessary to construct such a vision by reusing expert-curated fit-to-purpose ontology subdomains and their relationship, material, and more abstract organization and role entities. The challenge is to make this merger, by analogy, one language, rather than nouns and verbs from a dozen or more dialects which cannot be used directly in statements about some aspect of the farm to fork journey without expensive translation or substantial dialect education to understand a particular text or domain of knowledge. This work focuses on the ontology components – object and data properties and annotations – needed to model food processes or more general process modelling within the context of the Open Biological and Biomedical Ontology Foundry and congruent ontologies. Ideally these components can be brought together in a general process ontology that can be specialized not only for the food domain but for carrying out other protocols as well. Many operations involved in food identification, preparation, transportation and storage – shaking, boiling, mixing, freezing, labeling, shipping – are actually common to activities from manufacturing and laboratory work to local or home food preparation.
Keywords: Ontology, food processing, recipe, process modelling, OBO Foundry
DOI: 10.3233/SW-223096
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-32, 2022
Authors: Spahiu, Blerina | Palmonari, Matteo | Alva Principe, Renzo Arturo | Rula, Anisa
Article Type: Research Article
Abstract: While there has been a trend in the last decades for publishing large-scale and highly-interconnected Knowledge Graphs (KGs), their users often get overwhelmed by the task of understanding their content as a result of their size and complexity. Data profiling approaches have been proposed to summarize large KGs into concise and meaningful representations, so that they can be better explored, processed, and managed. Profiles based on schema patterns represent each triple in a KG with its schema-level counterpart, thus covering the entire KG with profiles of considerable size. In this paper, we provide empirical evidence that profiles based on schema patterns, if explored with suitable mechanisms, can be useful to help users understand the content of big and complex KGs. ABSTAT provides concise pattern-based profiles and comes with faceted interfaces for profile exploration. Using this tool we present a user study based on query completion tasks. We demonstrate that users who look at ABSTAT profiles formulate their queries better and faster than users browsing the ontology of the KGs. The latter is a fairly strong baseline considering that many KGs do not even come with a specific ontology to be explored by the users. To the best of our knowledge, this is the first attempt to investigate the impact of profiling techniques on tasks related to knowledge graph understanding with a user study.
Keywords: Data understanding, data profiling, summarization, RDF, knowledge graph
DOI: 10.3233/SW-223181
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-27, 2023
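The schema-pattern idea underlying ABSTAT, abstracting each data triple to a (subject type, predicate, object type) pattern, can be sketched in a few lines. The types and triples below are an invented toy graph, not an actual KG:

```python
# Minimal sketch of schema-pattern profiling in the spirit of ABSTAT:
# every data triple is abstracted to a (subject type, predicate, object type) pattern,
# so the pattern multiset covers the whole graph at schema level.
from collections import Counter

types = {"alice": "Person", "bob": "Person", "acme": "Company"}  # entity -> type

triples = [
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
    ("alice", "knows", "bob"),
]

def profile(triples, types):
    """Count the schema-level patterns covering every triple in the graph."""
    return Counter((types[s], p, types[o]) for s, p, o in triples)

for pattern, freq in profile(triples, types).items():
    print(pattern, freq)
```

ABSTAT itself additionally uses minimal-type selection against the subclass hierarchy; this sketch assumes a single type per entity.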
Authors: Compagno, Francesco | Borgo, Stefano
Article Type: Research Article
Abstract: In both applied ontology and engineering, functionality is a well-researched topic, since it is through teleological causal reasoning that domain experts build mental models of engineering systems, giving birth to functions. These mental models are important throughout the whole lifecycle of any product, being used from the design phase up to diagnosis activities. Though a vast amount of work to model functions has already been carried out, the literature has not settled on a shared and well-defined approach due to the variety of concepts involved and the modeling tasks that functional descriptions should satisfy. The work in this paper lays the basis for, and takes some crucial steps towards, a rich ontological description of functions and related concepts, such as behaviour, capability, and capacity. A conceptual analysis of such notions is carried out using the top-level ontology DOLCE as a framework, and the ensuing logical theory is formally described in first-order logic and OWL, showing how ontological concepts can model major aspects of engineering products in applications. In particular, it is shown how functions can be distinguished from the implementation methods to realize them, how one can differentiate between capabilities and capacities of a product, and how these are related to engineering functions.
Keywords: Ontology, function, behaviour, capability, DOLCE
DOI: 10.3233/SW-223188
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-34, 2023
Authors: Portisch, Jan | Hladik, Michael | Paulheim, Heiko
Article Type: Research Article
Abstract: Ontology matching is an integral part of establishing semantic interoperability. One of the main challenges within the ontology matching operation is semantic heterogeneity, i.e. modeling differences between the two ontologies that are to be integrated. The semantics within most ontologies or schemas are, however, typically incomplete because they are designed within a certain context which is not explicitly modeled. Therefore, external background knowledge plays a major role in the task of (semi-) automated ontology and schema matching. In this survey, we introduce the reader to the general ontology matching problem. We review the background knowledge sources as well as the approaches applied to make use of external knowledge. Our survey covers all ontology matching systems that have been presented within the years 2004–2021 at a well-known ontology matching competition together with systematically selected publications in the research field. We present a classification system for external background knowledge, concept linking strategies, as well as for background knowledge exploitation approaches. We provide extensive examples and classify all ontology matching systems under review in a resource/strategy matrix obtained by coalescing the two classification systems. Lastly, we outline interesting and yet underexplored research directions of applying external knowledge within the ontology matching process.
Keywords: Ontology matching, schema matching, background knowledge, data integration, semantic integration, knowledge graphs, ontologies
DOI: 10.3233/SW-223085
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-55, 2022
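The role of external background knowledge in matching, as surveyed above, can be illustrated with a toy label-linking sketch: two concept labels that differ lexically are matched because an external synonym resource links them. The synonym table is invented for the example:

```python
# Toy sketch of background-knowledge-assisted ontology matching: two concept
# labels match if they are equal or co-occur in a synonym set from an external
# resource (stand-in for e.g. a WordNet-like source). The table is invented.
SYNONYMS = [
    {"car", "automobile", "motorcar"},
    {"person", "human", "individual"},
]

def linked(label_a: str, label_b: str) -> bool:
    """Concepts match if their labels are equal or share a synonym set."""
    a, b = label_a.lower(), label_b.lower()
    return a == b or any(a in syns and b in syns for syns in SYNONYMS)

print(linked("Car", "Automobile"))
```

Real systems combine such concept-linking strategies with structural and logical evidence; this sketch only shows why the external resource changes the outcome at all.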
Authors: Nguyen, Phuc | Kertkeidkachorn, Natthawut | Ichise, Ryutaro | Takeda, Hideaki
Article Type: Research Article
Abstract: Semantic annotation of tabular data is the process of matching table elements with knowledge graphs. As a result, the table contents could be interpreted or inferred using knowledge graph concepts, enabling them to be useful in downstream applications such as data analytics and management. Nevertheless, semantic annotation tasks are challenging due to insufficient tabular data descriptions, heterogeneous schema, and vocabulary issues. This paper presents an automatic semantic annotation system for tabular data, called MTab4D, to generate annotations with DBpedia in three annotation tasks: 1) matching table cells to entities, 2) matching columns to entity types, and 3) matching pairs of columns to properties. In particular, we propose an annotation pipeline that combines multiple matching signals from different table elements to address schema heterogeneity, data ambiguity, and noisiness. Additionally, this paper provides insightful analysis and extra resources on benchmarking semantic annotation with knowledge graphs. Experimental results on the original and adapted datasets of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab 2019) show that our system achieves impressive performance for the three annotation tasks. MTab4D’s repository is publicly available at https://github.com/phucty/mtab4dbpedia.
Keywords: Table annotation, knowledge graph, DBpedia, semantic table interpretation
DOI: 10.3233/SW-223098
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-25, 2022
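The "multiple matching signals" idea from the abstract above can be sketched as a weighted aggregation over candidate entities for one table cell. The signal names, weights, candidate entities and scores below are all made up for illustration and are not MTab4D's actual scoring model:

```python
# Illustrative sketch of combining matching signals for cell-to-entity annotation.
# Signals, weights and candidate scores are invented, not MTab4D's real pipeline.
def aggregate(signals: dict, weights: dict) -> float:
    """Weighted combination of per-signal confidence scores for one candidate."""
    return sum(weights[name] * score for name, score in signals.items())

weights = {"label_similarity": 0.5, "type_agreement": 0.3, "property_overlap": 0.2}
candidates = {
    "dbr:Paris":       {"label_similarity": 0.9, "type_agreement": 1.0, "property_overlap": 0.6},
    "dbr:Paris_Texas": {"label_similarity": 0.9, "type_agreement": 0.2, "property_overlap": 0.1},
}

best = max(candidates, key=lambda e: aggregate(candidates[e], weights))
print(best)
```

The point of the sketch: label similarity alone cannot separate the two candidates; it is the signals from the surrounding column (types, properties) that disambiguate.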
Authors: Hamilton, Kyle | Nayak, Aparna | Božić, Bojan | Longo, Luca
Article Type: Research Article
Abstract: Advocates for Neuro-Symbolic Artificial Intelligence (NeSy) assert that combining deep learning with symbolic reasoning will lead to stronger AI than either paradigm on its own. As successful as deep learning has been, it is generally accepted that even our best deep learning systems are not very good at abstract reasoning. And since reasoning is inextricably linked to language, it makes intuitive sense that Natural Language Processing (NLP) would be a particularly well-suited candidate for NeSy. We conduct a structured review of studies implementing NeSy for NLP, with the aim of answering the question of whether NeSy is indeed meeting its promises: reasoning, out-of-distribution generalization, interpretability, learning and reasoning from small data, and transferability to new domains. We examine the impact of knowledge representation, such as rules and semantic networks, language structure and relational structure, and whether implicit or explicit reasoning contributes to higher promise scores. We find that systems where logic is compiled into the neural network lead to the most NeSy goals being satisfied, while other factors such as knowledge representation or type of neural architecture do not exhibit a clear correlation with goals being met. We find many discrepancies in how reasoning is defined, specifically in relation to human-level reasoning, which impact decisions about model architectures and drive conclusions that are not always consistent across studies. Hence we advocate for a more methodical approach to the application of theories of human reasoning as well as the development of appropriate benchmarks, which we hope can lead to a better understanding of progress in the field. We make our data and code available for further analysis at https://github.com/kyleiwaniec/neuro-symbolic-ai-systematic-review
Keywords: Neuro-symbolic artificial intelligence, natural language processing, deep learning, knowledge representation & reasoning, structured review
DOI: 10.3233/SW-223228
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-42, 2022
Authors: Pandit, Harshvardhan J. | Esteves, Beatriz
Article Type: Research Article
Abstract: The Global Alliance for Genomics and Health is an international consortium that is developing the Data Use Ontology (DUO) as a standard providing machine-readable codes for automation in data discovery and responsible sharing of genomics data. DUO concepts, which are encoded using OWL, only contain the textual descriptions of the conditions for data use they represent, and do not specify the intended permissions, prohibitions, and obligations explicitly – which limits their usefulness. We present an exploration of how the Open Digital Rights Language (ODRL) can be used to explicitly represent the information inherent in DUO concepts to create policies that are then used to represent conditions under which datasets are available for use, conditions in requests to use them, and to generate agreements based on a compatibility matching between the two. We also address a current limitation of DUO regarding specifying information relevant to privacy and data protection law by using the Data Privacy Vocabulary (DPV), which supports expressing legal concepts in a jurisdiction-agnostic manner as well as for specific laws like the GDPR. Our work supports the existing socio-technical governance processes involving use of DUO by providing a complementary rather than replacement approach. To support this and improve DUO, we provide a description of how our system can be deployed with a proof of concept demonstration that uses ODRL rules for all DUO concepts, and uses them to generate agreements through matching of requests to data offers. All resources described in this article are available at: https://w3id.org/duodrl/repo.
Keywords: Health data, biomedical ontologies, policy, regulatory compliance, GDPR
DOI: 10.3233/SW-243583
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-26, 2024
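The offer/request compatibility matching described above can be sketched with a radically simplified policy representation. The dictionary-based policy structure and the constraint values below are invented stand-ins for the actual ODRL policies the paper builds:

```python
# Hedged sketch of offer/request compatibility matching in the spirit of the
# ODRL-based approach: a data offer imposes constraints, and a request is
# compatible only if it satisfies all of them. Policies here are simplified dicts.
def compatible(offer: dict, request: dict) -> bool:
    """A request matches an offer iff every constraint the offer imposes is met."""
    return all(request.get(key) == value for key, value in offer.items())

offer = {"purpose": "disease-specific-research", "disease": "diabetes"}
request_ok = {"purpose": "disease-specific-research", "disease": "diabetes",
              "requester": "lab-42"}
request_bad = {"purpose": "commercial", "disease": "diabetes"}

print(compatible(offer, request_ok), compatible(offer, request_bad))
```

Real ODRL matching also has to handle prohibitions, obligations and constraint operators other than equality; this sketch only shows the shape of the agreement-generation step.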
Authors: Ferrada, Sebastián | Bustos, Benjamin | Hogan, Aidan
Article Type: Research Article
Abstract: The SPARQL standard provides operators to retrieve exact matches on data, such as graph patterns, filters and grouping. This work proposes and evaluates two new algebraic operators for SPARQL 1.1 that return similarity-based results instead of exact results. First, a similarity join operator is presented, which brings together similar mappings from two sets of solution mappings. Second, a clustering solution modifier is introduced, which instead of grouping solution mappings according to exact values, brings them together by using similarity criteria. For both cases, a variety of algorithms are proposed and analysed, and use-case queries that showcase the relevance and usefulness of the novel operators are presented. For similarity joins, experimental results are provided by comparing different physical operators over a set of real world queries, as well as comparing our implementation to the closest work found in the literature, DBSimJoin, a PostgreSQL extension that supports similarity joins. For clustering, synthetic queries are designed in order to measure the performance of the different algorithms implemented.
Keywords: Similarity joins, clustering, SPARQL
DOI: 10.3233/SW-243540
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-32, 2024
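The semantics of the similarity join operator described above can be sketched as a naive nested loop over two sets of solution mappings, pairing mappings whose values fall within a distance threshold. The data and the single-attribute distance are invented; the paper's physical operators are of course far more efficient:

```python
# Naive nested-loop similarity join over two sets of solution mappings, as a
# sketch of the operator's semantics (not the paper's optimized implementations).
def sim_join(left, right, key, max_dist):
    """Pair mappings whose numeric values for `key` differ by at most `max_dist`."""
    return [
        (l, r)
        for l in left
        for r in right
        if abs(l[key] - r[key]) <= max_dist
    ]

cities = [{"name": "A", "pop": 1.0}, {"name": "B", "pop": 5.0}]
regions = [{"name": "X", "pop": 1.2}, {"name": "Y", "pop": 9.0}]

pairs = sim_join(cities, regions, key="pop", max_dist=0.5)
print([(l["name"], r["name"]) for l, r in pairs])
```

Unlike an exact join, which here would produce no results at all, the similarity join surfaces the near-match (A, X).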
Authors: Santos, Veronica | Schwabe, Daniel | Lifschitz, Sérgio
Article Type: Research Article
Abstract: In order to use a value retrieved from a Knowledge Graph (KG) for some computation, the user should, in principle, ensure that s/he trusts the veracity of the claim, i.e., considers the statement as a fact. Crowd-sourced KGs, or KGs constructed by integrating several different information sources of varying quality, must be used via a trust layer. The veracity of each claim in the underlying KG should be evaluated, considering what is relevant to carrying out some action that motivates the information seeking. The present work aims to assess how well Wikidata (WD) supports the trust decision process implied when using its data. WD provides several mechanisms that can support this trust decision, and our KG Profiling, based on WD claims and schema, elaborates an analysis of how multiple points of view, controversies, and potentially incomplete or incongruent content are presented and represented.
Keywords: Trust, contextual, KG Profiling
DOI: 10.3233/SW-243577
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-22, 2024
Authors: Bellucci, Matthieu | Delestre, Nicolas | Malandain, Nicolas | Zanni-Merk, Cecilia
Article Type: Research Article
Abstract: Debugging and repairing Web Ontology Language (OWL) ontologies has been a key field of research since OWL became a W3C recommendation. One way to understand errors and fix them is through explanations. These explanations are usually extracted from the reasoner and displayed to the ontology authors as is. In the meantime, there has been a recent call in the eXplainable AI (XAI) field to use expert knowledge in the form of knowledge graphs and ontologies. In this paper, a parallel between explanations for machine learning and for ontologies is drawn. This link enables the adaptation of XAI methods to explain ontologies and their entailments. Counterfactual explanations have been identified as a good candidate to solve the explainability problem in machine learning. The CEO (Counterfactual Explanations for Ontologies) method is thus proposed to explain inconsistent ontologies using counterfactual explanations. A preliminary user study is conducted to ensure that using XAI methods for ontologies is relevant and worth pursuing.
Keywords: Counterfactual explanations, explainability, ontology, knowledge graph, artificial intelligence
DOI: 10.3233/SW-243566
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-26, 2024
Authors: Dadalto, Atílio A. | Almeida, João Paulo A. | Fonseca, Claudenir M. | Guizzardi, Giancarlo
Article Type: Research Article
Abstract: The distinction between types and individuals is key to most conceptual modeling techniques and knowledge representation languages. Despite that, there are a number of situations in which modelers navigate this distinction inadequately, leading to problematic models. We show evidence of a large number of representation mistakes associated with the failure to employ this distinction in the Wikidata knowledge graph, which can be identified with the incorrect use of instantiation, which is a relation between an instance and a type, and specialization (or subtyping), which is a relation between two types. The prevalence of the problems in Wikidata’s taxonomies suggests that methodological and computational tools are required to mitigate the issues identified, which occur in many settings when individuals, types, and their metatypes are included in the domain of interest. We conduct a conceptual analysis of entities involved in recurrent erroneous cases identified in this empirical data, and present a tool that supports users in identifying some of these mistakes.
Keywords: Wikidata, multi-level taxonomies, quality assessment
DOI: 10.3233/SW-243562
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-18, 2024
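One symptom of the instantiation/specialization confusion discussed above is an entity that both has a type and is itself used as a type for other entities, without being modeled as a metatype. A minimal check for that pattern can be sketched over a toy "instance of" edge list (the mini-graph below is invented, not real Wikidata data):

```python
# Simple sketch, in the spirit of the analysis above: flag entities that occur
# both as an instance (they have a type) and as a type (something instantiates
# them). Such entities warrant a metatype check. The edge list is invented.
instance_of = [
    ("douglas_adams", "human"),   # (instance, type)
    ("human", "taxon_rank"),      # "human" also has a type itself
]

def suspicious(instance_of):
    """Entities that have a type AND are used as a type for other entities."""
    has_type = {i for i, _ in instance_of}
    used_as_type = {t for _, t in instance_of}
    return sorted(has_type & used_as_type)

print(suspicious(instance_of))
```

Being flagged is not automatically an error (legitimate metatypes exist), which is why the paper pairs such detection with a conceptual analysis rather than automatic repair.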
Authors: Troullinou, Georgia | Agathangelos, Giannis | Kondylakis, Haridimos | Stefanidis, Kostas | Plexousakis, Dimitris
Article Type: Research Article
Abstract: The explosion of the web and the abundance of linked data demand effective and efficient methods for storage, management, and querying. Apache Spark is one of the most widely used engines for big data processing, with more and more systems adopting it for efficient query answering. Existing approaches exploiting Spark for querying RDF data adopt partitioning techniques for reducing the data that need to be accessed in order to improve efficiency. However, simplistic data partitioning fails, on the one hand, to minimize data access and, on the other hand, to group data usually queried together. This translates into limited improvement in terms of efficiency in query answering. In this paper, we present DIAERESIS, a novel platform that accepts as input an RDF dataset and effectively partitions it, minimizing data access and improving query answering efficiency. To achieve this, DIAERESIS first identifies the top-k most important schema nodes, i.e., the most important classes, as centroids and distributes the other schema nodes to the centroid they mostly depend on. Then, it allocates the corresponding instance nodes to the schema nodes they are instantiated under. Our algorithm enables fine-tuning of data distribution, significantly reducing data access for query answering. We experimentally evaluate our approach using both synthetic and real workloads, strictly dominating the existing state of the art and showing that we improve query answering in several cases by orders of magnitude.
Keywords: RDF, data partitioning, Spark, query answering
DOI: 10.3233/SW-243554
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-27, 2024
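The centroid-based partitioning step described above (pick the top-k most important schema nodes, then assign every other schema node to the centroid it depends on most) can be sketched with toy scores. The importance and dependence numbers below are invented, not DIAERESIS's actual measures:

```python
# Sketch of centroid-based schema partitioning: top-k classes by importance
# become centroids; remaining schema nodes join the centroid they depend on
# most. Importance and dependence scores are invented toy numbers.
importance = {"Person": 0.9, "Place": 0.8, "Award": 0.2, "Work": 0.3}
dependence = {                      # node -> {centroid candidate: dependence}
    "Award": {"Person": 0.7, "Place": 0.1},
    "Work":  {"Person": 0.4, "Place": 0.5},
}

def partition(importance, dependence, k):
    """Return a mapping from each centroid to its set of schema nodes."""
    centroids = sorted(importance, key=importance.get, reverse=True)[:k]
    parts = {c: {c} for c in centroids}
    for node, deps in dependence.items():
        best = max(centroids, key=lambda c: deps.get(c, 0.0))
        parts[best].add(node)
    return parts

print(partition(importance, dependence, k=2))
```

Instance nodes would then be co-located with the schema nodes they instantiate, which is what lets a query touch only the partitions relevant to its classes.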
Authors: Hyvönen, Eero
Article Type: Research Article
Abstract: This paper presents a model and lessons learned for creating a cross-domain national ontology and Linked (Open) Data (LOD) infrastructure. The idea is to extend the global, domain-agnostic “layer cake model” underlying the Semantic Web with domain-specific and local features needed in applications. To test and demonstrate the infrastructure, a series of LOD services and portals in use have been created in 2002–2023 that cover a wide range of application domains. They have attracted millions of users in total, suggesting the feasibility of the proposed model. This line of research and development is unique due to its systematic national-level nature and long time span of over twenty years.
Keywords: Semantic Web, Linked Data, ontologies, web services, infrastructures, portals, Digital Humanities
DOI: 10.3233/SW-243468
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-15, 2024
Authors: Confalonieri, Roberto | Kutz, Oliver | Calvanese, Diego | Alonso-Moral, Jose Maria | Zhou, Shang-Ming
Article Type: Editorial
Keywords: Explainable AI, symbolic knowledge, applied ontology
DOI: 10.3233/SW-243529
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-4, 2024
Authors: Bella, Giampaolo | Cantone, Domenico | Castiglione, Gianpietro | Nicolosi Asmundo, Marianna | Santamaria, Daniele Francesco
Article Type: Research Article
Abstract: Electronic commerce and finance are progressively supporting and including decentralized, shared and public ledgers such as the blockchain. This is reshaping traditional commercial activities by advancing them towards Decentralized Finance (DeFi) and Commerce 3.0, thereby supporting the latter’s potential to outpace the hurdles of central authority controllers and lawgivers. The quantity and entropy of the information that must be sought and managed to become active participants in such a relentlessly evolving scenario are increasing at a steady pace. For example, that information comprises asset or service description, general rules of the game, and specific technologies involved for decentralization. Moreover, the relevant information ought to be shared among innumerable and heterogeneous stakeholders, such as producers, buyers, digital identity providers, valuation services, and shipment services, to name just a few. A clear semantic representation of such a complex and multifaceted blockchain-based e-commerce ecosystem would contribute dramatically to making it more usable, namely more automatically accessible to virtually anyone wanting to play the role of a stakeholder, thereby reducing programmers’ effort. However, we feel that reaching that goal still requires substantial effort in the tailoring of Semantic Web technologies, hence this article sets out on such a route and advances a stack of OWL 2 ontologies for the semantic description of decentralized e-commerce. The stack includes a number of relevant features, ranging from the applicable stakeholders through the supply chain of the offerings for an asset, up to the Ethereum blockchain, its tokens and smart contracts. Ontologies are defined by taking a behaviouristic approach to represent the various participants as agents in terms of their actions, inspired by the Theory of Agents and the related mentalistic notions. The stack is validated through appropriate metrics and SPARQL queries implementing suitable competency questions, then demonstrated through the representation of a real world use case, namely, the iExec marketplace.
Keywords: Ontology, OWL, Semantic Web, DeFi, agent, blockchain, Ethereum, e-commerce, supply chain, ONTOCHAIN, iExec
DOI: 10.3233/SW-243543
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-52, 2024
Authors: Flügel, Simon | Glauer, Martin | Neuhaus, Fabian | Hastings, Janna
Article Type: Research Article
Abstract: In ontology development, there is a gap between domain ontologies, which mostly use the Web Ontology Language, OWL, and foundational ontologies written in first-order logic, FOL. To bridge this gap, we present Gavel, a tool that supports the development of heterogeneous ‘FOWL’ ontologies that extend OWL with FOL annotations, and is able to reason over the combined set of axioms. Since FOL annotations are stored in OWL annotations, FOWL ontologies remain compatible with the existing OWL infrastructure. We show that for the OWL domain ontology OBI, the stronger integration with its FOL top-level ontology BFO via our approach enables us to detect several inconsistencies. Furthermore, existing OWL ontologies can benefit from FOL annotations. We illustrate this with FOWL ontologies containing mereotopological axioms that enable additional, useful inferences. Finally, we show that even for large domain ontologies such as ChEBI, automatic reasoning with FOL annotations can be used to detect previously unnoticed errors in the classification.
Keywords: Ontology, heterogeneous ontology, first-order
DOI: 10.3233/SW-243440
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-16, 2024
Authors: Křemen, Petr | Med, Michal | Blaško, Miroslav | Saeeda, Lama | Ledvinka, Martin | Buzek, Alan
Article Type: Research Article
Abstract: Thesauri are popular, as they represent a manageable compromise – they are well-understood by domain experts, yet formal enough to boost use cases like semantic search. Still, as thesauri grow in size and complexity within a domain, properly tracking concept references to their definitions in normative documents, interlinking concepts defined in different documents, and keeping all the concepts semantically consistent and ready for subsequent conceptual modeling is difficult and requires adequate tool support. We present TermIt, a web-based thesauri manager aimed at supporting the creation of thesauri based on decrees, directives, standards, and other normative documents. In addition to common editing capabilities, TermIt offers term extraction from documents, including a web document annotation browser plug-in, tracking term definitions in documents, term quality and ontological correctness checking, community discussions over term meanings, and seamless interlinking of concepts across different thesauri. We also show that TermIt’s features fit the e-government scenarios in the Czech Republic better than those of other tools. Additionally, we demonstrate the feasibility of TermIt for these scenarios through a preliminary user-experience evaluation.
Keywords: Thesaurus, ontology, SKOS, UFO
DOI: 10.3233/SW-243547
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-11, 2024
Authors: Vámos, Csilla | Scheider, Simon | Sonnenschein, Tabea | Vermeulen, Roel
Article Type: Research Article
Abstract: Exposure is a central concept of the health and behavioural sciences needed to study the influence of the environment on the health and behaviour of people within a spatial context. While an increasing number of studies measure different forms of exposure, including the influence of air quality, noise, and crime, the influence of land cover on physical activity, or of the urban environment on food intake, we lack a common conceptual model of environmental exposure that captures its main structure across all this variety. Against the background of such a model, it becomes possible not only to systematically compare different methodological approaches but also to better link and align the content of the vast amount of scientific publications on this topic in a systematic way. For example, an important methodical distinction is between studies that model exposure as an exclusive outcome of some activity versus ones where the environment acts as a direct independent cause (active vs. passive exposure). Here, we propose an information ontology design pattern that can be used to define exposure and to model its variants. It is built around causal relations between concepts including persons, activities, concentrations, exposures, environments and health risks. We formally define environmental stressors and variants of exposure using Description Logic (DL), which allows automatic inference from the RDF-encoded content of a paper. Furthermore, concepts can be linked with data models and modelling methods used in a study. To test the pattern, we translated competency questions into SPARQL queries and ran them over RDF-encoded content. Results show how study characteristics can be classified and summarized in a manner that reflects important methodical differences.
Keywords: Ontology, epidemiology, RDF, GIS, computer science
DOI: 10.3233/SW-243546
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-29, 2024
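The pattern-testing step described in the abstract above (translating competency questions into SPARQL queries and running them over RDF-encoded study content) can be sketched as plain query construction. The `ex:` vocabulary below is a hypothetical placeholder, not the pattern's actual terms:

```python
# Sketch: a competency question rendered as a SPARQL query string, as in
# the paper's evaluation step. The class/property names (ex:Exposure,
# ex:hasCause, ex:Activity, ex:models) are invented placeholders.

PREFIXES = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex:  <http://example.org/exposure#>
"""

def cq_active_exposure() -> str:
    """Competency question: 'Which studies model exposure as the
    outcome of an activity (active exposure)?'"""
    return PREFIXES + """\
SELECT ?study WHERE {
  ?study    ex:models    ?exposure .
  ?exposure rdf:type     ex:Exposure ;
            ex:hasCause  ?cause .
  ?cause    rdf:type     ex:Activity .
}"""

query = cq_active_exposure()
```

A query like this, run against the RDF-encoded content of each paper, would classify it as an active-exposure study when the pattern matches.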
Authors: Khan, M. Jaleed | G. Breslin, John | Curry, Edward
Article Type: Research Article
Abstract: Exploring the potential of neuro-symbolic hybrid approaches offers promising avenues for seamless high-level understanding and reasoning about visual scenes. Scene Graph Generation (SGG) is a symbolic image representation approach based on deep neural networks (DNN) that involves predicting objects, their attributes, and pairwise visual relationships in images to create scene graphs, which are utilized in downstream visual reasoning. The crowdsourced training datasets used in SGG are highly imbalanced, which results in biased SGG results. The vast number of possible triplets makes it challenging to collect sufficient training samples for every visual concept or relationship. To address these challenges, we propose augmenting the typical data-driven SGG approach with common sense knowledge to enhance the expressiveness and autonomy of visual understanding and reasoning. We present a loosely-coupled neuro-symbolic visual understanding and reasoning framework that employs a DNN-based pipeline for object detection and multi-modal pairwise relationship prediction for scene graph generation and leverages common sense knowledge in heterogeneous knowledge graphs to enrich scene graphs for improved downstream reasoning. A comprehensive evaluation is performed on multiple standard datasets, including Visual Genome and Microsoft COCO, in which the proposed approach outperformed the state-of-the-art SGG methods in terms of relationship recall scores, i.e. Recall@K and mean Recall@K, as well as the state-of-the-art scene graph-based image captioning methods in terms of SPICE and CIDEr scores with comparable BLEU, ROUGE and METEOR scores. As a result of enrichment, the qualitative results showed improved expressiveness of scene graphs, resulting in more intuitive and meaningful caption generation using scene graphs. Our results validate the effectiveness of enriching scene graphs with common sense knowledge using heterogeneous knowledge graphs. This work provides a baseline for future research in knowledge-enhanced visual understanding and reasoning. The source code is available at https://github.com/jaleedkhan/neusire.
Keywords: Scene graph, image representation, common sense knowledge, knowledge enrichment, visual reasoning, image captioning
DOI: 10.3233/SW-233510
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-25, 2023
Authors: Ilievski, Filip | Shenoy, Kartik | Chalupsky, Hans | Klein, Nicholas | Szekely, Pedro
Article Type: Research Article
Abstract: Robust estimation of concept similarity is crucial for applications of AI in the commercial, biomedical, and publishing domains, among others. While the related task of word similarity has been extensively studied, resulting in a wide range of methods, estimating concept similarity between nodes in Wikidata has not been considered so far. In light of the adoption of Wikidata for increasingly complex tasks that rely on similarity, and its unique size, breadth, and crowdsourcing nature, we propose that conceptual similarity should be revisited for the case of Wikidata. In this paper, we study a wide range of representative similarity methods for Wikidata, organized into three categories, and leverage background information for knowledge injection via retrofitting. We measure the impact of retrofitting with different weighted subsets from Wikidata and ProBase. Experiments on three benchmarks show that the best performance is achieved by pairing language models with rich information, whereas the impact of injecting knowledge is most positive on methods that originally do not consider comprehensive information. The performance of retrofitting is conditioned on the selection of high-quality similarity knowledge. A key limitation of this study, similar to prior work, lies in the limited size and scope of the similarity benchmarks. While Wikidata provides an unprecedented possibility for a representative evaluation of concept similarity, effectively doing so remains a key challenge.
Keywords: Similarity, Wikidata, retrofitting, knowledge graphs, embeddings
DOI: 10.3233/SW-233520
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-20, 2024
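The retrofitting step the abstract above relies on can be illustrated with the standard iterative scheme (after Faruqui et al.): each concept's embedding is pulled toward the embeddings of its neighbours in a similarity knowledge graph. This is a minimal sketch with toy 2-D vectors; the paper's actual inputs are Wikidata/ProBase-derived neighbourhoods and pretrained embeddings.

```python
# Minimal retrofitting sketch: alpha anchors a vector to its original
# value, beta pulls it toward its graph neighbours. Toy data only.

def retrofit(vectors, neighbours, alpha=1.0, beta=1.0, iters=10):
    """vectors: {node: [floats]}; neighbours: {node: [neighbour nodes]}."""
    new = {n: list(v) for n, v in vectors.items()}
    for _ in range(iters):
        for node, nbrs in neighbours.items():
            if not nbrs:
                continue
            denom = alpha + beta * len(nbrs)
            for d in range(len(new[node])):
                s = alpha * vectors[node][d]           # anchor to original
                s += beta * sum(new[m][d] for m in nbrs)  # pull to neighbours
                new[node][d] = s / denom
    return new

vecs = {"cat": [1.0, 0.0], "feline": [0.0, 1.0]}
nbrs = {"cat": ["feline"], "feline": ["cat"]}
out = retrofit(vecs, nbrs)
```

With the two nodes linked, the updates converge to a fixed point where "cat" tends to [2/3, 1/3] and "feline" to [1/3, 2/3], i.e. the vectors are drawn together without collapsing onto each other.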
Authors: Li, Huanyu | Hartig, Olaf | Armiento, Rickard | Lambrix, Patrick
Article Type: Research Article
Abstract: In a GraphQL Web API, a so-called GraphQL schema defines the types of data objects that can be queried, and so-called resolver functions are responsible for fetching the relevant data from underlying data sources. Thus, we can expect to use GraphQL not only for data access but also for data integration, if the GraphQL schema reflects the semantics of data from multiple data sources, and the resolver functions can obtain data from these data sources and structure the data according to the schema. However, there does not exist a semantics-aware approach to employ GraphQL for data integration. Furthermore, there are no formal methods for defining a GraphQL API based on an ontology. In this work, we introduce a framework for using GraphQL in which a global domain ontology informs the generation of a GraphQL server that answers requests by querying heterogeneous data sources. The core of this framework consists of an algorithm to generate a GraphQL schema based on an ontology and a generic resolver function based on semantic mappings. We provide a prototype, OBG-gen, of this framework, and we evaluate our approach over a real-world data integration scenario in the materials design domain and two synthetic benchmark scenarios (Linköping GraphQL Benchmark and GTFS-Madrid-Bench). The experimental results of our evaluation indicate that: (i) our approach is feasible to generate GraphQL servers for data access and integration over heterogeneous data sources, thus avoiding a manual construction of GraphQL servers, and (ii) our data access and integration approach is general and applicable to different domains where data is shared or queried via different ways.
Keywords: Data integration, ontology, GraphQL
DOI: 10.3233/SW-233550
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-37, 2024
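The core idea of the schema-generation algorithm described above (ontology classes become GraphQL object types, properties become fields) can be sketched as simple SDL string generation. The input dictionary stands in for a parsed OWL ontology; OBG-gen's real algorithm also emits filter input types and drives a generic resolver, which this sketch omits.

```python
# Hypothetical sketch: map an ontology class and its properties to a
# GraphQL object type. The materials-domain names are invented examples.

XSD_TO_GQL = {"xsd:string": "String", "xsd:float": "Float", "xsd:int": "Int"}

def to_graphql_type(cls, props):
    """props: {property name: xsd datatype or target class name}."""
    lines = [f"type {cls} {{", "  id: ID!"]
    for name, rng in props.items():
        # Datatype properties map to scalars; object properties map to
        # the GraphQL type generated for their range class.
        gql = XSD_TO_GQL.get(rng, rng)
        lines.append(f"  {name}: {gql}")
    lines.append("}")
    return "\n".join(lines)

sdl = to_graphql_type("Material", {"chemicalFormula": "xsd:string",
                                   "bandGap": "xsd:float",
                                   "calculatedBy": "Calculation"})
```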
Authors: Merkle, Nicole | Mikut, Ralf
Article Type: Research Article
Abstract: Computational agents support humans in many areas of life and are therefore found in heterogeneous contexts. This means that agents operate in rapidly changing environments and can be confronted with huge state and action spaces. In order to perform services and carry out activities satisfactorily, i.e. in a goal-oriented manner, agents require prior knowledge and therefore have to develop and pursue context-dependent policies. The problem here is that prescribing policies in advance is limited and inflexible, especially in dynamically changing environments. Moreover, the context (i.e. the external and internal state) of an agent determines its choice of actions. Since the environments in which agents operate can be stochastic and complex in terms of the number of states and feasible actions, activities are usually modelled in a simplified way by Markov decision processes so that, for example, agents with reinforcement learning are able to learn policies, i.e. state-action pairs, that help to capture the context and act accordingly to optimally perform activities. However, training policies for all possible contexts using reinforcement learning is time-consuming. A requirement and challenge for agents is to learn strategies quickly and respond immediately in cross-context environments and applications, e.g., the Internet, service robotics, cyber-physical systems. In this work, we propose a novel simulation-based approach that enables a) the representation of heterogeneous contexts through knowledge graphs and entity embeddings and b) the context-aware composition of policies on demand by ensembles of agents running in parallel. The evaluation we conducted with the “Virtual Home” dataset indicates that agents with a need to switch seamlessly between different contexts, e.g. in a home environment, can request on-demand composed policies that lead to the successful completion of context-appropriate activities without having to learn these policies in lengthy training steps and episodes, in contrast to agents that use reinforcement learning. The presented approach enables both context-aware and cross-context applicability of untrained computational agents. Furthermore, the source code of the approach as well as the generated data, i.e. the trained embeddings and the semantic representation of domestic activities, is open source and openly accessible on GitHub and Figshare.
Keywords: Knowledge graphs, word embeddings, web platform, reinforcement learning, computational agents
DOI: 10.3233/SW-233531
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-29, 2024
Authors: Zhao, Yingshen | Sarkar, Arkopaul | Elmhadhbi, Linda | Karray, Mohamed Hedi | Fillatreau, Philippe | Archimède, Bernard
Article Type: Research Article
Abstract: Thanks to the advent of robotics in shopfloor and warehouse environments, control rooms need to seamlessly exchange information regarding the dynamically changing 3D environment to facilitate tasks and path planning for the robots. Adding to the complexity, this type of environment is heterogeneous as it includes both free space and various types of rigid bodies (equipment, materials, humans etc.). At the same time, 3D environment-related information is also required by virtual applications (e.g., VR techniques) for the behavioral study of CAD-based product models or simulation of CNC operations. In past research, information models for such heterogeneous 3D environments are often built without ensuring connection among the different levels of abstraction required for different applications. For addressing such multiple points of view and modelling requirements for 3D objects and environments, this paper proposes an ontology model that integrates the contextual, topologic, and geometric information of both the rigid bodies and the free space. The ontology provides an evolvable knowledge model that can support simulated task-related information in general. This ontology aims to greatly improve interoperability, as a path planning system (e.g., a robot) will be able to deal with different applications by simply updating the contextual semantics related to some targeted application while keeping the geometric and topological models intact, leveraging the semantic link among the models.
Keywords: Path planning, joint task and path planning, ontology, simulated task-related knowledge
DOI: 10.3233/SW-233460
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-28, 2023
Authors: Hosseini Beghaeiraveri, Seyed Amir | Labra Gayo, Jose Emilio | Waagmeester, Andra | Ammar, Ammar | Gonzalez, Carolina | Slenter, Denise | Ul-Hasan, Sabah | Willighagen, Egon | McNeill, Fiona | Gray, Alasdair J.G.
Article Type: Research Article
Abstract: Wikidata is a massive Knowledge Graph (KG), including more than 100 million data items and nearly 1.5 billion statements, covering a wide range of topics such as geography, history, scholarly articles, and life science data. The large volume of Wikidata is difficult to handle for research purposes; many researchers cannot afford the costs of hosting 100 GB of data. While Wikidata provides a public SPARQL endpoint, it can only be used for short-running queries. Often, researchers only require a limited range of data from Wikidata focusing on a particular topic for their use case. Subsetting is the process of defining and extracting the required data range from the KG; this process has received increasing attention in recent years. Specific tools and several approaches have been developed for subsetting, which have not been evaluated yet. In this paper, we survey the available subsetting approaches, introducing their general strengths and weaknesses, and evaluate four practical tools specific for Wikidata subsetting – WDSub, KGTK, WDumper, and WDF – in terms of execution performance, extraction accuracy, and flexibility in defining the subsets. Results show that all four tools have a minimum of 99.96% accuracy in extracting defined items and 99.25% in extracting statements. The fastest tool in extraction is WDF, while the most flexible tool is WDSub. During the experiments, multiple subset use cases have been defined and the extracted subsets have been analyzed, obtaining valuable information about the variety and quality of Wikidata, which would otherwise not be possible through the public Wikidata SPARQL endpoint.
Keywords: Knowledge Graph, Wikidata, Subsetting, Big Data, Accuracy, Performance
DOI: 10.3233/SW-233491
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-27, 2023
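The simplest form of the subsetting operation evaluated above can be sketched as streaming an N-Triples dump and keeping only statements about a chosen item set. Tools like WDumper and WDSub work over the full ~1.5-billion-statement dump with much richer filters (e.g. by instance-of class); this sketch only shows the streaming shape.

```python
# Sketch: filter an N-Triples stream by subject. In N-Triples, the
# subject IRI is the first whitespace-delimited token of each line.

def subset_ntriples(lines, keep_subjects):
    for line in lines:
        subj = line.split(" ", 1)[0]
        if subj in keep_subjects:
            yield line

dump = [
    "<http://www.wikidata.org/entity/Q42> <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q5> .",
    "<http://www.wikidata.org/entity/Q1> <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q36906466> .",
]
subset = list(subset_ntriples(dump, {"<http://www.wikidata.org/entity/Q42>"}))
```

Because the filter is a generator over lines, it runs in constant memory regardless of dump size, which is the property that makes dump-based subsetting feasible at Wikidata scale.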
Authors: Dooley, Damion | Andrés-Hernández, Liliana | Bordea, Georgeta | Carmody, Leigh | Cavalieri, Duccio | Chan, Lauren | Castellano-Escuder, Pol | Lachat, Carl | Mougin, Fleur | Vitali, Francesco | Yang, Chen | Weber, Magalie | Kucuk McGinty, Hande | Lange, Matthew
Article Type: Research Article
Abstract: Since its creation in 2016, the FoodOn food ontology has become an interconnected partner in various academic and government projects that span agricultural and public health domains. This paper examines recent data interoperability capabilities arising from food-related ontologies belonging to, or compatible with, the encyclopedic Open Biological and Biomedical Ontology Foundry (OBO) ontology platform, and how research organizations and industry might utilize them for their own projects or for data exchange. Projects are seeking standardized vocabulary across many food supply activities ranging from agricultural production, harvesting, preparation, food processing, marketing, distribution and consumption, as well as more indirect health, economic, food security and sustainability analysis and reporting tools. Satisfying this demand for controlled vocabulary requires establishing domain-specific ontologies whose curators coordinate closely to produce recommended patterns for food system vocabulary.
Keywords: Ontology, data harmonization, OBO Foundry, food systems, public health, epidemiology, multi-ontology framework, One Health
DOI: 10.3233/SW-233458
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-20, 2024
Authors: Werbrouck, Jeroen | Pauwels, Pieter | Beetz, Jakob | Verborgh, Ruben | Mannens, Erik
Article Type: Research Article
Abstract: In many industries, multiple parties collaborate on a larger project. At the same time, each of those stakeholders participates in multiple independent projects simultaneously. A double patchwork can thus be identified, with a many-to-many relationship between actors and collaborative projects. One key example is the construction industry, where every project is unique, involving specialists for many subdomains, ranging from the architectural design over technical installations to geospatial information, governmental regulation and sometimes even historical research. A digital representation of this process and its outcomes requires semantic interoperability between these subdomains, which however often work with heterogeneous and unstructured data. In this paper we propose to address this double patchwork via a decentralized ecosystem for multi-stakeholder, multi-industry collaborations dealing with heterogeneous information snippets. At its core, this ecosystem, called ConSolid, builds upon the Solid specifications for Web decentralization, but extends these both on a (meta)data pattern level and on microservice level. To increase the robustness of data allocation and filtering, we identify the need to go beyond Solid’s current LDP-inspired interfaces to a Solid Pod and introduce the concept of metadata-generated ‘virtual views’, to be generated using an access-controlled SPARQL interface to a Pod. A recursive, scalable way to discover multi-vault aggregations is proposed, along with data patterns for connecting and aligning heterogeneous (RDF and non-RDF) resources across vaults in a mediatype-agnostic fashion. We demonstrate the use and benefits of the ecosystem using minimal running examples, concluding with the setup of an example use case from the Architecture, Engineering, Construction and Operations (AECO) industry.
Keywords: Solid, DCAT, interdisciplinary collaboration, Common Data Environment, semantic enrichment
DOI: 10.3233/SW-233396
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-32, 2024
Authors: Usbeck, Ricardo | Yan, Xi | Perevalov, Aleksandr | Jiang, Longquan | Schulz, Julius | Kraft, Angelie | Möller, Cedric | Huang, Junbo | Reineke, Jan | Ngonga Ngomo, Axel-Cyrille | Saleem, Muhammad | Both, Andreas
Article Type: Research Article
Abstract: Knowledge Graph Question Answering (KGQA) has gained attention from both industry and academia over the past decade. Researchers proposed a substantial amount of benchmarking datasets with different properties, pushing the development in this field forward. Many of these benchmarks depend on Freebase, DBpedia, or Wikidata. However, KGQA benchmarks that depend on Freebase and DBpedia are gradually less studied and used, because Freebase is defunct and DBpedia lacks the structural validity of Wikidata. Therefore, research is gravitating toward Wikidata-based benchmarks. That is, new KGQA benchmarks are created on the basis of Wikidata and existing ones are migrated. We present a new, multilingual, complex KGQA benchmarking dataset as the 10th part of the Question Answering over Linked Data (QALD) benchmark series. This corpus formerly depended on DBpedia. Since QALD serves as a base for many machine-generated benchmarks, we increased the size and adjusted the benchmark to Wikidata and its ranking mechanism of properties. These measures foster novel KGQA developments by more demanding benchmarks. Creating a benchmark from scratch or migrating it from DBpedia to Wikidata is non-trivial due to the complexity of the Wikidata knowledge graph, mapping issues between different languages, and the ranking mechanism of properties using qualifiers. We present our creation strategy and the challenges we faced that will assist other researchers in their future work. Our case study, in the form of a conference challenge, is accompanied by an in-depth analysis of the created benchmark.
Keywords: Knowledge graph question answering, benchmark, challenge, query analysis
DOI: 10.3233/SW-233471
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-15, 2023
Authors: Buil-Aranda, Carlos | Lobo, Jorge | Olmedo, Federico
Article Type: Research Article
Abstract: Differential privacy is a framework that provides formal tools to develop algorithms to access databases and answer statistical queries with quantifiable accuracy and privacy guarantees. The notions of differential privacy are defined independently of the data model and the query language at stake. Most differential privacy results have been obtained on aggregation queries such as counting or finding maximum or average values, and on grouping queries over aggregations such as the creation of histograms. So far, the data model used by the framework research has typically been the relational model and the query language SQL. However, effective realizations of differential privacy for SQL queries that require joins have been limited. This has imposed severe restrictions on applying differential privacy in RDF knowledge graphs and SPARQL queries. By the simple nature of RDF data, most useful queries accessing RDF graphs will require intensive use of joins. Recently, new differential privacy techniques have been developed that can be applied to many types of joins in SQL with reasonable results. This opened the question of whether these new results carry over to RDF and SPARQL. In this paper we provide a positive answer to this question by presenting an algorithm that can answer counting queries over a large class of SPARQL queries while guaranteeing differential privacy, if the RDF graph is accompanied with semantic information about its structure. We have implemented our algorithm and conducted several experiments, showing the feasibility of our approach for large graph databases. Our aim has been to present an approach that can be used as a stepping stone towards extensions and other realizations of differential privacy for SPARQL and RDF.
Keywords: Differential privacy, SPARQL
DOI: 10.3233/SW-233474
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-29, 2023
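The basic mechanism behind differentially private counting, which the work above extends to SPARQL queries with joins, is the Laplace mechanism: a plain COUNT query has sensitivity 1 (adding or removing one individual changes the count by at most 1), so adding Laplace(1/ε) noise yields ε-differential privacy. The paper's actual contribution, bounding the sensitivity of join-heavy SPARQL queries, is the hard part and is not reproduced here.

```python
# Sketch of the Laplace mechanism for a counting query with sensitivity 1.
import math
import random

def dp_count(true_count, epsilon, seed=None):
    rng = random.Random(seed)
    scale = 1.0 / epsilon              # sensitivity 1 / epsilon
    u = rng.random() - 0.5             # inverse-CDF Laplace sampling
    noise = -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_count + noise

noisy = dp_count(1000, epsilon=0.5, seed=7)
```

Smaller ε (stronger privacy) means a larger noise scale and thus less accurate answers; queries with joins can amplify sensitivity far beyond 1, which is why naive application of this mechanism to SPARQL fails.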
Authors: Portisch, Jan | Paulheim, Heiko
Article Type: Research Article
Abstract: Knowledge graph embeddings represent a group of machine learning techniques which project entities and relations of a knowledge graph to continuous vector spaces. RDF2vec is a scalable embedding approach rooted in the combination of random walks with a language model. It has been successfully used in various applications. Recently, multiple variants to the RDF2vec approach have been proposed, introducing variations both on the walk generation and on the language modeling side. The combination of those different approaches has led to an increasing family of RDF2vec variants. In this paper, we evaluate a total of twelve RDF2vec variants on a comprehensive set of benchmarks, and compare them to seven existing knowledge graph embedding methods from the family of link prediction approaches. Besides the established GEval benchmark introducing various downstream machine learning tasks on the DBpedia knowledge graph, we also use the new DLCC (Description Logic Class Constructors) benchmark consisting of two gold standards, one based on DBpedia, and one based on synthetically generated graphs. The latter allows for analyzing which ontological patterns in a knowledge graph can actually be learned by different embeddings. With this evaluation, we observe that certain tailored RDF2vec variants can lead to improved performance on different downstream tasks, given the nature of the underlying problem, and that they, in particular, have a different behavior in modeling similarity and relatedness. The findings can be used to provide guidance in selecting a particular RDF2vec method for a given task.
Keywords: RDF2vec, knowledge graph embedding, representation learning, embedding evaluation
DOI: 10.3233/SW-233514
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-32, 2024
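The walk-generation stage that the RDF2vec variants above modify can be sketched on a toy graph: random walks over entities and predicates produce "sentences" that are then fed to a language model (word2vec in the original RDF2vec). The DBpedia-style names below are illustrative only.

```python
# Sketch of RDF2vec's first stage: random walks over a knowledge graph.
import random

def random_walks(graph, start, depth, n_walks, seed=0):
    """graph: {subject: [(predicate, object), ...]}; each walk alternates
    entity and predicate tokens, up to `depth` hops from `start`."""
    rng = random.Random(seed)
    walks = []
    for _ in range(n_walks):
        walk, node = [start], start
        for _ in range(depth):
            edges = graph.get(node)
            if not edges:
                break
            pred, obj = rng.choice(edges)
            walk += [pred, obj]
            node = obj
        walks.append(walk)
    return walks

g = {"dbr:Berlin": [("dbo:country", "dbr:Germany")],
     "dbr:Germany": [("dbo:capital", "dbr:Berlin")]}
walks = random_walks(g, "dbr:Berlin", depth=2, n_walks=3)
```

Variants differ precisely here (e.g. biased or order-aware walks) and in the language model applied to the resulting token sequences.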
Authors: Thiéblin, Elodie | Sousa, Guilherme | Haemmerlé, Ollivier | Trojahn, Cássia
Article Type: Research Article
Abstract: Ontology matching aims at making ontologies interoperable. While the field has fully developed in the last years, most approaches are still limited to the generation of simple correspondences. More expressiveness is, however, required to better address the different kinds of ontology heterogeneities. This paper presents CANARD (Complex Alignment Need and A-box based Relation Discovery), an approach for generating expressive correspondences that relies on the notion of competency questions for alignment (CQA). A CQA expresses the user knowledge needs in terms of alignment and aims at reducing the alignment space. The approach takes as input a set of CQAs as SPARQL queries over the source ontology. The generation of correspondences is performed by matching the subgraph from the source CQA to the similar surroundings of the instances from the target ontology. Evaluation is carried out on both synthetic and real-world datasets. The impact of several approach parameters is discussed. Experiments have shown that CANARD performs, overall, better on CQA coverage than precision, and that using existing sameAs links between the instances of the source and target ontologies gives better results than exact matches of their labels. The use of CQAs also improved both CQA coverage and precision with respect to using automatically generated queries. The reassessment of the counter-example increased significantly the precision, to the detriment of runtime. Finally, experiments on large datasets showed that CANARD is one of the few systems that can perform on large knowledge bases, but depends on regularly populated knowledge bases and the quality of instance links.
Keywords: Ontology matching, complex alignment, competency question for alignment, user needs
DOI: 10.3233/SW-233521
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-33, 2024
Authors: Serderidis, Konstantinos | Konstantinidis, Ioannis | Meditskos, Georgios | Peristeras, Vassilios | Bassiliades, Nick
Article Type: Research Article
Abstract: To implement Open Governance a crucial element is the efficient use of the big amounts of open data produced in the public domain. Public administration is a rich source of data and potentially new knowledge. It is a data intensive sector producing vast amounts of information encoded in government decisions and acts, published nowadays on the World Wide Web. The knowledge shared on the Web is mostly made available via semi-structured documents written in natural language. To exploit this knowledge, technologies such as Natural Language Processing, Information Extraction, Data Mining and the Semantic Web could be used, embedding into documents explicit semantics based on formal knowledge representations such as ontologies. Knowledge representation can be made possible by the deployment of Knowledge Graphs, collections of interlinked representations of entities, events or concepts, based on underlying ontologies. This can assist data analysts to achieve a higher level of situational awareness, facilitating automated reasoning towards different objectives, such as knowledge management, data maintenance, transparency and cybersecurity. This paper presents a new ontology d2kg [d(iavgeia) 2(to) k(nowledge) g(raph)] integrating in a unique way standard EU ontologies, core and controlled vocabularies to enable exploitation of publicly available data from government decisions and acts published on the Greek platform Diavgeia, with the aim to facilitate data sharing, re-usability and interoperability. It demonstrates a characteristic example of a Knowledge Graph based representation of government decisions and acts, highlighting its added value to respond to real practical use cases for the promotion of transparency, accountability and public awareness. The developed d2kg ontology in OWL is accessible at http://w3id.org/d2kg and documented at http://w3id.org/d2kg/documentation.
Keywords: Semantic Web, Linked Open Data, ontologies, Knowledge Graphs, government decisions and acts, Diavgeia
DOI: 10.3233/SW-243535
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-23, 2024
Authors: Thornton, Katherine | Seals-Nutt, Kenneth | Chen, Anne
Article Type: Research Article
Abstract: We introduce Dura-Europos Stories, a multimedia application for viewing artifacts and places related to the Dura-Europos archaeological excavation. We describe the process of mapping data to the Wikidata data model as well as the process of contributing data to Wikidata. We provide an overview of the functionality of an interactive application for viewing images of the artifacts in the context of their metadata. We contextualize this project as an example of using knowledge graphs in research projects in order to leverage technologies of the Semantic Web in such a way that data related to the project can be easily combined with other data on the web. Presenting artifacts in this story-based application allows users to explore these objects visually, and provides pathways for further exploration of related information.
Keywords: Wikidata, art history, archaeology, cultural heritage, digital humanities
DOI: 10.3233/SW-243552
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-15, 2024
Authors: Chudasama, Yashrajsinh | Purohit, Disha | Rohde, Philipp D. | Gercke, Julian | Vidal, Maria-Esther
Article Type: Research Article
Abstract: In recent years, knowledge graphs (KGs) have been considered pyramids of interconnected data enriched with semantics for complex decision-making. The potential of KGs and the demand for interpretability of machine learning (ML) models in diverse domains (e.g., healthcare) have gained more attention. The lack of model transparency negatively impacts the understanding and, in consequence, interpretability of the predictions made by a model. Data-driven models should be empowered with the knowledge required to trace down their decisions and the transformations made to the input data to increase model transparency. In this paper, we propose InterpretME, a tool that, using KGs, provides fine-grained representations of trained ML models. An ML model description includes data- (e.g., features’ definition and SHACL validation) and model-based characteristics (e.g., relevant features and interpretations of prediction probabilities and model decisions). InterpretME allows for defining a model’s features over data collected in various formats, e.g., RDF KGs, CSV, and JSON. InterpretME relies on the SHACL schema to validate integrity constraints over the input data. InterpretME traces the steps of data collection, curation, integration, and prediction; it documents the collected metadata in the InterpretME KG. InterpretME is published on GitHub (https://github.com/SDM-TIB/InterpretME) and Zenodo (https://doi.org/10.5281/zenodo.8112628). The InterpretME framework includes a pipeline for enhancing the interpretability of ML models, the InterpretME KG, and an ontology to describe the main characteristics of trained ML models; a PyPI library of InterpretME is also provided (https://pypi.org/project/InterpretME/). Additionally, live code (https://github.com/SDM-TIB/InterpretME_Demo) and a video (https://www.youtube.com/watch?v=Bu4lROnY4xg) demonstrating InterpretME in several use cases are also available.
Keywords: Interpretability, knowledge graphs, machine learning models, SHACL, ontologies
DOI: 10.3233/SW-233511
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-21, 2024
Authors: Carriero, Valentina Anita | Groth, Paul | Presutti, Valentina
Article Type: Research Article
Abstract: The ontology underlying the Wikidata knowledge graph (KG) has not been formalized. Instead, its semantics emerges bottom-up from the use of its classes and properties. Flexible guidelines and rules have been defined by the Wikidata project for the use of its ontology, however, it is still often difficult to reuse the ontology’s constructs. Based on the assumption that identifying ontology design patterns from a knowledge graph contributes to making its (possibly) implicit ontology emerge, in this paper we present a method for extracting what we term empirical ontology design patterns (EODPs) from a knowledge graph. This method takes as input a knowledge graph and extracts EODPs as sets of axioms/constraints involving the classes instantiated in the KG. These EODPs include data about the probability of such axioms/constraints happening. We apply our method on two domain-specific portions of Wikidata, addressing the music and art, architecture, and archaeology domains, and we compare the empirical ontology design patterns we extract with the current support present in Wikidata. We show how these patterns can provide guidance for the use of the Wikidata ontology and its potential improvement, and can give insight into the content of (domain-specific portions of) the Wikidata knowledge graph.
Keywords: Ontology design patterns, shapes, knowledge graphs, Wikidata, empirical knowledge engineering
DOI: 10.3233/SW-243613
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-25, 2024
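The core idea of the EODP abstract, deriving axioms with observed probabilities from instance data, can be sketched in a few lines of stdlib Python: for each class, count how often its instances use each property. The entities, classes, and properties below are hypothetical toy data, not Wikidata content:

```python
from collections import Counter, defaultdict

# Toy sketch of mining "empirical ontology design patterns": for each class,
# compute the observed probability that an instance uses a given property.
# All subjects, classes, and properties below are hypothetical examples.
triples = [
    ("q1", "instanceOf", "Painting"), ("q1", "creator", "x"), ("q1", "inception", "1503"),
    ("q2", "instanceOf", "Painting"), ("q2", "creator", "y"),
    ("q3", "instanceOf", "Painting"), ("q3", "creator", "z"), ("q3", "inception", "1889"),
]

by_subject = defaultdict(dict)
for s, p, o in triples:
    by_subject[s][p] = o

class_prop_counts = defaultdict(Counter)
class_sizes = Counter()
for s, props in by_subject.items():
    cls = props.get("instanceOf")
    if cls:
        class_sizes[cls] += 1
        for p in props:
            if p != "instanceOf":
                class_prop_counts[cls][p] += 1

patterns = {
    cls: {p: n / class_sizes[cls] for p, n in counts.items()}
    for cls, counts in class_prop_counts.items()
}
# patterns["Painting"] == {"creator": 1.0, "inception": 2/3}
```

A probability of 1.0 would suggest a candidate mandatory-property constraint for the class, while lower values indicate optional usage; the paper's method extracts richer axiom/constraint sets, but the frequency-over-instances principle is the same.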
Authors: Aameri, Bahar | Poveda-Villalón, María | Sanfilippo, Emilio M. | Terkaj, Walter
Article Type: Editorial
DOI: 10.3233/SW-243623
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-7, 2024
Authors: Arenas-Guerrero, Julián | Iglesias-Molina, Ana | Chaves-Fraga, David | Garijo, Daniel | Corcho, Oscar | Dimou, Anastasia
Article Type: Research Article
Abstract: RDF-star has been proposed as an extension of RDF to make statements about statements. Libraries and graph stores have started adopting RDF-star, but the generation of RDF-star data remains largely unexplored. To allow generating RDF-star from heterogeneous data, RML-star was proposed as an extension of RML. However, no system has been developed so far that implements the RML-star specification. In this work, we present Morph-KGCstar, which extends the Morph-KGC materialization engine to generate RDF-star datasets. We validate Morph-KGCstar by running test cases derived from the N-Triples-star syntax tests, and we apply it to two real-world use cases from the biomedical and open science domains. We compare the performance of our approach against other RDF-star generation methods (SPARQL-Anything), showing that Morph-KGCstar scales better for large input datasets, but is slower when processing multiple smaller files.
Keywords: Knowledge graphs, RDF-star, RML-star, data integration
DOI: 10.3233/SW-243602
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-19, 2024
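The "statements about statements" that RDF-star adds can be shown with a tiny stdlib sketch that serializes a quoted triple in N-Triples-star style (the `<< ... >>` syntax). The IRIs and values below are hypothetical examples:

```python
# Sketch of RDF-star's quoted triples: a statement about a statement,
# serialized in N-Triples-star style. All IRIs and literals are hypothetical.
def term(t):
    if isinstance(t, tuple):  # a quoted (embedded) triple
        return "<< " + " ".join(term(x) for x in t) + " >>"
    return f"<{t}>" if t.startswith("http") else f'"{t}"'

base = ("http://ex.org/bob", "http://ex.org/age", "23")       # bob's age is 23
meta = (base, "http://ex.org/certainty", "0.9")               # ...with certainty 0.9

line = " ".join(term(x) for x in meta) + " ."
# line == '<< <http://ex.org/bob> <http://ex.org/age> "23" >> <http://ex.org/certainty> "0.9" .'
```

An RML-star engine such as Morph-KGCstar produces triples of this shape from mapping rules over heterogeneous sources; the sketch only illustrates the target serialization, not the mapping process.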
Authors: Azzam, Amr | Polleres, Axel | D. Fernández, Javier | Acosta, Maribel
Article Type: Research Article
Abstract: RDF and SPARQL provide a uniform way to publish and query billions of triples in open knowledge graphs (KGs) on the Web. Yet, provisioning a fast, reliable, and responsive live querying solution for open KGs is still hardly possible through SPARQL endpoints alone: while such endpoints provide remarkable performance for single queries, they typically cannot cope with highly concurrent query workloads from multiple clients. To mitigate this, the Linked Data Fragments (LDF) framework sparked the design of different alternative low-cost interfaces, such as Triple Pattern Fragments (TPF), that partially offload the query processing workload to the client side. On the downside, such interfaces still come at the expense of unnecessarily high network load due to the necessary transfer of intermediate results to the client, leading to query performance degradation compared with endpoints. To address this problem, in the present work, we investigate alternative interfaces, refining and extending the original TPF idea, which also aim at reducing server-resource consumption by shipping query-relevant partitions of KGs from the server to the client. To this end, we first align formal definitions and notations of the original LDF framework to uniformly present existing LDF implementations and such “partition-based” LDF approaches. These novel LDF interfaces retrieve, instead of the exact triples matching a particular query pattern, a subset of pre-materialized, compressed partitions of the original graph, containing all answers to a query pattern, to be further evaluated on the client side. As a concrete representative of partition-based LDF, we present smart-KG+, extending and refining our prior work (In WWW ’20: The Web Conference 2020 (2020) 984–994, ACM/IW3C2) in several respects.
Our proposed approach is a step forward towards a better-balanced share of the query processing load between clients and servers: it ships graph partitions driven by the structure of RDF graphs, grouping entities described with the same sets of properties and classes, which results in a significant reduction of data transfer. Our experiments demonstrate that smart-KG+ significantly outperforms existing Web SPARQL interfaces both on pre-existing benchmarks for highly concurrent query execution and on a custom query workload inspired by query logs of existing SPARQL endpoints.
Keywords: Knowledge graph, SPARQL, Linked Data Fragments, graph partitioning, availability
DOI: 10.3233/SW-243571
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-45, 2024
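The client-side evaluation that TPF-style interfaces rely on, where the server ships matching triples (or, in partition-based approaches like smart-KG+, whole graph partitions) and the client filters and joins locally, can be sketched in stdlib Python. The graph and pattern below are hypothetical toy data, not smart-KG+ code:

```python
# Minimal sketch of client-side triple-pattern evaluation, the idea behind
# Triple Pattern Fragments: the client evaluates patterns over data shipped
# by the server instead of sending full SPARQL queries. Toy data only.
GRAPH = [
    ("alice", "knows", "bob"),
    ("alice", "worksAt", "tib"),
    ("bob",   "knows", "carol"),
]

def match(pattern, triples):
    """Evaluate one triple pattern; None acts as a variable."""
    return [t for t in triples
            if all(p is None or p == v for p, v in zip(pattern, t))]

# Who does alice know?
result = match(("alice", "knows", None), GRAPH)
# result == [("alice", "knows", "bob")]
```

In the partition-based refinement, the server would ship a compressed partition guaranteed to contain all answers to the pattern (entities sharing the same property/class sets), and the client would run exactly this kind of local matching, plus joins, over it.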
Authors: Esteves, Beatriz | Rodríguez-Doncel, Víctor
Article Type: Research Article
Abstract: This article surveys existing vocabularies, ontologies and policy languages that can be used to represent informational items referenced in GDPR rights and obligations, such as the ‘notification of a data breach’, the ‘controller’s identity’ or a ‘DPIA’. Rights and obligations in GDPR are analyzed in terms of information flows between different stakeholders, and a complete collection of 57 different informational items mentioned by GDPR is described. 13 privacy-related policy languages and 9 data protection vocabularies and ontologies are studied in relation to this list of informational items. ODRL and LegalRuleML emerge as the languages that can respond positively to the greatest number of the defined comparison criteria when complemented with DPV and GDPRtEXT, since 39 out of the 57 informational items can be modelled. Online supplementary material is provided, including a simple search application and a taxonomy of the identified entities.
Keywords: Privacy policy languages, data protection ontologies, GDPR, rights, obligations
DOI: 10.3233/SW-223009
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-35, 2022
Authors: Ekaputra, Fajar J. | Ekelhart, Andreas | Mayer, Rudolf | Miksa, Tomasz | Šarčević, Tanja | Tsepelakis, Sotirios | Waltersdorfer, Laura
Article Type: Research Article
Abstract: Small and medium-sized organisations face challenges in acquiring, storing and analysing personal data, particularly sensitive data (e.g., data of a medical nature), due to data protection regulations, such as the GDPR in the EU, which stipulates high standards in data protection. Consequently, these organisations often refrain from collecting data centrally, which means losing the potential of data analytics and of learning from aggregated user data. To enable organisations to leverage the full potential of the collected personal data, two main technical challenges need to be addressed: (i) organisations must preserve the privacy of individual users and honour their consent, while (ii) being able to provide data and algorithmic governance, e.g., in the form of audit trails, to increase trust in the results and support reproducibility of the data analysis tasks performed on the collected data. Such an auditable, privacy-preserving data analysis is currently challenging to achieve, as existing methods and tools offer only partial solutions to this problem, e.g., data representation of audit trails and user consent, automatic checking of usage policies, or data anonymisation. To the best of our knowledge, there exists no approach providing an integrated architecture for auditable, privacy-preserving data analysis. To address these gaps, as the main contribution of this paper, we propose the WellFort approach, a semantic-enabled architecture for auditable, privacy-preserving data analysis which provides secure storage for users’ sensitive data with explicit consent, and delivers a trusted, auditable analysis environment for executing data analytic processes in a privacy-preserving manner. Additional contributions include the adaptation of Semantic Web technologies as an integral part of the WellFort architecture, and the demonstration of the approach through a feasibility study with a prototype supporting use cases from the medical domain.
Our evaluation shows that WellFort enables privacy-preserving analysis of data and, at the same time, automatically collects sufficient information to support its auditability.
Keywords: Provenance, semantic web, privacy-preserving data analysis, auditability, DPV, consent management
DOI: 10.3233/SW-212883
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-34, 2021
Authors: Kurteva, Anelia | Chhetri, Tek Raj | Pandit, Harshvardhan J. | Fensel, Anna
Article Type: Research Article
Abstract: The adoption of the GDPR legislation in 2018 started a new technological shift towards achieving transparency. GDPR put the focus on the concept of informed consent for data processing, which increased the responsibilities regarding data sharing for both end users and companies. This paper presents a literature survey of existing solutions that use semantic technology for implementing consent. The main focus is on ontologies: how they are used for consent representation and for consent management in combination with other technologies such as blockchain. We also focus on visualisation solutions aimed at improving individuals’ consent comprehension. Finally, based on the overviewed state of the art, we propose best practices for consent implementation.
Keywords: Consent, GDPR, semantic web technology, ontology
DOI: 10.3233/SW-210438
Citation: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-27, 2021
IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
USA
Tel: +1 703 830 6300
Fax: +1 703 830 2300
[email protected]
For editorial issues, like the status of your submitted paper or proposals, write to [email protected]
IOS Press
Nieuwe Hemweg 6B
1013 BG Amsterdam
The Netherlands
Tel: +31 20 688 3355
Fax: +31 20 687 0091
[email protected]
For editorial issues, permissions, book requests, submissions and proceedings, contact the Amsterdam office [email protected]
Inspirees International (China Office)
Ciyunsi Beili 207(CapitaLand), Bld 1, 7-901
100025, Beijing
China
Free service line: 400 661 8717
Fax: +86 10 8446 7947
[email protected]
If you need assistance with publishing or have any suggestions, please write to: [email protected]