Applying the LOT Methodology to a Public Bus Transport Ontology aligned with Transmodel: Challenges and Results

Ruckhaus, Edna; Anton-Bravo, Adolfo; Scrocca, Mario; Corcho, Oscar

doi:10.3233/SW-210451

Issue title: Transport Data on the Web

Guest editors: Marco Comerio, David Chaves-Fraga, Joshua Shinavier and Pieter Colpaert

Article type: Research Article

Authors: Ruckhaus, Edna^{a; *} | Anton-Bravo, Adolfo^a | Scrocca, Mario^b | Corcho, Oscar^a

Affiliations: [a] Ontology Engineering Group, Universidad Politécnica de Madrid, Spain | [b] Cefriel – Politecnico di Milano, Milano, Italy

Correspondence: [*] Corresponding author. E-mail: [email protected].

Keywords: Ontology, Transmodel, public bus, Open Cities, RDF

Journal: Semantic Web, vol. 14, no. 4, pp. 639-657, 2023

Published: 24 April 2023

Abstract

We present an ontology that describes the domain of Public Transport by bus, which is common in cities around the world. This ontology is aligned to Transmodel, a reference model which is available as a UML specification and which was developed to foster interoperability of data about transport systems across Europe. The alignment with this non-ontological resource required the adaptation of the Linked Open Terms (LOT) methodology, which has been used by our team as the methodological framework for the development of many ontologies used for the publication of open city data. The ontology is structured into three main modules: (1) agencies, operators and the lines that they manage, (2) lines, routes, stops and journey patterns, and (3) planned vehicle journeys with their timetables and service calendars. Besides reusing Transmodel concepts, the ontology also reuses common ontology design patterns from GeoSPARQL and the SOSA ontology. As part of the LOT data-driven validation stage, RDF data has been generated taking as input the GTFS feeds (General Transit Feed Specification) provided by the Madrid public bus transport provider (EMT). Mapping rules from structured data sources to RDF were developed using the RDF Mapping Language (RML) to generate RDF data, and queries corresponding to competency questions were tested.

1.Introduction

Open data initiatives across public administrations worldwide date back more than a decade. In the specific case of Spanish cities, the most relevant landmarks are the first transposition of the EU Public Sector Information directive in 2007,^1 the publication of the UNE 178301:2015 technical norm on Open Data for Smart Cities,^2 the development of the open data guide by the Spanish Federation of Municipalities and Provinces (FEMP) in 2017 [8], and the catalogue of high-value open datasets for cities in 2019 [9].

Domains that have been addressed in these initiatives include public sector, demography, environment, economy, commerce, transport and treasury, among others. Among the initiatives and projects that have led the advancement of open data in Spanish cities is the Ciudades Abiertas^3 project, a public-private collaborative project led by four Spanish municipalities (Zaragoza, Madrid, Santiago de Compostela and A Coruña) with the general aim of facilitating the implementation of common Open Government policies that can be reused by many other municipalities inside and outside Spain.

Among the project actions on open data, twelve ontologies are being developed using the Linked Open Terms (LOT) methodology [5,17]. These ontologies allow publishing Open Data homogeneously across cities, using common CSV structures, as well as following Linked Data principles [29]. They are being added to those that had already been developed in the context of the Spanish network of Open Data for Smart Cities,^4 and they correspond to a subset of the catalogue of datasets included in the aforementioned FEMP open data guide [9]. All of the ontologies are publicly available and versioned in GitHub,^5 with the corresponding repositories including use cases and user stories, requirements, the ontology implementation in OWL, the ontology HTML documentation in Spanish and English, and example data and queries.

In the area of transport, three ontologies have been developed so far under the umbrella of these initiatives, focused on the representation of open data about Public Bicycles, Motor Vehicle Traffic and Public Bus Transport. In this paper, we discuss the latter, an ontology that has been specifically developed to structure the publication of open data about public buses in cities, beyond the current publication as GTFS feeds (General Transit Feed Specification) or through ad-hoc formats and APIs. We have named it the Public Bus Transport ontology (http://vocab.ciudadesabiertas.es/def/transporte/autobus). Its corresponding GitHub repository with all the intermediate and final artefacts is available at https://github.com/CiudadesAbiertas/vocab-transporte-autobus including a Readme in English.^6

The scope of this ontology on public bus transportation is the representation of static information related to lines, routes, stops and timetables, and of real-time information on expected arrival times at bus stops. Having this data on public buses (extensible to other means of transport) is very valuable for municipalities and citizens, as well as for third parties. Such data is used for the management of services by transport operators, for the use of these services by citizens, and also for the provision of third-party services in different areas such as traffic management, road infrastructure design and trip planning.

One important assumption that we needed to consider in the development of this ontology was the alignment with European policies, in particular the regulation 2017/1926 [3], which states that, starting December 2019, any operator or transport authority must offer its data in formats that are compatible with Transmodel [19], the European reference data model for public transport information developed by the European Committee for Standardization (CEN). Transmodel underpins two concrete data formats that have been established for the exchange of transport data: NeTEx [15] and SIRI [23]. The importance of having the Public Bus ontology aligned to Transmodel lies in the purpose of the standard itself: interoperability among the transportation systems of different transport agencies and operators, thus facilitating multi-modal transport and the use of transport systems across borders.

This alignment to Transmodel has been fulfilled through the application of the Linked Open Terms (LOT) methodology, a reuse-based methodology specifically focused on developing ontologies and vocabularies for the generation of Linked Data. Initially, in a usual ontology reuse fashion, we tried to map the concepts in our domain with those developed in the Transmodel component ontologies, i.e. ontologies in the SNAP project vocabulary catalog [27].

The European SNAP project [24], in which the authors of this paper have also been involved, had already developed an initial ontological transposition of Transmodel in order to facilitate interoperability among these formats and other popular formats such as GTFS. This reuse process had drawbacks related to the complexity of the SNAP ontologies and to the fact that they are the result of a partial transposition, i.e. the project focused on the GTFS transposition of some concepts in Transmodel and NeTEx. Therefore, we adapted the reuse process in LOT in order to consider the non-ontological Transmodel UML specification.

Based on the nature of the concepts defined in Transmodel, we divided our ontological conceptualization into three major modules: (1) Agencies, operators and the lines they manage, (2) Lines, routes, stops, and journey patterns, and (3) Planned vehicle journeys with timetables and service calendars. However, the encoding was done as one single ontology file in order to simplify its publication and future maintenance.

Because of the complexity of the conceptual design, we set out to generate by hand RDF examples of real-world data annotated with the ontology concepts, which could be used for ontology validation purposes. Additionally, as part of the LOT methodology, RDF data were generated automatically taking as input the GTFS feed provided by the Madrid public bus transport provider (Empresa Municipal de Transportes de Madrid). Data transformations were expressed using RML mappings [20] and materialised into RDF, and SPARQL queries corresponding to the competency questions were tested.

The main contributions of this work can be summarized as follows:

1. The development of a Public Bus Transport ontology aligned to the European Reference Data Model, Transmodel.
2. A use case on the reuse of a semantically rich and complex non-ontological resource, the Transmodel UML specification, focusing on all the challenges that this reuse posed.
3. The inclusion of the validation of the ontology through real-world examples in the Implementation stage of the LOT methodology.
4. A set of RML mappings used to generate RDF data, using as source GTFS data published by the Madrid public bus transport provider. These RDF datasets were then used to evaluate the ontology through the SPARQL implementation of the competency questions.

In the next sections we describe related standards and vocabularies for the transportation domain, and preliminaries on the SNAP project and the LOT methodology. We then describe our adaptation of the LOT methodology and its application to our use case. The next section describes in detail the three modules of the Public Bus Transport ontology. Then we describe the alignment to Transmodel and the challenges encountered, and finally we give our conclusions and future lines of work.

2.Related work

This section provides an overview of relevant data standards and vocabularies for the representation and exchange of data in the domain of transport.

2.1.Transport data standards

This section describes two well-adopted standards for the representation of transport data, the GTFS standard and Transmodel.

2.1.1.GTFS

The General Transit Feed Specification (GTFS) [16] is a de-facto standard developed by Google and split into a static component, GTFS Static, and a real-time component, GTFS Realtime, that contains arrival predictions, vehicle positions and service advisories. It has become popular due to its simplicity and to the fact that it has been adopted not only by Google Maps, but also by other route planning systems such as OpenTripPlanner or Navitia.io.

GTFS Static was developed for sharing static transit information on agencies, routes and their stops, schedules and fares, among others. The specification defines the headers of seventeen CSV files. In particular, Madrid’s transport authority (CRTM) and Madrid’s public bus operator (EMT) provide a GTFS feed that includes information on all of the entities except for Fares. The most recent data dates from April 2020.
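
To make the step from a GTFS Static file to RDF concrete, the following minimal Python sketch reads rows in the `stops.txt` layout (the `stop_id`, `stop_name`, `stop_lat` and `stop_lon` columns are defined by GTFS) and emits N-Triples. The `EX` namespace, the `ScheduledStopPoint` class IRI, the use of the WGS84 `lat`/`long` properties and the sample row are assumptions made for this illustration; they are not terms of the Public Bus Transport ontology nor data from the EMT feed, and the mappings used in this work are written in RML rather than Python.

```python
import csv
import io

# Placeholder namespace: NOT the namespace of the Public Bus Transport ontology.
EX = "http://example.org/bus/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
WGS84 = "http://www.w3.org/2003/01/geo/wgs84_pos#"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"

# Invented sample in the stops.txt column layout (not real EMT data).
SAMPLE_STOPS = """stop_id,stop_name,stop_lat,stop_lon
S1,Example Square,40.4168,-3.7038
"""


def escape(text):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def stops_to_ntriples(csv_text):
    """Return a list of N-Triples lines, four per stop row in the CSV text."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{EX}stop/{row['stop_id']}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{EX}ScheduledStopPoint> .")
        lines.append(f'{subject} <{RDFS_LABEL}> "{escape(row["stop_name"])}" .')
        lines.append(f'{subject} <{WGS84}lat> "{row["stop_lat"]}"^^<{XSD_DOUBLE}> .')
        lines.append(f'{subject} <{WGS84}long> "{row["stop_lon"]}"^^<{XSD_DOUBLE}> .')
    return lines


if __name__ == "__main__":
    print("\n".join(stops_to_ntriples(SAMPLE_STOPS)))
```

A declarative mapping language such as RML expresses the same subject, predicate and object templates as data rather than code, which keeps the transformation inspectable and reusable across feeds.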

2.1.2.Transmodel Data Model

The Transmodel European Reference Data Model for Public Transport, from now on denoted as Transmodel, has been developed in the context of the Directive 2010/40/EU [3] on the framework for the deployment of Intelligent Transport Systems (ITS). In particular the priority action A includes the definition of the necessary requirements to make EU-wide multi-modal travel information services accurate and available across borders to ITS users. This includes the availability of existing and accurate multi-modal transport data, and the facilitation of electronic data exchange between public transport stakeholders.

Transmodel provides a complete and extensive set of related concepts that covers diverse aspects of Public Transport information for different transport modes. The standard covers concepts on eight sub-domains [19]: Network, Timing Information, Vehicle Scheduling, Operational Monitoring and Control, Fare Management, Passenger Information, Driver Management, and Management Information and Statistics. The connections among these sub-domains are shown in Fig. 1.

Fig. 1.

Sub-domains of the Transmodel ontology, their relationships and associated standards.

Our Public Bus Transport ontology covers agencies, operators, lines, routes and their stops, timetables, and arrival times. It does not cover passenger, driver or fare information, nor operational monitoring and control or management statistics. Thus, we will describe the main concepts represented in the Network, Vehicle Scheduling, and Timing Information sub-domains.

The Network sub-domain [26] represents the topological descriptions of the spatial structure of a public transport network which is built with points. An entity Point is defined as the most basic entity of the network model. It marks the location of bus stops, parking places or other types of points. Links represent 1-dimensional connections between points. An ordered set of points or links is called a Link Sequence. These are the generic building blocks of the Public Transport network model. Their specialisations represent concrete special Public Transport objects, like scheduled stop points, routes, journey patterns, among others.
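
The generic building blocks described above can be sketched as plain data structures. The Python sketch below is only an illustration of the Point, Link and Link Sequence idea: the class and field names are our own choices for this example, coordinates are reduced to latitude and longitude, and a Link Sequence is simplified to an ordered list of links (Transmodel also allows sequences of points).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Point:
    """Most basic network entity: a located point (bus stop, parking place, ...)."""
    point_id: str
    lat: float
    lon: float


@dataclass(frozen=True)
class Link:
    """One-dimensional connection between two points."""
    start: Point
    end: Point


@dataclass
class LinkSequence:
    """Ordered list of links; routes and journey patterns would specialise it."""
    links: List[Link]

    def is_connected(self) -> bool:
        """True if every link starts at the point where the previous one ends."""
        return all(a.end == b.start for a, b in zip(self.links, self.links[1:]))

    def points(self) -> List[Point]:
        """Ordered points visited by the sequence."""
        if not self.links:
            return []
        return [self.links[0].start] + [link.end for link in self.links]


if __name__ == "__main__":
    a, b, c = Point("A", 40.0, -3.0), Point("B", 40.1, -3.1), Point("C", 40.2, -3.2)
    sequence = LinkSequence([Link(a, b), Link(b, c)])
    print(sequence.is_connected(), [p.point_id for p in sequence.points()])
```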

The Timing Information and Vehicle Scheduling sub-domains [26] use the Vehicle Journey concept to describe the movements of a transport vehicle from the start point to the end point of a journey pattern on an operating day. Among the common concepts that cut across the sub-domains are the ones related to organisations [26], as different aspects of public transport may be handled by different organisation parts and are sometimes subcontracted to third parties.

2.1.3.Transmodel UML specification

The current Transmodel UML specification is the revised V6.0 version (http://www.transmodel-cen.eu/model/index.htm). It has been divided into eight packages. Seven of these packages correspond to the sub-domains enumerated above, with Timing Information and Vehicle Scheduling joined into one package. Besides these packages, there is a Common Concepts package [26] that covers concepts that are shared by the different functional domains. Models in each package are subdivided into more specific sub-models up to three levels. We would like to point out that time-related concepts are represented in the Service Calendar sub-model in the Common Concepts package, in the Tactical Planning Components sub-model in the Network Topology package, and in all of the sub-models in the Timing Information and Vehicle Scheduling package. This gives an idea of the extent and complexity of the specification.

Our Public Bus Transport ontology is aligned to concepts in the Network and the Vehicle Scheduling and Timing Information packages. Additionally, regarding the Common Concepts package, it is aligned to concepts in the Responsibility, Generic Framework, and Reusable Components models.

2.2.Ontologies for the transportation domain

Different efforts in the transportation research literature deal with the definition and possible applications of ontologies in the transportation domain. The survey by Katsumi and Fox [14] describes and compares several ontologies pointing out their commonalities and differences. Despite proposing different approaches in the modelling of domain concepts, all the surveyed works highlight the relevance of ontologies for transportation. Ontologies allow solving the challenges of data integration considering the great number of data sources from many transport stakeholders published in different formats.

Next, we review some of the ontologies proposed in the literature considering the scope addressed by the Public Bus Transport ontology.

A transportation ontology to support the generation of personalized content in travel planning for users is proposed by de Oliveira et al. in [6]. The ontology captures transportation journeys with an ad-hoc model to represent lines, transportation modes, stop points, duration and costs, reusing some concepts from the Transmodel UML specification. The goal of this work is to provide content personalization. It does not cover the ontology development process nor any insight on how the Transmodel concepts were reused.

Benvenuti et al. in [1] define an ontology-based framework to support the monitoring of public transportation services through the representation of knowledge regarding indicators and their formulas, business objectives, dimension analysis and the use of Transmodel modules to compute these indicators. This is a use of Transmodel very different from the one proposed in our work which is the representation of static information related to routes and timetables, and real-time information on expected arrival times.

The Transport Disruption Ontology [4] describes events that can cause disruption on travels, focusing on dynamic data that can provide real-time information to passengers about the status of the service.

Finally, the Linked GTFS vocabulary [2] defines an ontology to represent the entities and relationships described in the GTFS specification. Despite having a scope similar to the one of the Public Bus Transport ontology, the Linked GTFS vocabulary strictly reflects the GTFS specification and it is not aligned with the standards mandated by the European regulation.

To the best of our knowledge, all the ontologies available in the transportation research literature focus on limited scopes. In this context, the Public Bus Transport ontology and the presented methodology aim to provide the basis for a more general effort towards a comprehensive ontological model for transportation that exploits the standardisation effort made in the definition of Transmodel. Unlike other ontologies for the transport domain, this ontology is the result of the alignment to a standard that is available as a set of UML models, together with requirements from experts in the domain.

2.3.Ontologies for Transportation in Smart Cities

The challenge of data integration in the transportation domain is extremely relevant in the context of smart cities where data from different stakeholders need to be aggregated in a seamless way. In this section, we describe related work defining vocabularies for smart cities.

The Vocabulary for Vehicle Traffic^7 and the Vocabulary for Public Bicycle Sharing Systems^8 have also been developed within the Open Cities project. Both vocabularies are based on and extend the Sensor, Observation, Sample, and Actuator (SOSA) ontology [13]. Sensors represent devices that collect traffic measurements and also represent bicycle dock stations. Both vocabularies reuse the GeoSPARQL ontology [10] to represent the “location” concept, e.g. traffic device location, bicycle station location.

The Public Transport Vocabulary^9 was created in collaboration with transport stakeholders in Madrid for the description of the transport infrastructure domain. This ontology reuses the terminology of the National Public Transport Access Nodes (NaPTAN), a database of public transport access points in Great Britain describing stops as well as transport terminals such as train stations and airports.

Our Public Bus Transport ontology has replicated the reuse of the SOSA and GeoSPARQL ontologies, which has been one of the standard practices for Open Cities ontologies. Besides this, the ontology is related to the Vocabulary for Vehicle Traffic through its relationship to planned and unplanned traffic incidents.

3.Preliminaries

This section describes the initial effort made within the SNAP project for the development of an ontology based on Transmodel and the general stages of the LOT methodology used for our ontology development activities.

3.1.The SNAP project. Semantic National Access Point

The European regulation 2017/1926 requires each European Member State to set up a National Access Point (NAP) for multi-modal travel information for all transport modes. Each transport stakeholder should contribute to the NAP with their static and dynamic data, using a set of standard data formats identified by the regulation and based on the European Standard Public Transport Reference Data Model, i.e., Transmodel. Specifically, concerning the exchange of static scheduled data, the relevant data in the NAP should use the CEN data exchange standard NeTEx [15]. For the exchange of real-time public transport data, the relevant parts of the CEN public transport data exchange standard SIRI [23] are used.

The SNAP project developed a solution for transport stakeholders that need to convert their data into the formats required by the regulation. The proposed solution, based on Semantic Web technologies, implements data conversion while supporting the constitution of a knowledge graph of multi-modal transport data [22]. The SNAP converter adopts a reference ontology as a global conceptual model enabling a two-step conversion between two standards in the transport domain: first, from the input standard to the ontology, and then from the ontology to the target standard. To enable this solution, the SNAP project kick-started an effort to define an ontological transposition of Transmodel,^{10} as the reference ontology used in the conversion process.
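
The two-step conversion through a reference model can be sketched as the composition of a lifting and a lowering function. The Python sketch below is a simplified illustration and not the SNAP converter: the pivot representation is reduced to a flat dictionary (the actual solution works on RDF), the function names are hypothetical, and the XML fragment is only NeTEx-like and is not claimed to be valid against the NeTEx schema.

```python
from typing import Callable, Dict
from xml.sax.saxutils import escape, quoteattr

# Simplified stand-in for data expressed in terms of the reference ontology.
Pivot = Dict[str, str]


def lift_gtfs_stop(row: Dict[str, str]) -> Pivot:
    """Step 1: input standard (a GTFS stops.txt row) -> reference model."""
    return {"type": "StopPoint", "id": row["stop_id"], "name": row["stop_name"]}


def lower_to_xml(pivot: Pivot) -> str:
    """Step 2: reference model -> target format (illustrative XML fragment)."""
    return (
        f"<ScheduledStopPoint id={quoteattr(pivot['id'])}>"
        f"<Name>{escape(pivot['name'])}</Name>"
        "</ScheduledStopPoint>"
    )


def convert(
    record: Dict[str, str],
    lift: Callable[[Dict[str, str]], Pivot],
    lower: Callable[[Pivot], str],
) -> str:
    """Compose both steps; source and target never reference each other."""
    return lower(lift(record))


if __name__ == "__main__":
    row = {"stop_id": "S1", "stop_name": "Example Square"}
    print(convert(row, lift_gtfs_stop, lower_to_xml))
```

With a shared reference model, supporting an additional format requires one lifting and one lowering function rather than a dedicated converter for every pair of formats.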

As already mentioned, Transmodel is a very large and complex model. To start validating the proposed solution, the SNAP project implemented a first version of the ontology focusing on a portion of the overall specification. The initial modules of the Transmodel ontology have been defined to enable the conversion of a static GTFS [16] feed, widely adopted among transport stakeholders, to NeTEx, the standard required by the European regulation. Given that NeTEx is almost a serialization of Transmodel, the GTFS specification has been used to identify the relevant portion of Transmodel that should be prioritized in the ontology engineering process.

The Transmodel ontology currently defines five modules, shown in Fig. 2, that cannot be directly mapped to the presented Transmodel sub-domains, but reuse their terminology to define concepts and properties. To obtain a proper ontological transposition, the implemented modules integrate newly defined entities with already available vocabularies (e.g., the Organization ontology,^{11} the Geo WGS84 ontology,^{12} etc.). The Commons module defines general concepts and properties that can be reused across all the other modules. The Organisations module can be used to describe different information about entities operating and/or offering transportation services. The Journeys module can be used to represent data related to a transportation service, e.g., timetables, routes, vehicles and their scheduling. The Facilities module contains concepts and properties to describe facilities and, in particular, stations and stop places. Finally, the Fares module contains a simplified model for information about fares, which needs to be extended to support the full related Transmodel sub-domain.

Fig. 2.

Initial modules of the Transmodel ontology and their relations.

3.2.LOT methodology

LOT is a lightweight methodology for the development of ontologies and vocabularies [17]. It is based on the previous NeOn methodology and includes the following four major stages that can be seen in Fig. 3: (1) Requirements Specification, (2) Implementation, (3) Publication, and (4) Maintenance.

Fig. 3.

Linked Open Terms (LOT) methodology stages.

The aim of the ontology Requirements Specification stage is to identify and define the requirements the ontology should fulfil [5]. At the beginning of this stage, the goal and scope of the ontology are defined. Following this, the domain is analyzed in more detail by looking at documentation, data that has been published, standards and formats, among others. The use cases and user stories are also identified. Then, the requirements in the form of competency questions and statements are specified and validated by the stakeholders.

The goal of the Implementation stage is to build the ontology using a formal language, based on the ontological requirements identified by the domain experts [5]. This stage is iterative, running through several sprints, and it comprises the Conceptualization, Encoding, and Evaluation processes. During the conceptualization process, an ontology model is built and represented in a graphical language. We follow the Chowlk visual notation^{13} that “provides a set of visual blocks to represent each element from the OWL ontology implementation language…”.

One of the activities of the implementation stage is ontology reuse. LOT is based on the NeOn methodology, which develops nine scenarios for the development of ontology networks. Among these scenarios there is Scenario 3 on reusing ontological resources. In this scenario, ontology developers reuse ontological resources (ontologies as a whole, ontology modules, and/or ontology statements). Developers search, assess, compare, select, and integrate the ontological resources. Terms that have been extracted from the requirements, and that correspond to entities and relationships, may be used for searching existing ontologies that cover these concepts. Scenario 2 considers the reuse of non-ontological resources (NORs); it is meant for the transformation of mostly textual NORs with underlying low-expressiveness models such as thesauri, classification schemes, etc.

In the Encoding process, the ontology development team generates a computable model represented in the OWL language. The Evaluation process includes two aspects: (1) ensuring that the requirements are fulfilled; this is done through the translation of the competency questions into the corresponding SPARQL queries, and executing these queries against RDF test data that has been annotated with terms in the ontology, (2) guaranteeing that the ontology does not have syntactic, modelling or semantic errors. The syntactic validation may be done with any existing OWL validator tool, and the semantic and modelling evaluation is done by running an OWL (DL) reasoner, and by discovering modelling pitfalls. Currently, we use the OOPS! tool [18] for this task.
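
The first aspect of the evaluation, running a competency question as a query over annotated test data, can be illustrated with a small self-contained Python sketch. It deliberately avoids any RDF library: triples are plain tuples and the query is a basic graph pattern with `?variables`, which is only a stand-in for SPARQL. All identifiers (`ex:line27`, `ex:onLine`, `ex:servesStop`, and so on) are hypothetical and are not terms of the Public Bus Transport ontology.

```python
def match(triples, patterns, binding=None):
    """Yield variable bindings that satisfy every triple pattern in order."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        compatible = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    compatible = False
                    break
            elif term != value:
                compatible = False
                break
        if compatible:
            yield from match(triples, rest, candidate)


# Invented test data annotated with hypothetical terms.
TEST_DATA = {
    ("ex:route27a", "ex:onLine", "ex:line27"),
    ("ex:route27a", "ex:servesStop", "ex:stop1"),
    ("ex:route27a", "ex:servesStop", "ex:stop2"),
    ("ex:route5a", "ex:onLine", "ex:line5"),
    ("ex:route5a", "ex:servesStop", "ex:stop3"),
}

# Competency question: "Which stops are served by line 27?"
# SPARQL counterpart (same hypothetical terms):
#   SELECT ?stop WHERE { ?route ex:onLine ex:line27 ; ex:servesStop ?stop }
CQ_STOPS_OF_LINE_27 = [
    ("?route", "ex:onLine", "ex:line27"),
    ("?route", "ex:servesStop", "?stop"),
]


def stops_of_line_27(triples):
    """Answer the competency question as a sorted list of stop identifiers."""
    return sorted({b["?stop"] for b in match(triples, CQ_STOPS_OF_LINE_27)})


if __name__ == "__main__":
    print(stops_of_line_27(TEST_DATA))
```

A competency question is considered covered when the query can be written with the ontology terms and returns the expected answers on the test data.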

The aim of the Publication stage is to provide online human-readable documentation, which generally includes metadata (e.g. creators, contributors, creation date, version), a description of the conceptual model diagram, and all of the class and property restrictions and annotation properties. The ontology and all of its associated documents are usually published in a public repository. The Maintenance stage includes updates for ontology requirements that were not originally identified, as well as improvements, which in consequence may trigger another ontology development iteration.

4.Adaptation of the LOT methodology

The description of the adaptation of the LOT methodology is focused on the ontology Implementation stage; specifically, we have adapted the activities of conceptualization, ontology reuse, and evaluation in order to carry out the alignment to Transmodel.

For the Public Bus ontology, the reuse includes ontological as well as non-ontological resources (the Transmodel UML specification, the glossary of Transmodel terms and the NeTEx Schema). We have also extended the Evaluation activity and have included an additional form of evaluation, that is, the validation through RDF real-world examples. The adaptation of the Implementation stage can be seen in Fig. 4.

Although LOT is based on the NeOn methodology, because it is a lightweight methodology it only considers the reuse of ontological resources [5,17]. Besides this, the non-ontological resources considered in “Scenario 2: Reusing and Re-engineering Non-Ontological Resources” of the NeOn methodology [25] are low-expressiveness resources, whereas in this case we reused a very rich and expressive non-ontological resource.

Fig. 4.

Public Bus Transport Ontology. Adaptation of LOT Implementation Stage. The “Ontology reuse” activity is expanded and the “Evaluation through Examples” has been added to the Evaluation activity.

For the Public Bus Transport ontology, we developed an initial conceptual model without reuse of Transmodel, which satisfied the requirements identified in the first stage of the methodology. Next, in the reuse activity, under scenario 3 on the reuse of ontological resources, we initially tried to map the concepts in the initial conceptualization to those developed in the Transmodel component ontologies, i.e. ontologies in the SNAP Vocabulary catalog [24]. We reused some of the concepts in these component ontologies, such as Transmodel Organisations, but were not able to reuse the rest of the ontologies in the SNAP catalogue. The main drawbacks were that some of the component ontologies are quite large and complex because they encompass several of the models in the original UML specification (e.g. the Journeys ontology), and that some classes, attributes and implicit constraints of the UML specification (e.g., cardinality, optionality) were not developed: the SNAP ontologies contain only those concepts for which there exists a mapping from the GTFS feed specification to Transmodel.

Because of these drawbacks it was necessary to resort to scenario 2 of the NeOn methodology, which covers the reuse of non-ontological resources (NORs), in this case the reuse of the UML Transmodel specification. To this effect, the NeOn methodology [25] defines a re-engineering pattern which comprises three main activities: (1) NOR reverse engineering to create an abstract representation of the resource, (2) NOR transformation to create the ontological model, and (3) Ontology forward engineering to generate an implementation of an ontology. Although these activities are meant for the transformation of NORs such as thesauri, classification schemes, lexicons, etc., they can be applied to semantically richer UML specifications. For UML specifications there is already an abstract representation of the resource, so step (1), reverse engineering, is not required. NOR transformation implies translating the UML representation to an ontology representation, and Ontology forward engineering consists in the implementation of the relevant parts of the reused resource. It should be mentioned that although we reused the UML specification, prefixes and entity names that are defined in the Transmodel ontologies were used in our conceptualization.
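
The NOR transformation activity, translating a UML representation into an ontology representation, can be illustrated with a minimal Python sketch. It shows one common translation scheme (UML class to `owl:Class`, generalisation to `rdfs:subClassOf`, association to `owl:ObjectProperty` with cardinality restrictions) and is an assumption of this example rather than the exact set of rules applied in this work. The `ex` prefix and the `onLine` association name are hypothetical, and prefix declarations are omitted from the generated Turtle.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UmlAssociation:
    name: str            # role name of the association end
    target: str          # name of the class at the other end
    min_card: int = 0    # lower multiplicity bound
    max_card: int = -1   # upper multiplicity bound, -1 stands for "*"


@dataclass
class UmlClass:
    name: str
    parent: str = ""     # name of the generalised class, if any
    associations: List[UmlAssociation] = field(default_factory=list)


def restriction(cls_iri: str, prop: str, kind: str, value: int) -> str:
    """Turtle statement attaching a cardinality restriction to a class."""
    return (
        f"{cls_iri} rdfs:subClassOf [ a owl:Restriction ; "
        f"owl:onProperty {prop} ; "
        f'owl:{kind} "{value}"^^xsd:nonNegativeInteger ] .'
    )


def to_turtle(cls: UmlClass, prefix: str = "ex") -> str:
    """Translate one UML class description into Turtle statements."""
    cls_iri = f"{prefix}:{cls.name}"
    lines = [f"{cls_iri} a owl:Class ."]
    if cls.parent:
        lines.append(f"{cls_iri} rdfs:subClassOf {prefix}:{cls.parent} .")
    for assoc in cls.associations:
        prop = f"{prefix}:{assoc.name}"
        lines.append(
            f"{prop} a owl:ObjectProperty ; rdfs:domain {cls_iri} ; "
            f"rdfs:range {prefix}:{assoc.target} ."
        )
        if assoc.min_card > 0:
            lines.append(restriction(cls_iri, prop, "minCardinality", assoc.min_card))
        if assoc.max_card >= 0:
            lines.append(restriction(cls_iri, prop, "maxCardinality", assoc.max_card))
    return "\n".join(lines)


if __name__ == "__main__":
    route = UmlClass(
        "Route",
        parent="LinkSequence",
        associations=[UmlAssociation("onLine", "Line", min_card=1, max_card=1)],
    )
    print(to_turtle(route))
```

Making multiplicities explicit as restrictions is what distinguishes this kind of transformation from the re-engineering of low-expressiveness resources, where such constraints do not exist in the source.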

Additionally, because of the complexity and sometimes contrived conceptual design, we set out to generate examples of real-world data annotated with the ontology. In the LOT methodology these examples are part of the Publication stage. However, they were used during the evaluation activity to validate with the public transport experts whether the Transmodel concepts included in our ontology did in fact represent our specific Public Bus Transport domain. This verification triggered some adjustments to the ontology conceptual design.

5.Public Bus Transport Ontology Development

In this section we will describe the application and results of the LOT stages and activities.

5.1.Requirements specification

The requirement specification stage includes the identification of the purpose and scope of the ontology, and the specification of use cases, user stories, and competency questions.

5.1.1.Purpose and scope identification

The Public Bus Transport ontology represents information about the public urban bus service for municipalities in Spain. The requirements cover transport authorities and operators, information on lines, routes, journey patterns and their timetables, stops on each route, information on expected bus arrival times for each stop, and information on planned and unplanned incidents that may affect the bus routes and their journeys.

The ontology’s main stakeholders are: (1) The city’s public bus agencies, operators, and authorities (e.g., Madrid’s “Consorcio Regional de Transportes de Madrid” is the public transport authority that supervises all types of public transportation, and the “Empresa Municipal de Transporte (EMT)” is its urban public bus operator); (2) The citizens, users of the service; (3) The city council, which is interested in a good-quality service, especially in the case of traffic incidents; and (4) Civil society or non-governmental organizations that want to analyze service fulfilment.

All of the stages in the ontology development process were carried out with a team of domain experts from public bus operators in the four Spanish municipalities that lead the project, and also people from the public transport area in the city councils themselves.

5.1.2.Use case identification

Requirements identified by the four municipalities involved in the project, together with the domain experts, were divided into two major thematic blocks: (1) Information on lines, routes, stops, bus arrival times, and time-on-route forecasts; only Madrid provides information on incidents that can affect bus lines and specific routes, but all of the cities deem this information necessary; (2) Travelers and their use of public bus transport; for this block, Madrid considers it necessary to study the demand of travelers per stop or per line at different times of the day in order to generate indicators and mobility reports.

At this initial point of the requirements stage it was decided to develop the first thematic block. From this block the following use cases^14 were derived:

1. The bus is a transport mode that reaches most parts of the city and where value-added services can be provided by the operator or third parties. A user requires information on the route(s) and their destination, timetables, and the stops to board and alight and this in turn requires its geolocation, the location of the bus, and the estimated time of arrival. A third party app could compute in real time the best route based on stop arrival times.
2. The quality of the public bus network is related to offering an adequate public bus service; it is essential to know whether the service is being provided normally or whether incidents are affecting it in such a way that corrective actions are needed.
3. Journalists, researchers, and non-governmental organizations, among others, may want to analyze whether public transport services are being provided “correctly”: whether the frequency established for the line and its routes is met and, in general, the degree of fulfillment of the service.

User stories include examples where a transport user needs general information on lines and routes as well as real-time information for decision making on a specific trip. Other user stories are related to the analysis of incidents and their relation to specific lines or routes, so that the operator or agency can make decisions regarding information for the user and corrective actions. Finally, there are examples that address problems of isolated stops, stops that may benefit from having certain services, and the analysis of route options to work districts or areas.

5.1.3.Functional requirements specification

Requirements in the form of statements and competency questions were grouped into sections: transport service, lines, stops, incidents, and buses. It should be mentioned that specific vehicles (buses) and their information are not part of the scope of this vocabulary. However, the expected arrival times of buses at each stop are one of the requirements. Some of the competency questions are related to geospatial data, for example “What are the closest stops to a certain location for a certain line route?”. There is also an English version of the competency questions^15 that indicates, for each question, the corresponding user stories.
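The geospatial competency question quoted above can be illustrated with a small sketch. This is plain Python rather than the SPARQL/GeoSPARQL queries used in the project; the stop identifiers, route identifiers, coordinates, and the `closest_stops` helper are hypothetical, and the haversine great-circle distance is swapped in as a simple stand-in for a geospatial distance function.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


@dataclass(frozen=True)
class Stop:
    stop_id: str   # hypothetical identifier
    route_id: str  # hypothetical route the stop belongs to
    lat: float     # WGS84 latitude in degrees
    lon: float     # WGS84 longitude in degrees


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def closest_stops(stops, route_id, lat, lon, k=1):
    """Return the k stops of the given route nearest to (lat, lon)."""
    on_route = [s for s in stops if s.route_id == route_id]
    return sorted(on_route, key=lambda s: haversine_m(lat, lon, s.lat, s.lon))[:k]


# Invented sample data, loosely placed in central Madrid.
STOPS = [
    Stop("stop-A", "route-1", 40.4168, -3.7038),
    Stop("stop-B", "route-1", 40.4200, -3.7050),
    Stop("stop-C", "route-2", 40.4169, -3.7039),
]

nearest = closest_stops(STOPS, "route-1", 40.4170, -3.7040, k=1)
print([s.stop_id for s in nearest])
```

Filtering by route before ranking mirrors the “for a certain line route” part of the question; in an RDF setting the same filter would be a graph pattern over the route and its stops.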

5.2.Implementation

The implementation stage includes the activities of ontology conceptualization and reuse, encoding, and evaluation.

5.2.1.Conceptualization and reuse

The conceptualization is aligned with the ontology design principles defined in [28] as follows:

– Clarity. Documentation is provided where all of the classes and properties are described with extensions to the description of these concepts in the Transmodel and SNAP documentation. Examples from the city of Madrid are included in order to make the documentation clearer.
– Coherence. The ontology is coherent in its natural language documentation and also in its logical axioms, as shown by the results of the reasoner execution.
– Extensibility. The ontology has been divided into three major and intrinsically cohesive parts. Extending this ontology with other sub-domains of public bus transport, such as Fares, is feasible; it would require aligning the extension with Transmodel using our LOT adaptation and interconnecting its classes with existing classes in the rest of the ontology.
– Minimal ontological commitment. This guideline is related to extensibility, and it is followed in this ontology. The idea is that the representation of the domain should cater to the needs of different users; in our case it is a shared conceptualization among different municipalities in Spain that are representative of cities of different sizes and with diverse types of transport requirements. Additionally, whenever there are properties that are not shared by all the cities, these are defined with no minimum cardinality restrictions.
– Minimal encoding bias. The ontology is represented in a graphical notation and shared and discussed with the municipalities and experts before it is encoded.

The color code and graphical notation for the conceptual diagrams as well as the ontology namespaces can be seen in Fig. 5. An initial conceptualization model was developed with reuse of the GeoSPARQL [10] ontology to represent the stop location as a Point, and of the https://schema.org/ Organization class and properties. The model covers roughly the main entities and relationships in the requirements as can be seen in Fig. 6. It should be noted that in case of following a no-reuse approach, this model would have needed further refinement in order to cover aspects of the requirements such as different route patterns, timetables, and expected arrival times.

Fig. 5.

Namespaces and Graphical Notation for the Ontology Conceptual Models.

Fig. 6.

Public Bus Transport Ontology. Initial Conceptual Model.

In a next step we mapped concepts in our domain to those in the following ontologies:

– The GeoSPARQL [10] ontology, specifically its Location pattern, to represent the stop location. The pattern has a Feature class, i.e. the entity that has a location, related to Geometry, which in turn is related to the Point class for the representation of the location coordinates.
– Schema.org [21] which is a general vocabulary for structured data on the Web. We reuse its ContactPoint class and properties for contacts in the organizations involved in public bus management, and also several other attributes such as startDate, endDate, name, and url.
– The dcterms vocabulary [7] which is maintained by the Dublin Core Metadata Initiative. Terms identifier and description are reused.
– The SOSA ontology [13] where the property resultTime was reused in order to represent the arrival time of buses as observations related (hasFeatureOfInterest) to the stops where they arrive in each route.
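As an illustration of two of the reuse patterns listed above, the following minimal sketch builds the triples for a stop location (GeoSPARQL) and for an arrival-time observation (SOSA) and prints them as N-Triples. It uses only the Python standard library; the `http://example.org/bus/` namespace, the identifiers, and the helper names are assumptions made for the example and are not taken from the ontology. The observation follows the description given in this list (the stop as feature of interest).

```python
# Namespace IRIs of the reused vocabularies.
GEO = "http://www.opengis.net/ont/geosparql#"
SF = "http://www.opengis.net/ont/sf#"
SOSA = "http://www.w3.org/ns/sosa/"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Placeholder namespace for instance data; an assumption, not the project's.
EX = "http://example.org/bus/"


def iri(value):
    """Wrap an IRI in angle brackets, N-Triples style."""
    return f"<{value}>"


def typed_literal(value, datatype):
    """Build a typed literal (no escaping; the inputs here contain no quotes)."""
    return f'"{value}"^^<{datatype}>'


def stop_location(stop_id, lon, lat):
    """GeoSPARQL pattern: Feature (the stop) -> Geometry -> Point coordinates."""
    stop = iri(f"{EX}stop/{stop_id}")
    geom = iri(f"{EX}stop/{stop_id}/geometry")
    wkt = typed_literal(f"POINT({lon} {lat})", GEO + "wktLiteral")
    return [
        (stop, iri(GEO + "hasGeometry"), geom),
        (geom, iri(RDF_TYPE), iri(SF + "Point")),
        (geom, iri(GEO + "asWKT"), wkt),
    ]


def arrival_observation(obs_id, stop_id, result_time):
    """SOSA pattern: an observation about a stop, stamped with sosa:resultTime."""
    obs = iri(f"{EX}observation/{obs_id}")
    stop = iri(f"{EX}stop/{stop_id}")
    return [
        (obs, iri(RDF_TYPE), iri(SOSA + "Observation")),
        (obs, iri(SOSA + "hasFeatureOfInterest"), stop),
        (obs, iri(SOSA + "resultTime"), typed_literal(result_time, XSD + "dateTime")),
    ]


def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = stop_location("S1", -3.7038, 40.4168) + arrival_observation(
    "O1", "S1", "2023-04-24T08:15:00Z"
)
print(to_ntriples(triples))
```

The WKT literal lists longitude before latitude, which is the axis order of the default GeoSPARQL coordinate reference system.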

The next activity in this stage involved the reuse of the Transmodel ontologies developed in the SNAP project as well as other reference ontologies. A first attempt to reuse these ontologies resulted in a very large and complex conceptual model. At this point we divided the ontology into three modules that correspond to its sub-domains: (1) Bus organisations and management, (2) Bus routes and journey patterns, and (3) Bus planned vehicle journeys. Each of these modules has a high degree of cohesion, i.e., most of the classes and properties belong together in the same module.

Part I – Bus organisations (Base Model)

This portion of the ontology represents an overview of the organisation and management of public bus transport in cities. This conceptualization covers the following information:

– Bus operators and the authorities they serve.
– The lines that they manage together with their graphical “presentation”.
– The routes made by each line (journeys and timetables are expanded in the other parts).
– Incidents, for which we reuse the Traffic vocabulary^16 developed in the context of the Ciudades Abiertas project.
– Reuse of the SNAP Organisations ontology (tmorg).
– Reuse of the Transmodel Line concept; however, a subclass esautob:Line was created to relate the line to the incidents that affect it.

The conceptual model is shown in Fig. 7.

Fig. 7.

Public Bus Transport Ontology. Organisation Conceptual Model (Base Model).

Part II – Bus routes

We followed the steps described in Section 4 for the reuse of the non-ontological Transmodel UML specification: (2) NOR transformation to create the ontological model, and (3) Ontology forward engineering to create an implementation. The ontological graphical representation (step (2)) of parts of the Line Network and Route sub-models of the UML specification can be seen in Fig. 8; this step was necessary to ensure the correct (semantic) reuse of these concepts. We then integrated this partial model into the complete conceptual model (step (3)) that is shown in Fig. 9.

Fig. 8.

NOR Transformation of Transmodel UML Network building blocks: points, points in link sequences and link sequences.

Fig. 9.

Public Bus Transport Ontology. Route, Journey Pattern, Stop Conceptual Model.

The conceptualization of this module includes the following:

– Reuse of the Transmodel Line concept. Again, the subclass esautob:Line is defined in order to relate the line to the stops at the beginning and end of the line.
– A line is made up of several routes, and each route is composed of a series of points on the route. Each point in the route is associated with a point that is the functional centroid of a certain place, i.e. the point represents the centre of the bus stop.
– We defined a class for stop, esautob:Stop, as a subclass of Place, due to the need to represent data and object properties that are very specific to this domain.
– We relate the stop to the postal address; we reuse the existing Postal Address ontology.^17
– The location of the stop is represented through the GeoSPARQL geolocation pattern (this was also part of the initial conceptual model).
– Each route may have several journey patterns: the requirements state that stops may vary, for example during weekends, and that a different journey pattern may be generated by an incident.
– For expected stop arrival times, we reused the SOSA ontology [13] in order to represent the arrival times as observations.

Note that the Journeys ontology prefix (tmjourney) and its entity names were reused. An example of RDF data for this model where there are two journey patterns is shown in Fig. 10.

Fig. 10.

Example of Lines, Routes and Journey Patterns. There are two journey patterns for route 138a; the second journey pattern was generated by an incident and has a different first stop.
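The situation depicted in Fig. 10 can be mimicked with a small data-structure sketch. It is only an illustration of the route and journey pattern relationship, not the RDF data of the figure: apart from the route name 138a, all identifiers (stops, patterns, incident) and the `changed_stops` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class JourneyPattern:
    pattern_id: str
    stop_ids: List[str]                       # ordered stops of the pattern
    caused_by_incident: Optional[str] = None  # hypothetical incident identifier


@dataclass
class Route:
    route_id: str
    journey_patterns: List[JourneyPattern] = field(default_factory=list)


def changed_stops(base, variant):
    """Positions at which the variant pattern visits a different stop than the base."""
    pairs = zip(base.stop_ids, variant.stop_ids)
    return [(i, a, b) for i, (a, b) in enumerate(pairs) if a != b]


regular = JourneyPattern("138a-jp1", ["stop-1", "stop-2", "stop-3"])
diverted = JourneyPattern(
    "138a-jp2", ["stop-1b", "stop-2", "stop-3"], caused_by_incident="incident-42"
)
route = Route("138a", [regular, diverted])

# Only the first stop differs between the two patterns of the same route.
print(changed_stops(regular, diverted))
```

Keeping both patterns attached to the same route is what allows a query to compare the regular service with the one in force during an incident.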

Part III – Bus vehicle journeys

This part, presented in Fig. 11, represents the planned vehicle journeys and their service data (timetables). As in Part II, we represented the corresponding portions of the UML specification in our graphical notation, and then integrated this partial model into the conceptual model. The following information was represented in this module:

– A vehicle journey follows a certain journey pattern and can be made on one or more day types, such as a holiday or a weekday.
– A service calendar has beginning and ending dates, and each day in the service calendar is associated with a day type. Thus, the vehicle journeys that are planned for a certain date can be extracted from the model.
– Each vehicle journey, because it is frequency-based, is associated with a headway journey group, which is determined by minimum, maximum and planned headway intervals.

This module mainly reuses the tmjourney prefix.
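The way a date resolves to planned vehicle journeys through its day type can be sketched as follows. This is a minimal illustration under stated assumptions, not the ontology's encoding: the class and field names, the operating window (`first_departure`, `last_departure`), and all sample values are hypothetical, and only the planned headway interval is used, leaving the minimum and maximum intervals out.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta


@dataclass(frozen=True)
class HeadwayJourneyGroup:
    first_departure: time       # start of the operating window (assumed field)
    last_departure: time        # end of the operating window (assumed field)
    planned_headway: timedelta  # planned interval; min/max intervals omitted


@dataclass(frozen=True)
class VehicleJourney:
    journey_id: str
    journey_pattern_id: str
    day_types: frozenset        # day types on which the journey is made
    headway_group: HeadwayJourneyGroup


def journeys_on(day, calendar, journeys):
    """Vehicle journeys planned for a date, resolved through its day type."""
    day_type = calendar.get(day)  # None when the date is outside the calendar
    return [j for j in journeys if day_type in j.day_types]


def planned_departures(group):
    """Expand a frequency-based group into departure times at the planned headway."""
    if group.planned_headway <= timedelta(0):
        raise ValueError("planned headway must be positive")
    anchor = date(2000, 1, 1)  # arbitrary date, only needed for time arithmetic
    current = datetime.combine(anchor, group.first_departure)
    end = datetime.combine(anchor, group.last_departure)
    departures = []
    while current <= end:
        departures.append(current.time())
        current += group.planned_headway
    return departures


# Invented service calendar: each date is associated with one day type.
CALENDAR = {date(2023, 4, 24): "weekday", date(2023, 5, 1): "holiday"}
JOURNEYS = [
    VehicleJourney("vj-weekday", "jp-1", frozenset({"weekday"}),
                   HeadwayJourneyGroup(time(7, 0), time(8, 0), timedelta(minutes=20))),
    VehicleJourney("vj-holiday", "jp-1", frozenset({"holiday"}),
                   HeadwayJourneyGroup(time(9, 0), time(10, 0), timedelta(minutes=30))),
]

for journey in journeys_on(date(2023, 4, 24), CALENDAR, JOURNEYS):
    print(journey.journey_id, planned_departures(journey.headway_group))
```

A single vehicle journey per journey pattern and day type is enough here because the timetable is expressed as a frequency rather than as individual departures.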

Fig. 11.

Public Bus Transport Ontology. Vehicle Journey Conceptual Model.

5.2.2.Encoding

Once the reuse activity was completed, we encoded the ontology using the Protégé tool.^18

5.2.3.Evaluation

We used the OOPS! tool to evaluate modelling pitfalls. The report is shown in Fig. 12. Results indicate several pitfalls related to properties in reused ontologies that do not define domain and range. Another group of pitfalls indicates that inverse properties have not been declared; again, this is not critical unless it is required for querying the annotated datasets. The complete OOPS! report^19 is available in the GitHub repository. The HermiT reasoner^20 was executed and determined that the ontology is consistent.

Fig. 12.

Evaluation Report Generated by OOPS!.

Next, we developed a few real-world examples that allowed us to validate whether the model is adequate for representing the data in our domain. This was especially important for the concepts of Route, JourneyPattern and VehicleJourney. With these examples we determined the need to simplify the third module on bus vehicle journeys.

Finally, as part of the LOT methodology, a data-driven evaluation was carried out, i.e., an evaluation through real-world examples of semantified data that aims to test the ontology against the competency questions. RDF data was produced using (CSV to RDF) RML mappings that were generated with the Mapeathor tool [12], a tool that eases the creation of mapping rules by using a spreadsheet for their specification. Source data was the GTFS feed provided by the Madrid Regional Transport Consortium. Once the mappings were generated, we constructed several knowledge graphs (KG) using the RDFizer tool [11]. Again, the mappings and KGs were divided in correspondence with the three modules. Examples of the input to Mapeathor, the mappings, and the KG are presented in Fig. 13. SPARQL queries that correspond to the competency questions were developed. Queries can be tried out through the GitHub repository.^21 The examples generated by hand to validate the ontology and those generated through mappings are available in the GitHub repository.^22
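To make the idea of a CSV-to-RDF mapping concrete, the sketch below mimics what a single mapping rule set does for a GTFS `stops.txt` file: a subject template built from `stop_id`, plus one predicate per column. It is written in plain Python and is neither RML nor the output of Mapeathor or the RDFizer; the sample rows, the `http://example.org/bus/stop/` subject namespace, and the choice of predicates are assumptions made for illustration. The column names are the standard GTFS ones.

```python
import csv
import io

# Invented two-row sample using GTFS stops.txt column names.
GTFS_STOPS = """stop_id,stop_name,stop_lat,stop_lon
S1,Example Stop One,40.4168,-3.7038
S2,Example Stop Two,40.4200,-3.7050
"""

BASE = "http://example.org/bus/stop/"  # placeholder subject namespace (assumption)
DCT_IDENTIFIER = "http://purl.org/dc/terms/identifier"
SCHEMA_NAME = "https://schema.org/name"
GEO = "http://www.opengis.net/ont/geosparql#"


def escape(text):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def row_to_triples(row):
    """Apply a subject template and column-to-predicate rules to one CSV row."""
    sid = row["stop_id"]
    name = row["stop_name"]
    lon, lat = row["stop_lon"], row["stop_lat"]
    subject = f"<{BASE}{sid}>"
    geometry = f"<{BASE}{sid}/geometry>"
    return [
        f'{subject} <{DCT_IDENTIFIER}> "{escape(sid)}" .',
        f'{subject} <{SCHEMA_NAME}> "{escape(name)}" .',
        f"{subject} <{GEO}hasGeometry> {geometry} .",
        f'{geometry} <{GEO}asWKT> "POINT({lon} {lat})"^^<{GEO}wktLiteral> .',
    ]


def csv_to_triples(text):
    """Map every row of a CSV document to N-Triples statements."""
    reader = csv.DictReader(io.StringIO(text))
    return [triple for row in reader for triple in row_to_triples(row)]


mapped = csv_to_triples(GTFS_STOPS)
print("\n".join(mapped))
```

A declarative mapping language keeps these rules outside the program, which is what makes it practical to specify them in a spreadsheet and regenerate the knowledge graph whenever the feed changes.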

Fig. 13.

CSV to RDF mappings specified in Mapeathor.

6.Alignment with Transmodel

In this section we describe the main challenges encountered in the development of the Public Bus Transport ontology and we give details on the alignment with the Transmodel UML specification. Table 1 presents the challenges and solutions.

Table 1

General Challenges to the Development of the Public Bus Transport Ontology

Challenge	Solution
Documentation is scattered and there are different versions available	Compilation of a set of documents to be consulted. The Transmodel UML V6 2017 packages are our reference documentation.
Transmodel official documentation is a work in progress (latest version published in September 2019)	Constant review of the documentation, which required several iterations to make our ontology more consistent
Extensive information on standards vs. lack of information on implementation or examples	Creation of examples from the very beginning to test the implementations
Complex UML Transmodel specification	Generation of a graphical ontological representation of parts of the UML and integration into the ontology
The same concept with different semantics (Transmodel, NeTEx)	Creation of a consistent glossary based on the examples
Concepts that are not represented in the Transmodel UML specification	Definition of new classes and properties. Subclasses of reused classes when appropriate
Complexity of the resulting Part II that covers bus routes and journey patterns	Division of the conceptual diagram in two sections: (1) Transmodel reused concepts and (2) Public Bus Transport ontology concepts

Overall, the challenges are related to taking a UML model and transforming it into an OWL ontology. Although this may seem a straightforward process, it is not: UML models do not encode all the elements needed to address the domain restrictions and are not necessarily consistent (as happens in Transmodel). Additionally, the reused resources were developed under different perspectives of the domain (Transmodel, NeTEx and SNAP), each with its own documentation, glossary of concepts and implementations. Therefore, when we deal with ontologies that cover specific aspects of this broad domain, we come across issues such as breadth of documentation, over-representation, and ambiguity. Details of the alignment to the UML Transmodel specification follow:

– The UML Public Transport Network Topology package is complex. The hierarchy for the classes LinkSequence, PointInLinkSequence, Point, and its subclasses for a route and journey pattern is scattered across several UML models.
Alignment. A graphical ontological model of these classes was created using our graphical notation for a clearer visualisation, and it was integrated into Part II, which covers routes and journey patterns. This model was presented in Section 5 under NOR transformation.
It should be noted that cardinality restrictions were implemented as OWL axioms. For example, the statement that a PointInLinkSequence is viewedAs exactly one Point is implemented as a qualified cardinality restriction in OWL. An RDF Knowledge Graph could be validated against these restrictions using the Shapes Constraint Language (SHACL).^23
– The UML timing-related information is represented in the Common Concepts, Network Topology, and in the Timing Information and Vehicle Scheduling packages. As the relevant concepts are scattered in several packages, the individual UML model graphical representations did not provide clear information and were not used.
Alignment. A graphical ontological model that represents all of the timing-related classes and properties was created. It was validated through real-world examples, which showed that not all of the concepts were needed. A reduced ontological model was integrated into Part III, which covers vehicle journeys and their schedules.
– There is no clear Stop class in the UML specification. Several options exist for representing a Stop in Transmodel, e.g. StopPlace, Place.
Alignment. As the stop in our domain is a physical place with certain data and object properties, a class esautob:Stop is defined as a subclass of Place that in turn is a subclass of Zone.
– There is no clear match between the Transmodel UML specification and the requirement to represent the frequency-based schedule of a journey pattern for a certain type of day. A frequency-based service is represented in the Timing Information and Vehicle Scheduling UML package, specifically in the Frequency Based Service model, and it is related to the Vehicle Journey class. However, the representation of an individual vehicle journey is not relevant to our domain.
Alignment. In Part III the VehicleJourney class is related to the HeadwayJourneyGroup that in turn is associated with its HeadwayInterval. The RDF data examples and the generated RDF data used for queries represent one vehicle journey instantiation per journey pattern with its relationships to the frequency-based timetables.
– There is no clear match between the Transmodel UML specification and the requirement to represent the expected arrival times of buses at a certain stop.
Alignment. Reuse of the SOSA ontology. A stop in a route is a Sensor where for a given timestamp (sosa:resultTime) there is an expected arrival waiting period.
– The Transmodel UML specification defines the Point class as “A 0-dimensional node of the network used for the spatial description of the network”. The ontologies in the Open Cities project reuse the GeoSPARQL location pattern to represent locations of municipality-related “equipment”, e.g., buildings, bus stops. This pattern also defines a Point class.
Alignment. Both Point concepts are represented in the ontology: tmjourney:Point is needed to represent the relation between points on routes and journey patterns and physical stop places, and sf:Point represents the geographical location of a stop.
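The cardinality restriction mentioned in the first alignment above (a PointInLinkSequence is viewedAs exactly one Point) lends itself to a closed-world check over the data, which is the kind of validation SHACL provides and which an OWL axiom alone does not enforce under the open-world assumption. The sketch below is plain Python over an in-memory list of triples, not SHACL; the shortened names and the sample nodes are hypothetical.

```python
RDF_TYPE = "rdf:type"


def viewed_as_violations(triples):
    """Map each PointInLinkSequence node to its number of viewedAs values
    whenever that number is not exactly one."""
    nodes = {s for s, p, o in triples if p == RDF_TYPE and o == "PointInLinkSequence"}
    targets = {}
    for s, p, o in triples:
        if p == "viewedAs":
            targets.setdefault(s, set()).add(o)
    return {
        node: len(targets.get(node, set()))
        for node in sorted(nodes)
        if len(targets.get(node, set())) != 1
    }


# Invented data: pils-1 is valid, pils-2 has no point, pils-3 has two.
DATA = [
    ("pils-1", RDF_TYPE, "PointInLinkSequence"),
    ("pils-1", "viewedAs", "point-1"),
    ("pils-2", RDF_TYPE, "PointInLinkSequence"),
    ("pils-3", RDF_TYPE, "PointInLinkSequence"),
    ("pils-3", "viewedAs", "point-3a"),
    ("pils-3", "viewedAs", "point-3b"),
]

violations = viewed_as_violations(DATA)
print(violations)
```

The check counts distinct values per node, so a repeated identical statement is not reported as a second value.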

7.Conclusions

In this work we have presented an ontology for the representation of data about public buses operating in cities. This ontology is aligned to the Transmodel reference model. For this development we followed the LOT methodology and adapted the Reuse activity to the scenario of reuse of non-ontological resources, in our case the Transmodel UML specification.

Although the Open Cities project did not require this alignment, we considered it an added value, because such an alignment may facilitate the generation of Transmodel-compliant data in the future, as required by the corresponding EU regulation. The ontology development team had also participated in the development of the initial version of the Transmodel ontology in the context of the SNAP project. Therefore, both the complexity of the UML specification and the early state of development of the SNAP ontologies were known beforehand, which was an advantage for the development of the alignment.

In order to identify the Transmodel concepts that represented the requirements, we followed a bottom-up approach where we identified the concepts through the Transmodel glossary, and then we built the graphical representation of these concepts by examining portions of the UML models and submodels, and building our conceptualization. This may be a useful experience for other ontology developers in this or other domains who wish to address similar ontology development problems.

Future work includes improving the ontologies originally developed in SNAP with updates to existing concepts from Transmodel, as well as adding other classes and associations from the specification that were not developed in the initial version of those ontologies. Additionally, we suggest dividing the encoding of the Public Bus Transport ontology into the three portions that were conceptualized and presented in this paper.

Notes

1 https://eur-lex.europa.eu/eli/dir/2003/98, https://www.boe.es/eli/es/l/2007/11/16/37/con.

2 https://www.en.aenor.com/normas-y-libros/buscador-de-normas/une?c=N0054318

3 Open Cities in English, http://ciudadesabiertas.es.

4 https://github.com/opencitydata

5 The current catalogue of ontologies is available at http://vocab.ciudadesabiertas.es/.

6 https://github.com/CiudadesAbiertas/vocab-transporte-autobus/blob/master/Readme-en.md

7 http://vocab.ciudadesabiertas.es/def/transporte/trafico

8 http://vocab.ciudadesabiertas.es/def/transporte/bicicleta-publica

9 http://vocab.linkeddata.es/datosabiertos/def/transporte/transporte-publico

10 Published online at http://w3id.org/transmodel.

11 https://www.w3.org/TR/vocab-org/

12 https://www.w3.org/2003/01/geo/

13 https://github.com/oeg-upm/chowlk_spec

14 https://github.com/CiudadesAbiertas/vocab-transporte-autobus/wiki

15 https://docs.google.com/spreadsheets/d/1DdA-Fg3Vau5ihrDUd573RDXzh4pCT6AtUnEjs_tTKFQ/edit?usp=sharing

16 http://vocab.ciudadesabiertas.es/def/transporte/trafico

17 http://vocab.linkeddata.es/datosabiertos/def/urbanismo-infraestructuras/direccion-postal

18 https://protege.stanford.edu/

19 https://github.com/CiudadesAbiertas/vocab-transporte-autobus/tree/master/Publication/OOPSevaluation

20 http://www.hermit-reasoner.com/

21 https://github.com/CiudadesAbiertas/vocab-transporte-autobus/blob/master/Examples/queries.md

22 https://github.com/CiudadesAbiertas/vocab-transporte-autobus/tree/master/Examples/data

23 https://www.w3.org/TR/shacl/

Acknowledgements

We would like to thank all of the members of the Ciudades Abiertas (Open Cities, http://ciudadesabiertas.es/) project who participated in the development of this ontology, including: Honorio Enrique Crespo Díaz-Alejo, María Carmen Ruiz Moreno, María Jesús Gallego San Miguel, María del Mar Arribas de Andrés from the Madrid city council, María Jesús Fernández Ruiz, Víctor Morlán Plo, and José Antonio Chanca Cáceres from the Zaragoza city council, the Servicio de Informática from the Santiago de Compostela city council, the Servicio de Innovación y Desarrollo Tecnológico from the A Coruña city council, Andrés Iglesias Pardo and Andrés Recio Martín from the Empresa Municipal de Transportes de Madrid, and Red.es.

The work presented in this paper is supported by Grant PID2020-118274RB-I00 funded by MCIN/AEI/ 10.13039/501100011033.

References

[1]	F. Benvenuti, C. Diamantini, D. Potena and E. Storti, An ontology-based framework to support performance monitoring in public transport systems, Transportation Research Part C: Emerging Technologies 81 (2017), 188–208, ISSN 0968-090X. doi:10.1016/j.trc.2017.06.001.
[2]	P. Colpaert, A. Llaves, R. Verborgh, O. Corcho, E. Mannens and R. Van de Walle, Intermodal public transit routing using Linked Connections, in: International Semantic Web Conference: Posters and Demos, 2015, pp. 1–5.
[3]	Commission Delegated Regulation (EU) 2017/1926 of 31 May 2017 supplementing Directive 2010/40/EU of the European Parliament and of the Council with regard to the provision of EU-wide multimodal travel information services, 2017. https://eur-lex.europa.eu/eli/reg_del/2017/1926/oj.
[4]	D. Corsar, M. Markovic, P. Edwards and J.D. Nelson, The transport disruption ontology, in: International Semantic Web Conference, Springer, 2015, pp. 329–336.
[5]	D2.2 Detailed Specification of the Semantic Model, Specification of the Semantic Model, Technical Report, Universidad Politécnica de Madrid (UPM), 2017. https://vicinity2020.eu/vicinity/content/d22-detailed-specification-semantic-model.
[6]	K.M. De Oliveira, F. Bacha, H. Mnasser and M. Abed, Transportation ontology definition and application for the content personalization of user interfaces, Expert Systems with Applications 40(8) (2013), 3145–3159. doi:10.1016/j.eswa.2012.12.028.
[7]	Dublin Core Metadata Initiative, 2020. https://www.dublincore.org/specifications/dublin-core/dcmi-terms/.
[8]	FEMP. Federación Española de Municipios y Provincias, DATOS ABIERTOS. Guía estratégica para su puesta en marcha Conjuntos de datos mínimos a publicar (2017). http://femp.femp.es/files/3580-1617-fichero/Gu.
[9]	FEMP. Federación Española de Municipios y Provincias, DATOS ABIERTOS FEMP 2019. 40 conjuntos de datos a publicar por las Entidades Locales, 2019. http://femp.femp.es/files/3580-1938-fichero/DATOS.
[10]	GeoSPARQL Ontology, 2020. https://opengeospatial.github.io/ogc-geosparql/geosparql11/index.html.
[11]	E. Iglesias, S. Jozashoori, D. Chaves-Fraga, D. Collarana and M. Vidal, SDM-RDFizer: An RML interpreter for the efficient creation of RDF knowledge graphs, in: CIKM ’20: The 29th ACM International Conference on Information and Knowledge Management, 2020.
[12]	A. Iglesias-Molina, D. Chaves-Fraga, F. Priyatna and O. Corcho, Towards the Definition of a Language-Independent Mapping Template for Knowledge Graph Creation, 2019.
[13]	K. Janowicz, A. Haller, S.J.D. Cox, D. Le Phuoc and M. Lefrançois, SOSA: A lightweight ontology for sensors, observations, samples, and actuators, Journal of Web Semantics 56 (2019), 1–10. doi:10.1016/j.websem.2018.06.003.
[14]	M. Katsumi and M. Fox, Ontologies for transportation research: A survey, Transportation Research Part C: Emerging Technologies 89 (2018), 53–82, ISSN 0968-090X. https://www.sciencedirect.com/science/article/pii/S0968090X18300858. doi:10.1016/j.trc.2018.01.023.
[15]	NeTEx, Network Timetable Exchange, 2015. http://netex-cen.eu/.
[16]	G.S. Overview, 2005, https://developers.google.com/transit/gtfs/.
[17]	M. Poveda-Villalón, A reuse-based lightweight method for developing linked data ontologies and vocabularies, in: The Semantic Web: Research and Applications, 2012, pp. 833–837. doi:10.1007/978-3-642-30284-8_66.
[18]	M. Poveda-Villalón, A. Gómez-Pérez and M.C. Suárez-Figueroa, OOPS! (OntOlogy Pitfall Scanner!): An on-line tool for ontology evaluation, International Journal on Semantic Web and Information Systems (IJSWIS) 10(2) (2014), 7–34. doi:10.4018/ijswis.2014040102.
[19]	Public Transport Reference Data Model, 2015, http://www.transmodel-cen.eu.
[20]	RDF Mapping Language (RML), 2020. https://rml.io/specs/rml/.
[21]	Schema.org Version 12.0, 2021. https://schema.org/version/latest.
[22]	M. Scrocca, M. Comerio, A. Carenini and I. Celino, Turning transport data to comply with EU standards while enabling a multimodal transport knowledge graph, in: Proceedings of the 19th International Semantic Web Conference, ISWC, Springer International Publishing, 2020, pp. 411–429. ISBN 978-3-030-62466-8. doi:10.1007/978-3-030-62466-8_26.
[23]	SIRI, Interface for Real-time Information, 2016. http://www.transmodel-cen.eu/standards/siri/.
[24]	SNAP, Your transport data into EU compliance, 2019. https://www.snap-project.eu/.
[25]	M.C. Suárez-Figueroa, A. Gómez-Pérez and M. Fernández-López, The NeOn methodology for ontology engineering, in: Ontology Engineering in a Networked World, M.C. Suárez-Figueroa, A. Gómez-Pérez, E. Motta and A. Gangemi, eds, 2012, pp. 9–34. doi:10.1007/978-3-642-24794-1_2.
[26]	Transmodel, Common Concepts, Public Transport Network. Timing Information and Vehicle Scheduling, 2017. http://www.transmodel-cen.eu/wp-content/uploads/sites/2/2015/01/TUTORIAL-Part1-3-v0.2.pdf.
[27]	Transmodel Vocabularies, 2019. https://oeg-upm.github.io/snap-docs/.
[28]	M. Uschold and M. Grüninger, Ontologies: Principles, methods and applications, The Knowledge Engineering Review 11 (1996).
[29]	W3C Linked Data, 2006. https://www.w3.org/DesignIssues/LinkedData.