Generating public transport data based on population distributions for RDF benchmarking

Taelman, Ruben; Colpaert, Pieter; Mannens, Erik; Verborgh, Ruben

doi:10.3233/SW-180319

Issue title: Special Issue on Benchmarking Linked Data

Guest editors: Axel-Cyrille Ngonga Ngomo, Irini Fundulaki and Anastasia Krithara

Article type: Research Article

Authors: Taelman, Ruben* | Colpaert, Pieter | Mannens, Erik | Verborgh, Ruben

Affiliations: imec – Ghent University – IDLab, Technologiepark-Zwijnaarde 15, B-9052 Ghent, Belgium. E-mail: [email protected]

Correspondence: [*] Corresponding author. E-mail: [email protected].

Abstract: When benchmarking RDF data management systems such as public transport route planners, system evaluation needs to happen under various realistic circumstances, which requires a wide range of datasets with different properties. Real-world datasets are almost ideal, as they offer these realistic circumstances, but they are often hard to obtain and inflexible for testing. Synthetic dataset generators are therefore typically preferred over real-world datasets because of their intrinsic flexibility. Unfortunately, many synthetic datasets that are generated within benchmarks are insufficiently realistic, raising questions about the generalizability of benchmark results to real-world scenarios. In order to benchmark geospatial and temporal RDF data management systems such as route planners with sufficient external validity and depth, we designed PoDiGG, a highly configurable generation algorithm for synthetic public transport datasets with realistic geospatial and temporal characteristics comparable to those of their real-world variants. The algorithm is inspired by real-world public transit network design and scheduling methodologies. This article discusses the design and implementation of PoDiGG and validates the properties of its generated datasets. Our findings show that the generator achieves a sufficient level of realism, based on the existing coherence metric and new metrics we introduce specifically for the public transport domain. Thereby, PoDiGG provides a flexible foundation for benchmarking RDF data management systems with geospatial and temporal data.

Keywords: Public transport, dataset generator, benchmarking, RDF, Linked Data

Journal: Semantic Web, vol. 10, no. 2, pp. 305-328, 2019

Published: 21 January 2019
