Affiliations: Institute for Employment Research, Nürnberg, Germany | Cornell University, Ithaca, NY, USA
Note: Corresponding author: Jörg Drechsler, Institute for Employment Research,
Nürnberg, Germany. E-mail: [email protected]
Abstract: One major criticism of the use of synthetic data has been that
the effort required to generate useful synthetic data is so great that
many statistical agencies cannot afford it. We argue that many lessons in this
evolving field were learned in the early years of synthetic data
generation and can be used in the development of new synthetic data products,
considerably reducing the required investment. The final goal of the project
described in this paper will be to evaluate whether synthetic data algorithms
developed in the U.S. to generate a synthetic version of the Longitudinal
Business Database (LBD) can easily be transferred to generate a similar data
product for other countries. We construct a German data product with
information comparable to the LBD – the German Longitudinal Business Database
(GLBD) – that is generated from different administrative sources at the
Institute for Employment Research, Germany. In a future step, the algorithms
developed for the synthesis of the LBD will be applied to the GLBD. Extensive
evaluations will illustrate whether the algorithms provide useful synthetic
data without further adjustment. The ultimate goal of the project is to provide
access to multiple synthetic datasets similar to the SynLBD at Cornell to
enable comparative studies between countries. The Synthetic GLBD is a first step towards that goal.
Keywords: Confidentiality, comparative studies, German Longitudinal Business Database, synthetic data