Reconciliation of inconsistent data sources using hidden Markov models

Pankowska, Paulina; Pavlopoulos, Dimitris; Bakker, Bart; Oberski, Daniel L.

doi:10.3233/SJI-190594

Article type: Research Article

Authors: Pankowska, Paulina [a, *] | Pavlopoulos, Dimitris [a] | Bakker, Bart [a, b] | Oberski, Daniel L. [c]

Affiliations: [a] Vrije Universiteit Amsterdam, The Netherlands | [b] Statistics Netherlands, The Netherlands | [c] Utrecht University, University Medical Center Utrecht, The Netherlands

Correspondence: [*] Corresponding author: Paulina Pankowska, Department of Sociology, Faculty of Social Sciences, Vrije Universiteit Amsterdam, de Boelelaan 1105, 1081 HV Amsterdam, The Netherlands. Tel.: +31 20 59 83178; E-mail: [email protected].

Abstract: This paper discusses how National Statistical Institutes (NSIs) can use hidden Markov models (HMMs) to produce consistent official statistics for categorical, longitudinal variables from inconsistent sources. Two main challenges are addressed. First, reconciling inconsistent sources with multi-indicator HMMs requires linking the sources at the micro level, and such linkage might introduce bias due to linkage error. Second, applying and estimating HMMs regularly is a complicated and expensive procedure, so it is preferable to re-use the measurement error parameter estimates as a correction factor for a number of years. However, this might lead to biased structural estimates if measurement error changes over time or if the data collection process changes. Our results on these issues are highly encouraging and imply that the suggested method is appropriate for NSIs. Specifically, linkage error leads to substantial bias only in very extreme scenarios. Moreover, measurement error parameters are largely stable over time if no major changes in the data collection process occur. However, when a substantial change in the data collection process occurs, such as a switch from dependent (DI) to independent (INDI) interviewing, re-using measurement error estimates is not advisable.
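The core idea of a multi-indicator HMM, in which two sources linked at the micro level are treated as error-prone indicators of one latent state that follows a Markov chain, can be illustrated with a small forward-backward sketch. This is not the authors' estimation procedure. It assumes the model parameters are already known (for instance, measurement error parameters estimated in an earlier year and re-used as a correction factor), two binary indicators that are conditionally independent given the latent state, and purely hypothetical names and numbers (`forward_backward`, `corrected_prevalence`, `PI`, `A`, `B_REG`, `B_SUR`, `PEOPLE`; the sources are called "register" and "survey" only for illustration). Parameter estimation (e.g. by EM) and linkage error are not modelled.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def emission(b_reg: Matrix, b_sur: Matrix, s: int, r: int, y: int) -> float:
    """P(R = r, Y = y | S = s), assuming the two sources err independently
    given the latent state (local independence)."""
    return b_reg[s][r] * b_sur[s][y]


def forward_backward(
    pi: Sequence[float],
    A: Matrix,
    b_reg: Matrix,
    b_sur: Matrix,
    reg_obs: Sequence[int],
    sur_obs: Sequence[int],
) -> Tuple[float, Matrix]:
    """Likelihood and posterior P(S_t = s | all observations) for one
    person's linked sequence. Parameters are taken as known (hypothetical)."""
    K, T = len(pi), len(reg_obs)
    # Forward pass: alpha[t][s] = P(observations 0..t, S_t = s).
    alpha = [[0.0] * K for _ in range(T)]
    for s in range(K):
        alpha[0][s] = pi[s] * emission(b_reg, b_sur, s, reg_obs[0], sur_obs[0])
    for t in range(1, T):
        for s in range(K):
            prev = sum(alpha[t - 1][q] * A[q][s] for q in range(K))
            alpha[t][s] = prev * emission(b_reg, b_sur, s, reg_obs[t], sur_obs[t])
    # Backward pass: beta[t][s] = P(observations t+1..T-1 | S_t = s).
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for s in range(K):
            beta[t][s] = sum(
                A[s][q]
                * emission(b_reg, b_sur, q, reg_obs[t + 1], sur_obs[t + 1])
                * beta[t + 1][q]
                for q in range(K)
            )
    likelihood = sum(alpha[T - 1])
    posterior = [
        [alpha[t][s] * beta[t][s] / likelihood for s in range(K)] for t in range(T)
    ]
    return likelihood, posterior


def corrected_prevalence(pi, A, b_reg, b_sur, people) -> Matrix:
    """Average the per-person posteriors to get an error-corrected state
    distribution at each time point (all sequences assumed equally long)."""
    K, T = len(pi), len(people[0][0])
    totals = [[0.0] * K for _ in range(T)]
    for reg_obs, sur_obs in people:
        _, post = forward_backward(pi, A, b_reg, b_sur, reg_obs, sur_obs)
        for t in range(T):
            for s in range(K):
                totals[t][s] += post[t][s] / len(people)
    return totals


# Hypothetical parameters for a binary latent state (0/1). The emission
# matrices stand in for measurement error estimates re-used from an
# earlier year; rows are latent states, columns are observed values.
PI = [0.3, 0.7]
A = [[0.8, 0.2], [0.05, 0.95]]
B_REG = [[0.9, 0.1], [0.05, 0.95]]
B_SUR = [[0.85, 0.15], [0.1, 0.9]]
# Hypothetical linked (register, survey) sequences over three periods.
PEOPLE = [
    ([0, 0, 1], [0, 0, 1]),
    ([1, 1, 1], [1, 0, 1]),
    ([0, 1, 1], [0, 1, 1]),
]

if __name__ == "__main__":
    for t, row in enumerate(corrected_prevalence(PI, A, B_REG, B_SUR, PEOPLE)):
        print(t, [round(p, 3) for p in row])
```

Because the emission matrices are held fixed, running `corrected_prevalence` on a later year's linked data mimics re-using measurement error estimates; after a change such as a switch from DI to INDI, those fixed matrices would no longer describe the survey's error, which is the risk the abstract flags. The sketch multiplies raw probabilities without rescaling, so it is only suitable for short sequences.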

Keywords: Data reconciliation, inconsistent data sources, measurement error, linkage error, hidden Markov model, latent class model, dependent interviewing

Journal: Statistical Journal of the IAOS, vol. 36, no. 4, pp. 1261-1279, 2020

Published: 25 November 2020
