Affiliations: Italian National Statistical Institute - Istat, Via Cesare Balbo 16, 00184 Rome, Italy. Tel.: +39 06 4673 6372; E-mail: [email protected]
Abstract: The combined use of data from different sources is an
opportunity that National Statistical Institutes exploit more and more
frequently. In a context where huge amounts of information, produced by
different actors, can be integrated and compared, it becomes even more
necessary to assess the quality of the methods and techniques used to
achieve the integration results. When considering data integration at
the micro level, record linkage procedures are widely used and generally
produce good results (when strong identifying variables are available),
although these procedures are rarely accompanied by quality
indicators. However, especially in official statistics, quality indicators
need to be used in subsequent statistical analyses to guarantee and assess
data accuracy and reliability. This paper proposes a method for linkage
error estimation. The method enriches the Fellegi and Sunter model for
probabilistic record linkage: as is well known, the Fellegi and Sunter decision
rule is very effective for link identification but generally less reliable
for result evaluation. The proposal aims at predicting the linkage quality
in the Fellegi and Sunter framework by introducing a supervised step.
Keywords: Probabilistic record linkage, linkage errors, linkage quality assessment