Affiliations: U.S. Census Bureau, Washington, DC, USA | Johns Hopkins University, Baltimore, MD, USA
Note: Corresponding author: Javier Miranda, U.S. Census Bureau, Washington, DC, USA.
Tel.: +1 301 763 6466; Fax: +1 301 763 5935; E-mail: [email protected]
Abstract: National statistical offices (NSOs) create official statistics from
data collected from survey respondents, government administrative records and
other sources. The raw source data is usually considered to be confidential. In
the case of the U.S. Census Bureau, confidentiality of survey and
administrative records microdata is mandated by statute, and this mandate to
protect confidentiality is often at odds with the needs of users to extract as
much information from the data as possible. Traditional disclosure protection
techniques result in official data products that do not fully utilize the
information content of the underlying microdata. Typically, these products take
the form of simple aggregate tabulations. In a few cases, anonymized public-use
micro samples are made available, but these face a growing risk of
re-identification owing to the increasing amount of information about individuals
and firms available in the public domain. One approach for overcoming these
risks is to release products based on synthetic data, in which values are simulated
from statistical models designed to mimic the (joint) distributions of the
underlying microdata. We discuss recent Census Bureau work to develop and
deploy such products, along with the benefits and challenges of
extending the scope of synthetic data products in official statistics.
Keywords: Confidentiality, synthetic microdata, official statistics