Affiliations: U.S. Census Bureau, USA | National Institute of Statistical Sciences, Suitland, MD, USA | Duke University, Durham, NC, USA
Note: Corresponding author: Satkartar K. Kinney, National Institute of
Statistical Sciences, USA.
Abstract: In most countries, national statistical agencies do not release
establishment-level business microdata, because doing so represents too large a
risk to establishments' confidentiality. Agencies can potentially manage these
risks by releasing synthetic microdata, i.e., individual establishment records
simulated from statistical models designed to mimic the joint distribution of
the underlying observed data. Previously, we used this approach to generate a
public-use version of the U.S. Census Bureau's Longitudinal Business Database
(LBD), a longitudinal census of establishments dating back to 1976. This
synthetic LBD (SynLBD) is now publicly available and has proven to be a useful
product; we now seek to improve and expand it with new synthesis models and
additional features. This article describes our efforts to create the second
generation of the SynLBD, including synthesis procedures that we believe could
be replicated in other contexts.
Keywords: Synthetic microdata, disclosure avoidance, imputation, data modelling, business data