Abstract: Distributions of business data are typically much more skewed than
those for household or individual data, and public knowledge of the underlying
units is greater. As a result, national statistical offices (NSOs) rarely
release establishment or firm-level business microdata due to the risk to
respondent confidentiality. One potential approach for overcoming these risks
is to release synthetic data where the establishment data are simulated from
statistical models designed to mimic the distributions of the real underlying
microdata. The US Census Bureau's Center for Economic Studies, in collaboration
with Duke University, the National Institute of Statistical Sciences, and
Cornell University, made available a synthetic public use file for the
Longitudinal Business Database (LBD) comprising more than 20 million records
for all business establishments with paid employees dating back to 1976. The
resulting product, dubbed the SynLBD, was released in 2010 and is the
first-ever comprehensive business microdata set publicly released in the United
States, including data on establishments' employment and payroll, birth and
death years, and industrial classification. This paper documents the scope of
projects that have requested and used the SynLBD.
Keywords: Confidentiality, comparative studies, US longitudinal business database, synthetic data