Affiliations: Statistics of Income Division, Internal Revenue Service, Washington, DC 20224-0002, USA
Corresponding author: Barry W. Johnson, Statistics of Income Division, Internal Revenue Service, 1111 Constitution Avenue, NW, (K-Room 4112), Washington, DC 20224-0002, USA. Tel.: +1 202 803 9794; Fax: +1 202 803 9746; E-mail: [email protected]
Abstract: The Statistics of Income (SOI) Division of the U.S. Internal Revenue Service (IRS) uses administrative data from tax and information returns to collect information for its statistical studies. This paper reviews fundamental “Big Data” issues with respect to tax data and highlights initiatives to better leverage administrative tax data for statistical purposes. Among the topics addressed are the current uses of administrative datasets in developing statistical samples and the benefits of large administrative datasets for producing small-area estimates or public-use files while ensuring that the statistics satisfy applicable data quality standards. New processes to create geographic statistics and public-use data files are also discussed, as are ways to efficiently access large datasets and to overcome the limitations that filing requirements impose on statistical uses of administrative data. Finally, future applications of administrative data to produce tax statistics, as well as efforts to improve metadata in support of longitudinal analyses, are examined.
Keywords: Administrative data, Big Data, small-area estimates, geographic statistics, public-use data