Calibrated Bayes, an inferential paradigm for official statistics in the era of big data
Abstract
Official statistics is dipping its toe in the ocean of big data, and leaders are emphasizing the need for a major paradigm change. One aspect of this change is the growing volume of data that are not obtained from probability samples of the target population. Making full use of these data requires a fundamental change, not only in data collection and dissemination, but also in the methods of statistical inference. The classical ``design-based'' approach to survey inference, developed from the seminal work of Neyman \cite{41}, is simply not applicable to these data. Rather, statistical models are needed that account for the potential selection bias arising from the lack of random sampling. I suggest that ``Calibrated Bayes'' is the appropriate statistical paradigm for analyzing these data. Under this paradigm, inferences for a particular data set are Bayesian, but models are sought that yield inferences with robust repeated-sampling properties. Probability sampling remains a powerful tool under this paradigm: by ensuring that the selection mechanism is ignorable, it facilitates robust modeling. It is not, however, essential for inference. I outline two applications of Calibrated Bayes to data collected by the U.S. Census Bureau.
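The central Calibrated Bayes idea (a Bayesian interval computed for one data set, then judged by its coverage over repeated samples) can be illustrated with a small simulation. This is a minimal sketch under assumptions of its own, not either of the Census Bureau applications: a hypothetical lognormal finite population, an i.i.d. normal model with a flat prior on (mu, log sigma), and a hypothetical self-selection mechanism in which the chance of inclusion is proportional to y. The function names (`make_population`, `simple_random_sample`, `self_selected_sample`, `coverage`) are illustrative only. Under simple random sampling the selection mechanism is ignorable, so the nominal 95% posterior interval for the population mean should cover roughly 95% of the time; under the y-dependent selection, which the model ignores, coverage should collapse.

```python
import math
import random
import statistics

N_SAMPLE = 200  # hypothetical sample size used in every replication


def make_population(size, seed=0):
    """Hypothetical finite population: a right-skewed (lognormal) survey variable."""
    rng = random.Random(seed)
    return [rng.lognormvariate(3.0, 0.5) for _ in range(size)]


def posterior_interval(sample, pop_size, level=0.95):
    """Central posterior interval for the finite-population mean under an
    i.i.d. normal model with a flat prior on (mu, log sigma).

    The exact posterior is t with n - 1 degrees of freedom; a normal
    approximation is used here, which is adequate for n around 200.
    """
    n = len(sample)
    ybar = statistics.fmean(sample)
    s = statistics.stdev(sample)
    f = n / pop_size  # sampling fraction: only the unsampled units are uncertain
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * s * math.sqrt((1 - f) / n)
    return ybar - half_width, ybar + half_width


def simple_random_sample(pop, rng):
    """Probability sample: the selection mechanism is ignorable by design."""
    return rng.sample(pop, N_SAMPLE)


def self_selected_sample(pop, rng):
    """Hypothetical self-selection: inclusion chance is proportional to y,
    and the model above does not account for it (non-ignorable selection)."""
    total = sum(pop)
    return [y for y in pop if rng.random() < N_SAMPLE * y / total]


def coverage(pop, draw, reps=300, seed=1):
    """Fraction of replications whose interval contains the true population mean."""
    rng = random.Random(seed)
    truth = statistics.fmean(pop)
    hits = 0
    for _ in range(reps):
        lo, hi = posterior_interval(draw(pop, rng), len(pop))
        if lo <= truth <= hi:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    population = make_population(5000)
    print("SRS coverage:          ", coverage(population, simple_random_sample))
    print("Self-selected coverage:", coverage(population, self_selected_sample))
```

Run as a script, it prints the two coverage rates; exact values depend on the seeds. The same model is well calibrated when randomization makes selection ignorable and badly miscalibrated when it is not, which is why, for self-selected data, the model itself must be built to reflect the selection.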