Abstract: The paper discusses how a statistical office could strike a satisfactory balance between confidentiality protection and freedom of information. Flexible use of statistical data is of vital interest for researchers and for the democratic process. On the other hand, the willingness of respondents to provide data depends on the ability of the statistical office to guarantee their anonymity. The paper argues that a combination of measures of different kinds is needed: legal, administrative, methodological, and technical. As long as statistical data are collected at all and statistical results are published, the risk of inadvertent disclosure of information about identifiable individuals (persons or enterprises) cannot be completely eliminated. On the other hand, the motivation to expend much effort to break through protection measures is usually low, especially if such efforts are regarded as criminal and can be punished. Moreover, there are often easier ways to find out sensitive information about individuals than by malicious processing of statistical data. The paper presents two new ideas currently being launched and discussed in Sweden: (i) transforming commonly known identifiers (of persons and other objects) into pseudoidentifiers by means of a table or an algorithm known only to the statistical office; (ii) a statistical firewall, which filters the queries from users of statistical data as well as the statistical outputs resulting from these queries, thus monitoring the traffic between external users and internal databases containing sensitive statistical microdata. The paper discusses how these two ideas can be used in practice, increasing legitimate usage and improving confidentiality protection at the same time.
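The pseudoidentifier idea in (i) can be illustrated with a minimal sketch: a keyed hash plays the role of the "algorithm known only by the statistical office". The key name and the identifier format used here are illustrative assumptions, not from the paper.

```python
import hmac
import hashlib

# Illustrative assumption: the office holds a secret key; without it,
# the mapping from identifier to pseudoidentifier cannot be reproduced.
SECRET_KEY = b"held-only-by-the-statistical-office"

def pseudoidentifier(identifier: str) -> str:
    """Deterministically map a commonly known identifier to a pseudoidentifier."""
    digest = hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated for readability

# The mapping is deterministic, so record linkage inside the office's
# own databases is preserved, while outsiders cannot invert it.
assert pseudoidentifier("19800101-1234") == pseudoidentifier("19800101-1234")
assert pseudoidentifier("19800101-1234") != pseudoidentifier("19800101-5678")
```

A lookup table maintained by the office would serve the same purpose; the keyed-hash variant simply avoids storing the table.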
Abstract: This paper describes research at the US Census Bureau on the public's perceptions of confidentiality and privacy in the release of statistical data products and new tools to access them. With the use of the Internet to disseminate data, the US Census Bureau recognized that the public would not only have better access to statistical data, but would also be more aware of the potential threats to confidentiality in data tabulations and microdata. Some of these threats are real, and much research has been done to address them. However, much less is understood about perceptions of threats that may not be real. The paper reports on possible negative perceptions about data products, research to measure these perceptions, and activities, such as targeted messages, to address them.
Abstract: This paper provides an overview of research into public perceptions of confidentiality. Statistical agencies expend a great deal of time and resources to protect the data that they collect from unauthorized disclosure and identification of individual responses. However, data protection, by its very nature, involves either reducing data quality or limiting access to the very information that statistical agencies go to so much trouble and expense to collect. Many statistical agencies know little about an important input into the data protection decision: the degree to which their own respondents – both businesses and households – understand and believe that statistical agencies have, in fact, delivered on their confidentiality promises, and how such perceptions affect their responses. This paper provides a brief survey of selected knowledge in the area. Results for the United States suggest that much work needs to be done to further this knowledge, and the paper argues that research into perceptions should be institutionalised by statistical agencies and used to inform data protection decisions.
Abstract: In this paper we give an overview of the CASC (Computational Aspects of Statistical Confidentiality) project (The CASC project (see http://neon.vb.cbs.nl/casc) is partly funded under the 5th Research, Technological Development and Demonstration (RTD) Framework Programme of the European Commission. For further information on the 5th RTD-FP, see www.cordis.lu/fp5/home.html.). This project can be seen as a follow-up of the SDC (Statistical Disclosure Control) project (The SDC project was subsidised under the 4th RTD Framework Programme.). However, the main emphasis is more on building practical tools. The further development of the ARGUS software package for statistical disclosure control will play a central role in this project. Besides this software development, several research topics have been included in the CASC project. These research topics, covering the disclosure control of both microdata and tabular data, aim at obtaining practical results that might be implemented in future versions of ARGUS and find their way to end-users.
Keywords: statistical disclosure control, μ-ARGUS, τ-ARGUS, microdata, tabular data
Abstract: The paper describes how two related software packages can be applied to produce safe data. The package τ-ARGUS is used for tabular data and its twin μ-ARGUS for microdata. The main techniques used to protect sensitive information are global recoding and local suppression. Bona fide researchers who need more information may visit Statistics Netherlands and work on-site in a secure area within Statistics Netherlands. Some examples are given of official statistics that have benefited from statistical disclosure control techniques.
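The two techniques named in the abstract can be sketched in a few lines. This is an illustrative toy example, not the μ-ARGUS implementation: global recoding collapses categories for every record, while local suppression blanks a value only in records that remain unique on a key combination.

```python
from collections import Counter

# Toy microdata: age and town are the identifying key variables.
records = [
    {"age": 34, "town": "Delft"},
    {"age": 35, "town": "Delft"},
    {"age": 34, "town": "Sneek"},
]

def global_recode_age(rec):
    """Global recoding: replace exact age by a ten-year band in all records."""
    lo = (rec["age"] // 10) * 10
    return {**rec, "age": f"{lo}-{lo + 9}"}

recoded = [global_recode_age(r) for r in records]

# Local suppression: blank 'town' only where the (age, town) key
# combination is still unique after recoding.
counts = Counter((r["age"], r["town"]) for r in recoded)
safe = [
    {**r, "town": None} if counts[(r["age"], r["town"])] == 1 else r
    for r in recoded
]
# Only the single Sneek record has its town suppressed;
# the two Delft records are already safe at group size 2.
```

In practice the ARGUS packages choose recodings and suppressions against user-specified safety rules; the example only shows the shape of the two operations.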
Abstract: New IT applications not only facilitate but also diversify the use of statistics. However, data protection may be an obstacle to the dissemination of data. This paper describes some efforts to respond better to the needs arising from changing demand, especially concerning small area statistics in Finland. The proposals set forth by the personal data protection workgroup for redefined data protection guidelines are presented. In addition, a new data protection method under development is introduced. This method is chiefly based on distribution of detailed geographic information.
Abstract: National statistical institutes routinely apply imputation methods based on statistical models to survey nonresponse. This area of research is very important because it is at the basis of the production of economic data that are as accurate as possible. The idea is to take stock of the experiences gathered in the field of imputation methodology and to try to bridge the gap between this area of research and statistical disclosure limitation. In this paper we review our experiences with model-based disclosure limitation techniques. In general, these techniques substitute the observed value of a certain variable with the value estimated via a statistical model. In particular, we discuss the problems encountered and the possible solutions found with two different models: a regression tree model for a categorical variable and a hierarchical model for a continuous variable.
Keywords: business microdata, confidentiality, hierarchical models, regression trees
Abstract: Statistical disclosure control (SDC), also termed inference control two decades ago, is an integral part of data security dealing with the protection of statistical data. The basic problem in SDC is to release data in a way that does not lead to disclosure of individual information (high security) but preserves the informational content as much as possible (low information loss). SDC is dual with data mining in the sense that progress in data mining techniques forces official statistics to continuously improve SDC techniques: the more powerful the inferences that can be made on a released data set, the more protection is needed so that no inference jeopardizes the privacy of individual respondents' data. This paper deals with the computational complexity of optimal microaggregation, where optimal means yielding minimal information loss for a fixed security level. More specifically, we show that the problem of optimal microaggregation cannot be exactly solved in polynomial time. This result is relevant because it provides theoretical justification for the lack of exact optimal algorithms and for the current use of heuristic approaches.
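The heuristic approaches the abstract refers to can be illustrated with a minimal univariate sketch (not the paper's own algorithm): sort the values, partition them into consecutive groups of a fixed size k, and replace each value by its group mean, so that every released value is shared by at least k respondents.

```python
def microaggregate(values, k=3):
    """Fixed-size univariate microaggregation heuristic.

    Sorts the values, forms consecutive groups of size k, and
    replaces each value by the mean of its group. Assumes the
    values are distinct; ties would all map to one group's mean.
    """
    s = sorted(values)
    group_mean = {}
    for i in range(0, len(s), k):
        group = s[i:i + k]
        mean = sum(group) / len(group)
        for v in group:
            group_mean.setdefault(v, mean)
    # Return means in the original record order.
    return [group_mean[v] for v in values]

incomes = [10, 12, 11, 50, 52, 49]
# The three low incomes are all released as 11.0, the three high
# incomes as their own group mean, so no single income is disclosed.
released = microaggregate(incomes, k=3)
```

Finding the partition that minimizes information loss, rather than this fixed sorted grouping, is the optimization problem whose intractability the paper establishes.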
Abstract: This paper combines the well-known Cell Suppression Methodology (herein called complete cell suppression) and a recently developed method (called partial cell suppression) presented in Fischetti and Salazar. It proposes a unified new methodology with the best features of both and without some of the disadvantages of either. Hence, this paper presents a new and more powerful cell suppression technique to protect sensitive cells in all kinds of tables. A background in Mathematical Programming guarantees the exact protection of the output against an external intruder while the loss of information is minimised.