Affiliations: Geographical Methods and Repositories Division, French National Institute of Statistics and Economic Studies (Insee), 88 avenue Verdier, 92120, Montrouge, France | Tel.: +33 6 80 34 20 70; E-mail: [email protected]
Corresponding author: Geographical Methods and Repositories Division, French National Institute of Statistics and Economic Studies (Insee)
Abstract: Statistical disclosure control aims to prevent the disclosure of an individual's identity and of confidential or personal characteristics of individuals, households or companies. Primary statistical secrecy concerns information that can be read directly from the published data, whereas secondary statistical secrecy concerns information that a user could deduce indirectly by recombining and cross-checking all the disseminated data. When spatial data are disseminated according to several geographical partitions, the geographical areas can be combined and intersected to derive information on new, smaller areas. The differencing technique, which consists in subtracting the value published for one area from the value published for an overlapping area, can lead to a breach of confidentiality. We have developed a method for dealing with geographical differencing problems by detecting individuals who are located in small overlapping areas and whose personal information can therefore be disclosed. Modelling the data as a graph makes it possible to focus on the relevant geographical regions. The originality of the method lies in reducing the size and complexity of the graph. We applied the method to French income tax data comprising 27 million households distributed across 150 000 square cells and 35 000 administrative units. The results show that 10 000 households are at risk of disclosure.
Keywords: Geo-differencing, disclosure control, grid data, overlapping areas
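As an illustration of the differencing problem described in the abstract, the following minimal sketch flags pairs of nested areas whose difference contains very few households. It is a brute-force pairwise comparison on toy data, not the graph-based method of the paper; the household identifiers, area names, the `threshold` value and the function names are all hypothetical and chosen only for the example.

```python
from collections import defaultdict

# Hypothetical toy data: household id -> (grid cell, administrative unit).
# Cell and unit names are assumed not to collide.
HOUSEHOLDS = {
    "h1": ("cell_A", "unit_1"),
    "h2": ("cell_A", "unit_1"),
    "h3": ("cell_A", "unit_1"),
    "h4": ("cell_A", "unit_2"),
    "h5": ("cell_B", "unit_2"),
    "h6": ("cell_B", "unit_2"),
    "h7": ("cell_B", "unit_2"),
}


def members_by_area(households):
    """Group household ids by area, for both partitions at once."""
    areas = defaultdict(set)
    for hid, (cell, unit) in households.items():
        areas[cell].add(hid)
        areas[unit].add(hid)
    return dict(areas)


def differencing_risks(households, threshold):
    """Return {(outer, inner): residual households} for every pair of areas
    where `inner` is strictly contained in `outer` and the difference holds
    fewer than `threshold` households (an assumed, illustrative rule)."""
    areas = members_by_area(households)
    risks = {}
    for outer, outer_members in areas.items():
        for inner, inner_members in areas.items():
            if inner_members < outer_members:  # strict subset
                residual = outer_members - inner_members
                if len(residual) < threshold:
                    risks[(outer, inner)] = residual
    return risks


RISKS = differencing_risks(HOUSEHOLDS, threshold=2)
for (outer, inner), residual in sorted(RISKS.items()):
    print(f"{outer} minus {inner} isolates {sorted(residual)}")
```

In this toy case household `h4` is isolated twice: by subtracting `unit_1` from `cell_A`, and by subtracting `cell_B` from `unit_2`. The sketch only handles simple nesting of one area inside another and compares every pair of areas, which would not scale to the data volumes quoted in the abstract.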