Affiliations: [a] SurveyMonkey, Palo Alto, CA, USA | [b] Princeton University, Princeton, NJ, USA | [c] University of Michigan, Ann Arbor, MI, USA
Correspondence: [*] Corresponding author: Noble Kuriakose, SurveyMonkey, Palo Alto, CA, USA. E-mail: [email protected]
Abstract: Fraud in survey research can take many forms, but a common one is the duplication of valid interviews. For a falsifier, duplicating a valid interview has a number of advantages: expected relationships between variables will hold across the data set and, if done across a number of interviews, the duplication can evade many standard fraud-detection techniques, such as straight-lining analysis and the application of Benford's law. In this paper, we consider the likelihood of encountering near duplicates in survey data, suggest methods to fingerprint suspicious observations, report on our analysis of over 1,000 publicly available survey data sets, and argue that nearly one in five widely used country-year surveys from major international data sets have exact or near duplicates in excess of 5% of observations.
Keywords: Public opinion surveys, duplication, fraud, data falsification, international surveys, data quality