Evaluating a new proposal for detecting data falsification in surveys


A recent paper [1] proposed a new method for detecting data falsification in surveys, the maximum percent match statistic. For each respondent, the statistic measures the maximum percentage of questions on which that respondent matches any other respondent in the dataset. The authors argue that valid survey data should have few respondents who match on more than 85% of questions. Based on this metric, the authors conclude that 1 in 5 publicly available international surveys contain data that is likely falsified. To evaluate this claim, we tested the sensitivity of the measure to variations in survey characteristics using simulations on synthetic and survey data; evaluations of high-quality domestic and international surveys with little risk of falsification; and regression analysis on 411 of Pew Research Center's international surveys. We find that the presence of high matches in a survey is extremely sensitive to natural, benign survey characteristics, such as the number of questions or the number of response options. Our analysis indicates that the proposed metric is prone to generating false positives, suggesting falsification when, in fact, there is none. Thus, we find that the claim of widespread likely falsification based on this measure is not supported.
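To make the statistic concrete, the following is a minimal sketch of how a maximum percent match could be computed, and of how its sensitivity to survey length and number of response options might be probed. It is not the code used in [1] or in this evaluation. The function names, the survey shapes, and the assumption that respondents answer independently and uniformly at random are all hypothetical choices made for illustration.

```python
import numpy as np


def max_percent_match(responses: np.ndarray) -> np.ndarray:
    """Return, for each respondent (row), the largest share of questions
    (columns) answered identically to any other single respondent."""
    n_respondents = responses.shape[0]
    best = np.zeros(n_respondents)
    for i in range(n_respondents):
        # Share of questions on which respondent i agrees with each row.
        share = (responses == responses[i]).mean(axis=1)
        share[i] = 0.0  # ignore the comparison of respondent i with itself
        best[i] = share.max()
    return best


def share_above_threshold(n_respondents, n_questions, n_options,
                          threshold=0.85, seed=0):
    """Simulate independent, uniformly random answers (an assumption; no
    falsification by construction) and return the share of respondents
    whose maximum percent match exceeds the threshold."""
    rng = np.random.default_rng(seed)
    data = rng.integers(0, n_options, size=(n_respondents, n_questions))
    return float((max_percent_match(data) > threshold).mean())


# Toy data: 3 respondents, 4 questions.
toy = np.array([[1, 2, 3, 4],
                [1, 2, 3, 5],
                [5, 4, 3, 2]])
print(max_percent_match(toy))  # maximum match share per respondent

# Hypothetical survey shapes: a short binary survey vs. a long 5-option one.
print(share_above_threshold(500, 10, 2))
print(share_above_threshold(500, 100, 5))
```

In the toy data, the first two respondents agree on three of four questions, so their maximum match is 0.75, while the third respondent matches either of the others on only one question (0.25). In the simulated comparison, the short binary survey should flag far more respondents than the long five-option survey even though neither contains falsified data. Real surveys have correlated, non-uniform answers, so these simulated shares are illustrative only and are not comparable to the results reported here.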



Kuriakose N., and Robbins M., Falsification in survey research: Detecting near duplicate observations, Statistical Journal of the IAOS Forthcoming 2016.


Groves R., Fowler F., Couper M., Lepkowski J., Singer E., and Tourangeau R., Survey methodology. Hoboken, N.J.: Wiley; 2009.


Interviewer falsification in survey research: Current best methods for prevention, detection and repair of its effects [Internet]. AAPOR; 2003 Apr 21. Available from: https://www. pdf.


Lyberg L., and Biemer P., Quality Assurance and Quality Control in Surveys, in: International handbook of survey methodology, de Leeuw E., Hox J., and Dillman D., eds, New York: Lawrence Erlbaum Associates, 2008.


Lyberg L., and Stukel D.M., Quality Assurance and Quality Control in Cross-National Comparative Studies, in: Survey methods in multinational, multiregional, and multicultural contexts, Harkness J., Braun M., Edwards B., Johnson T., Lyberg L., Mohler P. et al., editors. Hoboken, NJ: Wiley; 2010.


Koch A., Gefälschte Interviews: Ergebnisse der Interviewerkontrolle beim ALLBUS 1994. ZUMA Nachrichten [Internet]. 1995. Available from:


Hood C., and Bushery J., Getting more bang from the reinterview buck: Identifying 'at risk' interviewers [Internet]. Proceedings of the American Statistical Association; 1997. Available from: Proceedings/.


Bredl S., Winker P., and Kötschau K., A statistical approach to detect interviewer falsification of survey data, Survey Methodology [Internet] 38(1) (June 2012), 1-10. Available from: article/11680-eng.pdf.


Diakité S., Statistical methods for the detection of falsified data by interviewers and application survey data in Africa [Internet], Sixth International Conference on Agricultural Statistics; 2013. Available from: CPS005-P7-S.pdf.


Menold N., and Kemper C., How do real and falsified data differ? Psychology of survey response as a source of falsification indicators in face-to-face surveys, International Journal of Public Opinion Research [Internet] 26(1) (2014), 41-65. Available from: 41.abstract?sid=e1af9146-4073-418d-914f-e61f4971fc1b.


Winker P., Menold N., Storfinger N., Kemper C., and Stutkowski S., A Method for ex-post Identification of Falsifications in Survey Data [Internet]. NTTS 2013 - Conferences on New Techniques and Technologies for Statistics; 2013. Available from: NTTS2013fullPaper_93.pdf_en.


Benford F., The law of anomalous numbers, Proceedings of the American Philosophical Society [Internet] 78(4) (31 Mar 1938), 551-572. Available from: 984802.


Bredl S., Storfinger N., and Menold N., A literature review of methods to detect fabricated survey data. Discussion Paper from Justus Liebig University Giessen, Center for International Development and Environmental Research (ZEU) [Internet]. 2011. Available from: eam/10419/74449/1/746858302.pdf.


Judge G., and Schechter L., Detecting problems in survey data using Benford's law, The Journal of Human Resources [Internet] 44(1) (2009), 1-24. Available from: http://jhr.uwpress.org/content/44/1/1.refs.


The American National Election Studies (ANES), The ANES 2012 Time Series Study [data file]. Stanford University and the University of Michigan: Stanford, CA and Ann Arbor, MI; 2012. Available from:


Tessler M., Jamal A., Shteiwi M., Shikaki K., Robbins M., Hamami R. et al., Arab Barometer: Public Opinion Survey Conducted in Lebanon, 2012-2014 [data files]. Ann Arbor, MI; 2015 Nov 31. Available from:


2014 Political Polarization Survey [data file]. Pew Research Center: Washington, DC. Available from:


October 2014 Political Survey [data file]. Pew Research Center: Washington, DC. Available from: http://www.people-


July 2015 Political Survey [data file]. Pew Research Center: Washington, DC. Available from: http://www.people-press.org/category/datasets/?download=20059299.


September 2015 Political Survey [data file]. Pew Research Center: Washington, DC. Available from:


2014 Religious Landscape Study [data file]. Pew Research Center: Washington, DC.


Global Attitudes Survey datasets 2002-2013 [data files]. Pew Research Center: Washington, DC. Available from: http://


Spirit and Power: A 10-Country Survey of Pentecostals, 2006 [data files]. Pew Research Center: Washington, DC. Available from:


The World's Muslims, 2008-2012 [data files]. Pew Research Center: Washington, DC. Available from: http://www.


Religion in Latin America, 2013-2014 [data files]. Pew Research Center: Washington, DC. Available from: http://www.