Special Journal Issue on Fabrication in Survey Research

The Statistical Journal of the IAOS has just released a new issue with a bunch of articles on fabrication in survey research, a subject of great interest for the blog.

Unfortunately, most of the articles are behind a paywall but, thankfully, the overview by Steve Koczela and Fritz Scheuren is open access.  It’s a beautiful piece – short, sweet, wise and accurate.  Please read it.

Here are my comments.

Way back in 1945 the legendary Leo Crespi stressed the importance of what he called “the cheater problem.”  Although he did this in the flagship survey research journal, Public Opinion Quarterly, the topic has never become mainstream in the profession.  Many survey researchers seem to view the topic of fabrication as not really appropriate for polite company, akin to discussing the sexual history of a bride at her wedding.  Of course, this semi-taboo is convenient for cheaters.  Maria Konnikova has a great new book about confidence artists.  Much in the book is relevant to the subject of fabrication in survey research but one point really stands out for me: a key reason why the same cons and the same con artists move seamlessly from mark to mark is that each victim is too embarrassed to publicize his/her victimization.

Discussions of fabrication that have occurred over the years have almost always focused on what is known as curbstoning, i.e., a single interviewer making up data. (The term comes from an image of a guy sitting on a street curb filling out his forms.)  But this is just one type of cheating and one of the great contributions of Koczela and Scheuren’s journal edition and the impressive series of prior conferences is that they have substantially expanded the scope of the survey fabrication field.  Now we discuss fabrication by supervisors, principal investigators and the leaders of survey companies.  We now know that hundreds of public opinion surveys, especially surveys conducted in poor countries, are contaminated by widespread duplication and near duplication of single observations.  (This journal issue publishes the key paper on duplication.)
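
To make "duplication and near duplication" concrete, here is a minimal sketch of one simple way to screen for it: for each respondent, find the highest share of identical answers shared with any other respondent.  This is an illustration under my own assumptions, not the method of any particular paper.  The function name max_percent_match and the toy data are hypothetical.

```python
def max_percent_match(rows):
    """For each row, return the highest share of identical answers
    that it has with any other row in the data."""
    result = []
    for i, a in enumerate(rows):
        best = 0.0
        for j, b in enumerate(rows):
            if i == j:
                continue  # never compare a respondent with itself
            share = sum(x == y for x, y in zip(a, b)) / len(a)
            best = max(best, share)
        result.append(best)
    return result


# Hypothetical toy data: four respondents, five questions each.
toy = [
    (1, 2, 3, 4, 5),
    (1, 2, 3, 4, 5),  # exact duplicate of the first row
    (1, 2, 3, 4, 1),  # near duplicate: differs on one question only
    (5, 4, 1, 2, 3),  # shares no answers with any other row
]
scores = max_percent_match(toy)
print(scores)  # [1.0, 1.0, 0.8, 0.0]
```

In a real survey with many questions, a pile-up of respondents with scores at or near 1.0 would be a reason to look more closely, not proof of fabrication by itself.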

Let me quote a bit from the to-do list of Koczela and Scheuren.

It does not only happen to small research organizations with fewer resources, as was previously believed [12].  Recent instances involve the biggest and most names in the survey research business, academia and the US Government.

This is certainly true but I would add that reticence about naming names is crippling.  Yes, it’s helpful to know that there are many dubious surveys out there but guidance on which ones they are would be very helpful.

An acknowledgement by the research community that data fabrication is a common threat, particularly in remote and dangerous survey environments would allow the community to be cooperative and proactive in preventing, identifying and mitigating the effects of fabrication.

This comment about remote and dangerous survey environments fits perfectly with my critiques of Iraq surveys including this one.

Given the perceived stakes, these discussion often result in legal threats or even legal action of various types.

Ummm….yes.

…the problem of fabrication is fundamentally one of co-evolution.  The more detection and prevention methods evolve, the more fabricators may evolve to stay ahead.  And to the extent we discover and confirm fabrication, we will never know whether we found it all, or caught only the weakest of the pack.  With these truths in mind, more work is needed in developing and testing statistical methods of fabrication detection.  This is made more difficult by the lack of training datasets, a problem prolonged by a general unwillingness to openly discuss data fabrication.

Again, I couldn’t agree more.

Technical countermeasures during fielding are less useful in harder to survey areas, which also happen to be the areas where the incentive to fabricate data is the highest. Many of the recent advances in field quality control processes focus on areas where technical measures such as computer audio recording, GPS, and other mechanisms can be used [6,13].

In remote and dangerous areas, where temptation to fabricate is the highest, technical countermeasures are often sparse [9]. And perversely, these are often the most closely watched international polls, since they often represent the hotspots of American interest and activity. Robbins and Kuriakose show a heavy skew in the presence of duplicate cases in non-OECD countries, potentially a troubling indicator. These polls conducted in remote areas often have direct bearing on policy for the US and other countries. To get a sense of the impact of the polls, a brief review of the recently released Iraq Inquiry, the so-called Chilcot report, contains dozens of documents that refer, in most cases uncritically, to the impact and importance of polls.

To be honest, Koczela and Scheuren do such a great job with their short essay that I’m struggling to add value here.  What they write above is hugely pertinent to all the work I’ve done on surveys in Iraq.

By the way, a response I sometimes get to my critiques of the notorious Burnham et al. survey of deaths in the Iraq war (see, for example, here, here and here) is that it is unreasonable to expect perfection for a survey operating in such a difficult environment.  Fair enough.  But then you have to concede that we cannot expect high-quality results from such a survey either.  If I were to walk in off the street and take Harvard’s PhD qualification exam in physics (I’m assuming they have such a thing….) it would be unreasonable to expect me to do well.  I just haven’t prepared for such an exam.  Fine, but that doesn’t somehow make me an authority on physics.  It just gives me a perfect excuse for not being such an authority.

Finally, Koczela and Scheuren provide a mass of resources that researchers can use to bring themselves to the frontier of the survey fabrication field.  Anyone interested in this subject needs to take a look at these resources.

Check out my New Article at STATS.org

Hello everybody.

Please have a look at this new article that has just gone up on STATS.org.

It is a compact exposition of the evidence of fabrication in public opinion surveys in Iraq as well as the threats and debates flowing from this evidence.

My current plan for the blog is to do one follow-up post on some material that was left on the cutting room floor for the STATS.org article and then move on to other stuff….unless circumstances dictate a return to the Iraq polling issue.

Have a great weekend!

More Evidence of Fabrication in D3 Polls in Iraq: Part 2

On Tuesday I provided some eye-popping comparisons on one Iraq survey fielded by D3/KA against another Iraq survey fielded by another company at exactly the same time.  In light of this evidence any reasonable person has to agree that the D3/KA data are fabricated.  Nevertheless, today I give you a different window into the same D3/KA survey.

Recall that one of the main markers of fabrication in these surveys is that the respondents to what I’m calling the “focal supervisors” have too many “empty categories”.  A response category is “empty” for a group of supervisors if it is offered as a possible choice but zero respondents actually chose it.  For example, in Part 1 of this series we saw that for all public services zero respondents for the focal supervisors said that the service was “unavailable” or that availability was “very good”.  These are, therefore, both empty categories for the focal supervisors.
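
The definition of an empty category is easy to state in code.  The following is a minimal sketch, with a hypothetical function name, applied to the focal supervisors' water supply answers reported in Part 1 (245 “poor” and 198 “very poor”).

```python
from collections import Counter

# The five response options offered in the services battery.
OFFERED = ["very good", "good", "poor", "very poor", "not available"]


def empty_categories(answers, offered=OFFERED):
    """Return the offered categories that zero respondents chose."""
    counts = Counter(answers)
    return [category for category in offered if counts[category] == 0]


# Focal supervisors, water supply question, as reported in Part 1.
focal_water = ["poor"] * 245 + ["very poor"] * 198
print(empty_categories(focal_water))
# ['very good', 'good', 'not available']
```

Summing the number of empty categories over all the questions in a survey gives the "number of empties" for a group of supervisors.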

Langer Research Associates tried to rationalize all the empties for the focal supervisors by arguing that other supervisors also have empties.  Langer Associates also argued that Steve Koczela and I were unfair to compare the group of focal supervisors with  the group of all the other supervisors.  This is because the number of empties should be decreasing in the total number of interviews and the all-others group did more interviews than the focal group did.  Langer does have a point on this which I addressed in this post.  Here I follow up with a couple of pictures based on the same D3/KA survey discussed on Tuesday.

Each picture takes a bunch of different combinations of supervisors and for each combination plots the number of empties against the number of interviews.  The first plot graphs the data on 100 combinations of three supervisors plus the focals.  The second plot graphs the data on 100 combinations of four supervisors plus the focals.

[Figure: Empties versus interviews, combinations of three supervisors plus the focals]

[Figure: Empties versus interviews, combinations of four supervisors plus the focals]

You can see that:

1.  The number of empties is, indeed, decreasing in the number of interviews.

2.  Even after adjusting for this fact the focal supervisors still have overwhelmingly more empties than they should have, given the number of interviews they have conducted.
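
For readers who want to try this kind of comparison on their own data, here is a rough sketch of the procedure: draw random combinations of supervisors, pool their interviews, and record the number of interviews and the number of empties for each combination.  It is not the code behind the pictures above.  The synthetic data, the function names, the seed values and the choice of ten questions are all assumptions made for illustration.

```python
import random
from itertools import chain

CATEGORIES = ["very good", "good", "poor", "very poor", "not available"]


def empties(interviews, questions, categories=CATEGORIES):
    """Count (question, category) pairs that no pooled interview chose."""
    total = 0
    for q in questions:
        chosen = {interview[q] for interview in interviews}
        total += sum(1 for c in categories if c not in chosen)
    return total


def sample_points(by_supervisor, questions, group_size, n_groups=100, seed=0):
    """Return (number of interviews, number of empties) for random groups."""
    rng = random.Random(seed)
    supervisors = sorted(by_supervisor)
    points = []
    for _ in range(n_groups):
        group = rng.sample(supervisors, group_size)
        pooled = list(chain.from_iterable(by_supervisor[s] for s in group))
        points.append((len(pooled), empties(pooled, questions)))
    return points


def make_synthetic(n_supervisors=20, seed=1):
    """Hypothetical data: each supervisor has 20 to 120 random interviews."""
    rng = random.Random(seed)
    questions = [f"q{i}" for i in range(10)]
    data = {}
    for s in range(n_supervisors):
        n_interviews = rng.randint(20, 120)
        data[f"sup{s}"] = [
            {q: rng.choice(CATEGORIES) for q in questions}
            for _ in range(n_interviews)
        ]
    return data, questions


data, questions = make_synthetic()
points = sample_points(data, questions, group_size=3)
```

Each element of points is one dot in a plot of empties against interviews.  Pooling more interviews can never increase the number of empties, which is why the comparison has to hold the number of interviews roughly fixed.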

More Evidence of Fabrication in D3 Polls in Iraq: Part 1

Veteran readers know that I have posted a lot on this subject, including here, here, here, here, here and here.

To recap, a bunch of Iraq polls fielded by D3 Systems and its partner KA Research Limited contain data that appear to be fabricated.  In particular, there is a list of supervisors who consistently preside over non-credible interviews.  Steve Koczela and I dubbed these the “focal supervisors” since we focused our attention on them in our original paper on this subject.

We have known for a long time that D3/KA fielded a large number of surveys in Iraq and that we only had access to a few of them.  This changed recently when Steve’s Freedom of Information request to the US State Department came through, providing us with a mass of new Iraq polls.  Some of these were fielded by D3/KA and some were fielded by other companies.  This embarrassment of riches enables all sorts of new tests and comparisons.  I have only scratched the surface of this gold mine but I can report that the lack of credibility of the D3/KA data screams off of the computer screen.

Let’s take a peek at two polls that ask exactly the same questions and were both fielded in April of 2006, one by D3/KA and the other by a company called the Iraq Center for Research and Strategic Studies (ICRSS).

Before looking at some numbers it is worth asking ourselves why the State Department commissioned two different companies to administer an identical questionnaire simultaneously.  The only reason I can think of is that people in the State Department were suspicious of one of the companies.

In any case, for this short blog post let’s just look at one battery of questions on the availability of various services.  We compare the following two things:

  1.  ICRSS in the regions covered by the focal supervisors in the comparable D3/KA survey.
  2.  The focal supervisors in the D3/KA survey.

Of course, the two surveys should yield roughly the same answers since I hold the zone fixed in the comparisons.

The questions take the following form:

Q3_1:  Please tell me whether the following services for your neighborhood [in the quarter in which you live] over the past month have been very good, good, poor, very poor or not available. … Water supply

The same question is then asked for electricity, telephone service, etc.

Have a scroll through the table below:

Water Supply
                 Focals   ICRSS Survey
Very Good             0            189
Good                  0            977
Poor                245            466
Very Poor           198            128
Not Available         0              3
Don’t Know            0              0
NA                    0              8

Electricity Supply
                 Focals   ICRSS Survey
Very Good             0             11
Good                  0            224
Poor                245            626
Very Poor           198            822
Not Available         0             80
Don’t Know            0              0
NA                    0              8

Telephone Service (land line)
                 Focals   ICRSS Survey
Very Good             0             71
Good                  0            608
Poor                245            433
Very Poor           198            571
Not Available         0             36
Don’t Know            0             40
NA                    0             12

Telephone Service (mobile)
                 Focals   ICRSS Survey
Very Good             0            266
Good                  0           1105
Poor                245            185
Very Poor           198            142
Not Available         0             40
Don’t Know            0             21
NA                    0             12

Garbage Collection
                 Focals   ICRSS Survey
Very Good             0             57
Good                  0            608
Poor                245            667
Very Poor           198            373
Not Available         0             53
Don’t Know            0              0
NA                    0             13

Sewage Disposal
                 Focals   ICRSS Survey
Very Good             0             64
Good                  0            574
Poor                 91            662
Very Poor           352            370
Not Available         0             87
Don’t Know            0              0
NA                    0             14

Conditions of Roads
                 Focals   ICRSS Survey
Very Good             0             26
Good                  0            532
Poor                148            769
Very Poor           295            388
Not Available         0             39
Don’t Know            0              5
NA                    0             12

Traffic Management
                 Focals   ICRSS Survey
Very Good             0            111
Good                  0            834
Poor                245            505
Very Poor           198            207
Not Available         0             58
Don’t Know            0             35
NA                    0             21

Police Presence
                 Focals   ICRSS Survey
Very Good             0            255
Good                217            948
Poor                 24            390
Very Poor           202            124
Not Available         0             23
Don’t Know            0             10
NA                    0             16

Army Presence
                 Focals   ICRSS Survey
Very Good             0            250
Good                217            834
Poor                 24            371
Very Poor           202            171
Not Available         0            109
Don’t Know            0             19
NA                    0             17

 

This is what your face looks like now:

[Image: your favourite doctor]
What????

In the D3/KA survey:

  • For six of the ten services exactly 245 respondents rate the availability as “poor” and exactly 198 rate it as “very poor”.
  • In two of the four cases for which the split is not 245-198 the breakdown is exactly 217-24-202 (“good”, “poor”, “very poor”).
  • Despite the overwhelming preponderance of answers of “poor” and “very poor” nobody ever answers that a service is “unavailable”.
  • There are zero answers of “very good” and “don’t know.”
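
These patterns can be checked mechanically.  Here is a minimal sketch that takes the focal counts from the table above and tallies how often each exact split recurs and which categories are empty for every service.  The variable names are my own and only the numbers come from the table.

```python
from collections import Counter

CATEGORIES = ("Very Good", "Good", "Poor", "Very Poor",
              "Not Available", "Don't Know", "NA")

# Focal supervisor counts, one tuple per service, in CATEGORIES order.
FOCALS = {
    "Water Supply":                  (0, 0, 245, 198, 0, 0, 0),
    "Electricity Supply":            (0, 0, 245, 198, 0, 0, 0),
    "Telephone Service (land line)": (0, 0, 245, 198, 0, 0, 0),
    "Telephone Service (mobile)":    (0, 0, 245, 198, 0, 0, 0),
    "Garbage Collection":            (0, 0, 245, 198, 0, 0, 0),
    "Sewage Disposal":               (0, 0, 91, 352, 0, 0, 0),
    "Conditions of Roads":           (0, 0, 148, 295, 0, 0, 0),
    "Traffic Management":            (0, 0, 245, 198, 0, 0, 0),
    "Police Presence":               (0, 217, 24, 202, 0, 0, 0),
    "Army Presence":                 (0, 217, 24, 202, 0, 0, 0),
}

# How many services share each exact distribution of answers?
split_counts = Counter(FOCALS.values())

# Which categories are empty for every single service?
always_empty = [
    category
    for i, category in enumerate(CATEGORIES)
    if all(counts[i] == 0 for counts in FOCALS.values())
]

# Every service should account for the same number of focal interviews.
totals = {sum(counts) for counts in FOCALS.values()}

print(split_counts.most_common(2))
print(always_empty)
print(totals)  # {443}
```

Ten questions produce only four distinct answer distributions, and four of the seven categories are never used at all.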

The above points easily condemn the D3/KA survey to the dustbin of lies but it’s a piece of cake to come up with more.

  • For four services the most common answer is “good” for ICRSS yet zero people give this answer for D3/KA.
  • ICRSS always has some responses of “unavailable” and “very good” but D3/KA always has zero people giving these answers.
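
The two comparisons with ICRSS can be reproduced from the table in the same spirit.  The sketch below is only an illustration: the data layout and names are hypothetical, the numbers are copied from the table, and only the five substantive categories are kept.

```python
CATEGORIES = ("Very Good", "Good", "Poor", "Very Poor", "Not Available")

# service: (focal counts, ICRSS counts), both in CATEGORIES order.
TABLE = {
    "Water Supply":        ((0, 0, 245, 198, 0), (189, 977, 466, 128, 3)),
    "Electricity Supply":  ((0, 0, 245, 198, 0), (11, 224, 626, 822, 80)),
    "Telephone (land)":    ((0, 0, 245, 198, 0), (71, 608, 433, 571, 36)),
    "Telephone (mobile)":  ((0, 0, 245, 198, 0), (266, 1105, 185, 142, 40)),
    "Garbage Collection":  ((0, 0, 245, 198, 0), (57, 608, 667, 373, 53)),
    "Sewage Disposal":     ((0, 0, 91, 352, 0), (64, 574, 662, 370, 87)),
    "Conditions of Roads": ((0, 0, 148, 295, 0), (26, 532, 769, 388, 39)),
    "Traffic Management":  ((0, 0, 245, 198, 0), (111, 834, 505, 207, 58)),
    "Police Presence":     ((0, 217, 24, 202, 0), (255, 948, 390, 124, 23)),
    "Army Presence":       ((0, 217, 24, 202, 0), (250, 834, 371, 171, 109)),
}

GOOD = CATEGORIES.index("Good")
VERY_GOOD = CATEGORIES.index("Very Good")
NOT_AVAILABLE = CATEGORIES.index("Not Available")

# Services where "Good" is the most common ICRSS answer
# but no focal respondent gave it.
good_mismatch = [
    service
    for service, (focal, icrss) in TABLE.items()
    if icrss.index(max(icrss)) == GOOD and focal[GOOD] == 0
]

# Smallest ICRSS count and largest focal count for the two end categories.
icrss_minimum = min(
    min(icrss[VERY_GOOD], icrss[NOT_AVAILABLE]) for _, icrss in TABLE.values()
)
focal_maximum = max(
    max(focal[VERY_GOOD], focal[NOT_AVAILABLE]) for focal, _ in TABLE.values()
)

print(good_mismatch)   # the four services in the first bullet
print(icrss_minimum)   # 3: ICRSS never has an empty end category
print(focal_maximum)   # 0: the focals always do
```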

This is not a judgement call.  It is blatantly obvious that the D3/KA data are fabricated.

Sexual Assault, Lies and FOIA

Another great story from Vox.

It is totally worth reading and I don’t want to recapitulate it here.  Instead, I want to focus on some basics that resonate with me.

A top US admiral told Congress that US commanders are so zealous in pursuing sexual assault allegations that they forced 93 cases to court martial despite the fact that local civilian prosecutors had refused to pursue these cases.  (You can read the article to discover the context within which this claim was pretty important.)

Some people then made a Freedom of Information request to discover the details of these 93 cases and it turned out that in reality there were….errr….0 such cases.

People do the most amazing things when they think they can keep the actual facts buried.

Fabrication Conference Highlights

The video record of the conference on fabrication in survey research is now up.

Some of the presentations are well worth viewing. But it’s a little difficult to navigate the recording unless you know what to look for.  So allow me to help you out.

Around the 46-minute mark Jennifer Parsons of the UIC Survey Research Lab talks about a health survey they fielded in Chicago, including in some very poor neighbourhoods.

The Lab works hard to motivate their interviewers to do high quality work.  For example, they have clients explain the importance of their studies to their interviewers.  They also explain to interviewers what counts as falsification.  Crucially, falsification includes interviewing people who were not selected into the sample.  Interviewers are not statisticians and many might honestly believe that it’s OK to interview their friends or people who just happen to be available on the street.

The Chicago survey offered financial incentives to respondents, allowing them to earn up to $125 from doing enough modules.  Word quickly spread on the street.  People started approaching interviewers pleading to be interviewed.  Three interviewers obliged these walk-ups, thus crossing the line into falsification.

The consequences for the survey of such sampling violations are potentially severe since the pool of people who are this desperate to be interviewed might well differ substantially from the more general pool the survey wanted to understand.

The main driver here seems to be that $125 is a good chunk of money in a poor neighbourhood.  Another possible factor, highly relevant for surveys conducted in war zones, is that poor neighbourhoods also tend to be dangerous neighbourhoods.  Some interviewers may have been keen to parachute quickly to safety by doing some quick interviews with readily available people.

This episode reminds me of the notorious Burnham et al. survey estimating the number of people killed in the Iraq war.  (See this,  this and this on Burnham et al.).  The interviewers for this survey entered a neighbourhood, somehow gathered together a bunch of children, explained the survey to them and sent these children out to spread the word.  (I know this is hard to believe….).  So lots of people in each visited neighbourhood knew that interviewers were going around asking about war deaths.

I wonder whether some of these interviewers were approached by people demanding to be interviewed about deaths they knew of and whether some of these interlopers pushed their way into the sample.

 

 

Documents on the Court Case on Fabricated Data

Jim James of James Industry Research (JIR) Group has given me permission to post documents from his successful court case over fabricated data.

Unfortunately, due to a technical glitch I didn’t manage to talk to Mr. James today so I’m still a little hazy on the details of his case.  However, the basic shape of the situation seems clear enough.  It is as if I hire you to provide male, Chinese athletes for my focus group but, instead, you deliver a bunch of overweight, elderly women complete with certificates saying they are male, Chinese athletes.

Based on this document and the spreadsheet to be found here the main mystery for me is to explain how a firm that performs so badly can exist in a competitive market environment.  As far as I can tell the contractor would have known that every person delivered to JIR would be reinterviewed.  If so, then I can’t grasp how they would have expected their work to hold up.  The only explanation I can think of is that they are accustomed to incredible tolerance for bad work.

My point is that the occurrence of this incident suggests that it is embedded in a much wider environment of fraud.

Have a look at the documents.  I think they’ll make you laugh out loud.