Secret Data Sunday – Gary Langer Edition

Last Sunday I shared an unanswered email I had sent to the Senior Vice President for Editorial Quality at ABC News.  The email gives a self-contained account of the overall context behind my data request, but I’ll take another pass here just to be as clear as possible.

There were a remarkable number of opinion polls conducted in Iraq during the US occupation.  Many of these were fielded by D3 Systems working with KA Research Limited.  Steve Koczela and I analyzed some of these surveys and found extensive evidence of fabricated data.  We wrote up our findings and asked for comments from interested parties.  D3 and Langer Research Associates then threatened to sue us rather than engaging constructively.  (See this, this and this.)

It’s clear that Langer Research Associates reacted so furiously because Gary Langer did a series of D3-KA Iraq polls for ABC that won an Emmy Award plus the Policy Impact Award from the American Association for Public Opinion Research.  So he has a lot at stake.

Moreover, the write-ups of these ABC polls show that the ABC data display some of the same patterns that Steve and I found in other D3-KA Iraq polls.  One of the big ones is opinion unanimity in certain governorates, including Anbar, that is more characteristic of robots than of human beings.  With this in mind, check out the highlighted text below.

[Two images: excerpts from the ABC poll write-ups, with the relevant passages highlighted]

Given this background it is, perhaps, not surprising that D3 and Langer went for a legal choke-slam rather than for serious discussion.  Nevertheless, it is disappointing that these research organizations place so little value on the truth.  There really must be an outside examination of the microdata from ABC’s public opinion polling in Iraq.

I requested the data from Mathew Warshaw of D3 Systems.  He directed me to ABC News.  But, as we know, ABC News ignored my data request.  I also tried Gary Langer, who ignored me at first but finally wrote back after my latest attempt.

This is what I wrote to Langer.

Gary,

This is an opportune moment to renew my data request for the surveys you conducted in Iraq using D3 Systems and KA Research Limited.  You did not reply to my last request.

You abdigate your responsibility to the truth and violate principles of transparency by hiding your data and trying to shut down discussion of your work.

Mike Spagat

This is his reply.

Jeez, you really know how to sweet talk a guy, don’t you?

Extra points for “abdigate.”

OK, I accept full responsibility for misspelling abdicate…..abdicate, abdicate, abdicate, abdigate  gah! dammit….

I’m less apologetic about not being sweeter about my request.  Maybe being sweet is better than not being sweet but, in the end, he should live up to his responsibilities whether or not people talk to him sweetly.

Strangely, this isn’t the end of the story, but you’ll have to come back next Sunday for more.

Secret Data Sunday – ABC News (in the US) Stonewalls over their Dubious Iraq Public Opinion Polls

Below is an email that I sent to Kerry Smith, the Senior Vice President for Editorial Quality at ABC News, back in November of 2016.

She did not reply.

 

Dear Ms. Smith,

I am a professor of economics specializing in the quantitative analysis of armed conflict.  I have a substantial body of work on the data quality issues that arise during data collection in conflict zones, especially with survey data.

Back in 2011 I wrote a paper with Steven Koczela, now a prominent pollster with MassINC Polling, that uncovered substantial evidence of fabricated data in polls fielded in Iraq by D3 Systems.  We sent our paper to various interested parties for comments, including Mathew Warshaw of D3 Systems and Gary Langer, who had just moved from ABC to found Langer Associates.  We included Mr. Langer in the circulation list because ABC News had used D3 Systems for a series of polls in Iraq that now required urgent re-evaluation.

D3, backed by Langer Associates, responded by threatening to sue me and Mr. Koczela.  See this, this and this.  My university has supported me against this censorship attempt but, unfortunately, Mr. Koczela felt that he could not defend himself and signed an agreement to keep his mouth shut about this particular piece of work.  (This is why only my name appears on the first link above.)  Eventually, the legal threat disappeared when I wrote to Mr. Warshaw asking him to explain what, specifically, he objected to in our analysis.  He did not reply.

To his credit, Mr. Koczela continued working on this issue, unearthing a large number of datasets for opinion polls conducted in Iraq by D3 Systems and other polling companies.  These have already provided remarkably strong evidence of data fabrication.  For example, see this eye-popping analysis.

Many of the D3 Iraq surveys that I now have were conducted for the US State Department.  Mr. Koczela made the State Department aware of the problem at some point and they hired Fritz Scheuren, a former president of the American Statistical Association, to investigate.  His analysis confirmed the fabrication problem using methods rather different from mine.  Unfortunately, Dr. Scheuren signed a nondisclosure agreement, but I believe he would confirm in general terms the main gist of this work and he could also give you an authoritative opinion on my analysis.  (scheuren@aol.com)

Notice that after the Huffington Post article Langer Associates did post a response to my 2011 paper.  This response is, however, exceptionally weak, as I explain in these articles.  Nor have Langer Associates addressed the new evidence that has emerged since Mr. Koczela’s FOIA request.

I emailed Mr. Langer for the data from the ABC Iraq polls but he did not reply.  I asked Mr. Warshaw for the same data and he referred me to ABC News.  I am now requesting the data from you.

At the risk of belabouring the obvious, I note that people with strong intellectual cases to make do not start by threatening to sue and finish by withholding their data.

Most importantly, ABC needs to take action to correct the historical record of the Iraq war.  These polling numbers are all over the websites of ABC News and its partner organizations in these polls.  This work must be retracted.

It is, of course, your journalistic obligation to correct the historical record but, at the same time, I think it’s to your advantage to do so.  Fixing this problem would demonstrate a strong commitment to quality and accuracy.  I doubt you would even lose your Emmy Award.  Surely you won’t be punished for pursuing the truth wherever it leads.  I will do anything I can to help in this regard.

I suggest that we meet to discuss these issues further.  I would be happy to fly to New York at my own expense for this purpose.  Alternatively, we could talk by phone, Skype or some other technology.

Sincerely,

 

Professor Michael Spagat

Head of Department

Department of Economics

Royal Holloway College

University of London

Egham, Surrey TW20 0EX

United Kingdom

m.spagat@rhul.ac.uk

+44 1784 414001 (W)

+44 1784 439534 (F)

 

Blog:  https://mikespagat.wordpress.com/

War, Numbers and Human Losses: The Truth Counts

Special Journal Issue on Fabrication in Survey Research

The Statistical Journal of the IAOS has just released a new issue with a bunch of articles on fabrication in survey research, a subject of great interest for the blog.

Unfortunately, most of the articles are behind a paywall but, thankfully, the overview by Steve Koczela and Fritz Scheuren is open access.  It’s a beautiful piece – short, sweet, wise and accurate.  Please read it.

Here are my comments.

Way back in 1945 the legendary Leo Crespi stressed the importance of what he called “the cheater problem.”  Although he did this in the flagship survey research journal, Public Opinion Quarterly, the topic has never become mainstream in the profession.  Many survey researchers seem to view the topic of fabrication as not really appropriate for polite company, akin to discussing the sexual history of a bride at her wedding.  Of course, this semi-taboo is convenient for cheaters.  Maria Konnikova has a great new book about confidence artists.  Much in the book is relevant to the subject of fabrication in survey research but one point really stands out for me: a key reason why the same cons and the same con artists move seamlessly from mark to mark is that each victim is too embarrassed to publicize his/her victimization.

Discussions of fabrication that have occurred over the years have almost always focused on what is known as curbstoning, i.e., a single interviewer making up data.  (The term comes from an image of a guy sitting on a street curb filling out his forms.)  But this is just one type of cheating, and one of the great contributions of Koczela and Scheuren’s journal issue, and of the impressive series of prior conferences, is that they have substantially expanded the scope of the survey fabrication field.  Now we discuss fabrication by supervisors, principal investigators and the leaders of survey companies.  We now know that hundreds of public opinion surveys, especially surveys conducted in poor countries, are contaminated by widespread duplication and near-duplication of single observations.  (This journal issue publishes the key paper on duplication.)
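Since near-duplication will come up again, here is a minimal sketch of the core check behind the duplication paper: for each respondent, find the highest share of identical answers with any other respondent in the same survey.  This is my own illustrative reconstruction, not the authors’ code; the file name, the data layout and the 85% flagging cutoff are assumptions for the sake of the example.

```python
import pandas as pd

def max_percent_match(responses: pd.DataFrame) -> pd.Series:
    """For each respondent (row), the highest share of identical answers
    with any other respondent -- the 'percent match' idea that Kuriakose
    and Robbins use to flag near-duplicate interviews."""
    answers = responses.to_numpy()
    n = len(answers)
    best = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            match = float((answers[i] == answers[j]).mean())
            if match > best[i]:
                best[i] = match
            if match > best[j]:
                best[j] = match
    return pd.Series(best, index=responses.index, name="max_percent_match")

# Illustrative usage -- "survey.csv" and the 0.85 cutoff are assumptions:
# df = pd.read_csv("survey.csv")   # one row per respondent, one column per question
# scores = max_percent_match(df)
# print(f"{(scores > 0.85).mean():.1%} of respondents look like near-duplicates")
```

A cluster of respondents whose best match sits near 100% is what “widespread near-duplication” looks like in practice.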

Let me quote a bit from the to-do list of Koczela and Scheuren.

It does not only happen to small research organizations with fewer resources, as was previously believed [12].  Recent instances involve the biggest and most [prominent] names in the survey research business, academia and the US Government.

This is certainly true, but I would add that reticence about naming names is crippling.  Yes, it’s helpful to know that there are many dubious surveys out there, but guidance on which ones they are would be more helpful still.

An acknowledgement by the research community that data fabrication is a common threat, particularly in remote and dangerous survey environments would allow the community to be cooperative and proactive in preventing, identifying and mitigating the effects of fabrication.

This comment about remote and dangerous survey environments fits perfectly with my critiques of Iraq surveys, including this one.

Given the perceived stakes, these discussions often result in legal threats or even legal action of various types.

Ummm….yes.

…the problem of fabrication is fundamentally one of co-evolution.  The more detection and prevention methods evolve, the more fabricators may evolve to stay ahead.  And to the extent we discover and confirm fabrication, we will never know whether we found it all, or caught only the weakest of the pack.  With these truths in mind, more work is needed in developing and testing statistical methods of fabrication detection.  This is made more difficult by the lack of training datasets, a problem prolonged by a general unwillingness to openly discuss data fabrication.

Again, I couldn’t agree more.

Technical countermeasures during fielding are less useful in harder to survey areas, which also happen to be the areas where the incentive to fabricate data is the highest. Many of the recent advances in field quality control processes focus on areas where technical measures such as computer audio recording, GPS, and other mechanisms can be used [6,13].

In remote and dangerous areas, where temptation to fabricate is the highest, technical countermeasures are often sparse [9]. And perversely, these are often the most closely watched international polls, since they often represent the hotspots of American interest and activity. Robbins and Kuriakose show a heavy skew in the presence of duplicate cases in non-OECD countries, potentially a troubling indicator. These polls conducted in remote areas often have direct bearing on policy for the US and other countries. To get a sense of the impact of the polls, a brief review of the recently released Iraq Inquiry, the so-called Chilcot report, contains dozens of documents that refer, in most cases uncritically, to the impact and importance of polls.

To be honest, Koczela and Scheuren do such a great job with their short essay that I’m struggling to add value here.  What they write above is hugely pertinent to all the work I’ve done on surveys in Iraq.

By the way, a response I sometimes get to my critiques of the notorious Burnham et al. survey of deaths in the Iraq war (see, for example, here, here and here) is that it is unreasonable to expect perfection for a survey operating in such a difficult environment.  Fair enough.  But then you have to concede that we cannot expect high-quality results from such a survey either.  If I were to walk in off the street and take Harvard’s PhD qualification exam in physics (I’m assuming they have such a thing….) it would be unreasonable to expect me to do well.  I just haven’t prepared for such an exam.  Fine, but that doesn’t somehow make me an authority on physics.  It just gives me a perfect excuse for not being such an authority.

Finally, Koczela and Scheuren provide a mass of resources that researchers can use to bring themselves to the frontier of the survey fabrication field.  Anyone interested in this subject needs to take a look at these resources.

Dispute Resolution by Mutual Maiming

I’m puzzled by the following sequence of events.  (This story has a very clear summary.)

  1. The UN issues a report entitled “Children and Armed Conflict”.  The report highlights quite a few groups for committing grave abuses against children.  The “Saudi Arabia-led Coalition” in the war in Yemen is on this UN blacklist.  The report fingers the Coalition for killing and maiming children and for attacking hospitals and schools.  (So far I’m not puzzled.)
  2. UN Secretary-General Ban Ki-moon then announces that he is caving in to pressure and will remove Saudi Arabia from the UN blacklist:

“The report describes horrors no child should have to face,” Ban said at a press conference. “At the same time, I also had to consider the very real prospect that millions of other children would suffer grievously if, as was suggested to me, countries would defund many U.N. programs. Children already at risk in Palestine, South Sudan, Syria, Yemen, and so many other places would fall further into despair.”  (The quote is from the same summary story mentioned above.)

Ban stops just short of directly naming his blackmailer, but it’s obviously Saudi Arabia.

Of course, this story is sad and pathetic.  It would be nice to live in a world in which the UN can at least speak the truth and exert moral suasion upon belligerent parties to clean up their acts even if the UN cannot force good behaviour.  Unfortunately, we do not really live in this world.

But here’s the puzzle.  Why do the Saudis think they have accomplished something with their bullying censorship?

Saudi Arabia was named in an obscure report that is read by only a handful of specialists.  Suddenly the report is famous.  What’s the take-home message for people outside the Saudi inner circle?  Is it that the UN screwed up by naming the Saudi-led Coalition but that this mistake has now been corrected and the Saudis are finally getting the respect they deserve?  I don’t think so.

It’s as if a rape victim names her rapist but then recants, saying that he threatened to kill her unless she did so – the rapist then breathes a sigh of relief now that his good name has been cleared.

The only way I can make sense of the Saudi behaviour is to think of it as just a single move in a long game.  This time Saudi Arabia elevates a black-hole report into a major news item, spiced up by its own blackmail.

But next time the UN will think twice before embarrassing the Saudis.

Maybe it makes sense that way.  But if so, then we should always assume that the Saudis are behaving much worse than the self-censoring UN says they are.

PS (Two hours after posting) – Looking at this again I realize that my title is a little weird.  I started with the title, but the ideas drifted as I wrote, and by the end the connection between the post and the title had become obscure.

For the record, the idea is that the dispute resolution harmed both parties.  Saudi Arabia comes off as a bully and blackmailer in addition to the original charge of abusing children.  The UN demonstrates that it can’t be trusted to speak the truth.  So, at least in the short run, both sides are damaged by the dispute.

Check out my New Article at STATS.org

Hello everybody.

Please have a look at this new article that has just gone up on STATS.org.

It is a compact exposition of the evidence of fabrication in public opinion surveys in Iraq as well as the threats and debates flowing from this evidence.

My current plan for the blog is to do one follow-up post on some material that was left on the cutting-room floor for the STATS.org article and then move on to other stuff….unless circumstances dictate a return to the Iraq polling issue.

Have a great weekend!

Langer Research Associates Responds: Part IV

This continues the stream of posts beginning here and continuing through here, here, here and here.

Today I had wanted to write about duplicates in the D3/KA Iraq surveys, but I’ve hit a little snag in the analysis, so I will postpone that subject to a future post.

Instead, today I’ll cover empty categories, that is, answer choices that are offered to respondents but that nobody among some broad class of respondents actually picks.

We stressed these empty categories in our original paper, finding a number of questions for which the respondents interviewed by our flagged supervisors never gave certain answers that at least some respondents of other supervisors did give.

Yesterday’s discussion of duplicates is actually relevant for understanding why having so many empty categories is suspicious.  People’s opinions are not cloned from one another.  Moreover, there is randomness in how people respond to questions and how these responses are recorded.  So we would expect a lot of natural variation in real answers to real survey questions.  We would not expect all responses to converge on just a few categories.
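To put a rough number on that intuition, here is a back-of-envelope sketch (my own illustrative calculation, not something from the original paper): if a fraction p of genuine respondents would pick a given answer, and interviews were independent, the chance that none of n interviews records that answer is (1-p)^n.

```python
# Chance that an answer category comes up completely empty in n interviews,
# if a fraction p of genuine respondents would pick it (independence assumed).
def prob_empty(p: float, n: int) -> float:
    return (1 - p) ** n

# With 332 interviews (the size of the flagged group in the example below):
for p in (0.01, 0.02, 0.05):
    print(f"p = {p:.0%}: P(category empty) = {prob_empty(p, 332):.4%}")
```

Real interviews are clustered by neighborhood rather than independent, so genuine empties are somewhat more likely than this simple formula suggests; still, piles of empty categories across many questions are hard to square with honest data.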

Quick note – today I will merge two things that we kept separate in the original paper.  Back then we had a section on substantive responses, such as how much people approved of Prime Minister Maliki or whether or not people owned shortwave radios.  Then we had another section on the responses “don’t know” and “refused to answer”.  Here I simplify things by treating the two types of missing categories as interchangeable.

The Exhaustive Review made a good point on missing categories.  We always split our sample in two: the interviews conducted by the flagged supervisors (we called them “focal supervisors” in the paper) and the interviews conducted by all the other supervisors.  This method of splitting means that there were always two to three times as many interviews in the unflagged category as in the flagged category.  So maybe the excess of empty categories for the flagged supervisors simply reflects the smaller number of interviews they conducted.  In particular, the Exhaustive Review points out that when you group interviews by single supervisors, rather than by groups of supervisors as we did, you see that many supervisors have empty categories, not just the flagged ones.

This definitely merits further investigation, which I’m still pursuing.  However, I can report that a clear pattern has already emerged.  Once you adjust for the number of interviews, the flagged supervisors tend to produce roughly two to four times as many empty categories as the other supervisors do.

For example, in the January 2006 PIPA survey our flagged supervisors have a total of 240 empty categories in 332 interviews.  Two nonoverlapping combinations of other supervisors, with 320 and 322 interviews, had 110 and 122 missing categories, respectively.  I did find a single supervisor who had 316 missing categories…. but on only 66 interviews.

The results are similar for other surveys.  More interviews do tend to reduce the count of missing categories, but the flagged supervisors consistently rack up two to four times their share of empties relative to interview counts.
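For readers who want to replicate this kind of tabulation, here is a minimal sketch of the adjustment described above.  The data layout, the column names and the `options` mapping are my own assumptions, and the per-interview ratio is a deliberately crude normalization: as just noted, the relationship between interview counts and empties is not linear, so a fairer check would compare equal-sized groups of interviews.

```python
import pandas as pd

def count_empty_categories(interviews: pd.DataFrame, options: dict) -> int:
    """Number of (question, answer) pairs that no interview in this group
    contains; `options` maps each question column to its offered answers."""
    empties = 0
    for question, offered in options.items():
        observed = set(interviews[question].dropna())
        empties += sum(1 for answer in offered if answer not in observed)
    return empties

def empties_by_group(df: pd.DataFrame, group_col: str, options: dict) -> pd.DataFrame:
    """Empty-category counts per supervisor group, with a crude per-interview
    normalization so that groups of different sizes can be compared."""
    rows = []
    for group, sub in df.groupby(group_col):
        rows.append({"group": group,
                     "interviews": len(sub),
                     "empties": count_empty_categories(sub, options)})
    out = pd.DataFrame(rows)
    out["empties_per_100_interviews"] = 100 * out["empties"] / out["interviews"]
    return out
```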

So the Exhaustive Review has made a useful point that helps to improve the analysis of these surveys.  I just wish they had made the point openly back in 2011.  And this extension of the original approach does not weaken the evidence for fabricated data in the surveys.

Langer Research Associates Responds: Part III

This post is a continuation of this one and this one with further links to be found in the first two.

I’ll start with an important announcement.  Steve Koczela just had success with a Freedom of Information Request to the US State Department.  This means that he now has a mountain of new polling data from Iraq which he will be releasing in due course.

Some of these surveys were fielded by D3/KA, giving us a great chance to test our findings out of sample.  On top of that, there are some surveys fielded by another company, which provide an even better opportunity to get to the bottom of what has been going on in these polls.

I couldn’t resist having a look today at a D3/KA survey from 2006.

The survey has a battery of questions on the quality of public services.  I give the questions at the bottom of this post.  The possible answers are: “very good”, “good”, “poor”, “very poor”, “not available” and “don’t know”. Based on previous work I predict that supervisors 36, 43 and 44 are cheaters.  So I divide the sample into two pieces: the interviews of these supervisors and the interviews of all the other supervisors.

For the supervisors I predict to be cheaters, the most common answer on these questions is that services are “very poor”.  Not a single person says that services are “very good” or that they “don’t know”.  This is strange.  You’d expect at least one respondent out of 443 to go for one of these answers, but let’s leave that aside.  Maybe these people are all very sure that they are receiving bad services.

Much more surprising is that not a single person says that a service is “not available”.  So, overwhelmingly, services are very poor or poor … but still available.  This is weird.  Don’t you think that at least a few of these dissatisfied customers would tick the worst box of all?

All boxes get ticked for the group of the other supervisors.  These supervisors did do 1,557 interviews, so you could follow the Exhaustive Review and put their fuller coverage down to their higher numbers.  In a future post I will explain why I am not convinced by this point.  But let’s leave this aside as well for today.
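Here is a minimal sketch of the kind of side-by-side tabulation behind these observations.  It is illustrative only: the file layout, the `supervisor` and `group` column names, and the string answer codes are my assumptions, not the survey’s actual variable names.

```python
import pandas as pd

FLAGGED = {36, 43, 44}                          # supervisors predicted to be cheaters
SERVICE_QS = [f"q3{c}" for c in "abcdefghij"]   # the battery listed at the end of this post

def answer_coverage(df: pd.DataFrame, questions, group_col: str = "group") -> dict:
    """One cross-tab per question: counts of each answer option by group.
    A zero in the flagged column is an empty box of the kind discussed above."""
    return {q: pd.crosstab(df[q], df[group_col]) for q in questions}

# Assumed layout: one row per interview, a 'supervisor' column, and the
# q3a..q3j answers stored as strings ("very good", ..., "not available").
# df["group"] = df["supervisor"].map(lambda s: "flagged" if s in FLAGGED else "others")
# for q, tab in answer_coverage(df, SERVICE_QS).items():
#     print(q, tab, sep="\n")
```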

Instead, let’s look at correlations between answers to different questions.  For example, to what extent are people who are happy with their trash collection also happy with their landline service, etc.?

Here’s a list of the correlations on this battery of questions.  On the left are the interviews of the predicted cheaters and on the right are the interviews of all the others.

Predicted Cheaters All the Others
1.00 0.35
1.00 0.04
1.00 0.05
1.00 0.10
1.00 0.05
1.00 0.26
1.00 0.16
1.00 0.10
1.00 0.04
1.00 0.04
0.46 0.16
0.46 0.11
0.46 0.03
0.46 0.03
0.46 0.97
0.64 0.37
0.64 0.27
0.64 0.10
0.64 0.08
0.64 0.19
0.61 0.20
1.00 0.14
1.00 0.08
1.00 0.01
1.00 0.03
1.00 0.50
0.46 0.50
0.64 0.15
0.33 0.12
0.33 0.08
0.33 0.02
0.33 0.01
0.33 0.67
-0.04 0.66
0.24 0.15
0.33 0.68
0.33 0.06
0.33 0.03
0.33 0.00
0.33 0.00
0.33 0.05
-0.04 0.04
0.24 0.10
0.33 0.04
1.00 0.07

Look at all the perfect correlations of 1.00 for the predicted cheaters!  

Every time you see a 1.00 you should hear the sound of 443 people answering questions in perfect lock step with one another.  If you are slightly happier with your electricity than I am then you are also slightly happier with your water than I am…and also slightly happier about your landline…and slightly happier with your mobile, and your garbage collection….and traffic management in your area.

I didn’t make that up.  All of the above variables are perfectly correlated.  C’mon guys.  You’re making yourselves too easy to catch.

For the supervisors not flagged in advance as likely cheaters there is never a perfect correlation between two questions.  This is what we would expect in real interviews.

Your eye may have been drawn toward the very high correlation of 0.97 for the supervisors I haven’t flagged as suspicious.  But this is for garbage collection versus sewage disposal.  In fact, it makes sense that these two would be closely linked and the much weaker connection for the likely cheaters strikes me as further evidence that they made up their data.
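If you want to compute this kind of comparison yourself, here is a minimal sketch, reusing the assumed names from the snippet above.  It treats the answers as an ordered numeric scale (e.g. 1 to 4), which is itself an assumption about how the data are coded.

```python
from itertools import combinations

import pandas as pd

def pairwise_correlations(df: pd.DataFrame, questions) -> pd.Series:
    """All pairwise Pearson correlations among the battery's questions,
    assuming the answers are coded on an ordered numeric scale."""
    corr = df[questions].corr()
    return pd.Series({f"{a} vs {b}": corr.loc[a, b]
                      for a, b in combinations(questions, 2)})

# Ten questions give the 45 pairs shown in the table above.
# side_by_side = pd.DataFrame({
#     "flagged": pairwise_correlations(df[df["group"] == "flagged"], SERVICE_QS),
#     "others":  pairwise_correlations(df[df["group"] == "others"], SERVICE_QS),
# }).round(2)
# print(side_by_side)   # a column of exact 1.00s is the lock-step signature
```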

Quoting again from the Exhaustive Review:

Examining expected correlations is a reasonable way to search for evidence of data fabrication; it’s very difficult for a fabricator to anticipate relationships among variables and fake data accordingly. We find, however, that the lack of correlations of the type that Koczela and Spagat document appears again to be an artifact of their groupings of supervisors. (We also note that we have examined many more correlations, 96 in total, than Koczela and Spagat report.)

I have to agree with the exhaustive reviewers here.  Looking at correlations is quite a good way to uncover fabrication. Indeed, the above table is strong evidence of fabrication.  However, I’m baffled by how results like these are supposed to be “artifacts” of groupings.  I honestly don’t know what to make of this comment.

Of course, the above table only contains 45 correlations.  With the five reported in the original paper, which the exhaustive reviewers did not attempt to explain, I’m still 46 shy of the exhaustive reviewers.  I guess I’ll have to work harder.

Remember that all the analysis in this post is of a new survey not covered in my original paper.  I was able to use the list of suspicious supervisors from the earlier paper to immediately find big anomalies, in correlations and elsewhere, in a new dataset.  In other words, this is an out-of-sample success, and an easy one at that.

Finally, here is the list of questions in the battery:

Q3a-The following services for your neighborhood over the past month have been…Water Supply?
Q3b-The following services for your neighborhood over the past month have been…Electric Supply?
Q3c-The following services for your neighborhood over the past month have been…Telephone Service (land line)?
Q3d-The following services for your neighborhood over the past month have been…Telephone Service (Mobile)?
Q3e-The following services for your neighborhood over the past month have been…Garbage Collection?
Q3f-The following services for your neighborhood over the past month have been…Sewage Disposal?
Q3g-The following services for your neighborhood over the past month have been…Conditions of roads?
Q3h-The following services for your neighborhood over the past month have been…Traffic Management?
Q3i-The following services for your neighborhood over the past month have been…Police Presence?
Q3j-The following services for your neighborhood over the past month have been…Army Presence?