Secret Data Sunday – Gary Langer Edition

Last Sunday I shared an unanswered email I had sent to the Senior Vice President for Editorial Quality at ABC News.  The email gives a self-contained account of the overall context behind my data request, but I’ll take another pass here just to be as clear as possible.

There were a remarkable number of opinion polls conducted in Iraq during the US occupation.  Many of these were fielded by D3 Systems working with KA Research Limited.  Steve Koczela and I analyzed some of these surveys and found extensive evidence of fabricated data.  We wrote up our findings and asked for comments from interested parties.  D3 and Langer Research Associates then threatened to sue us rather than constructively engaging.  (See this, this and this.)

It’s clear that Langer Research Associates reacted so furiously because Gary Langer did a series of D3-KA Iraq polls for ABC that won an Emmy Award plus the Policy Impact Award from the American Association for Public Opinion Research.  So he has a lot at stake.

Moreover, the write-ups of these ABC polls show that the ABC data display some of the same patterns that Steve and I found in other D3-KA Iraq polls.  One of the big ones is opinion unanimity in certain governorates, including Anbar, that is more characteristic of robots than of human beings.  With this in mind, check out the highlighted text below.

[Two images: highlighted excerpts from the ABC poll write-ups]

Given this background it is, perhaps, not surprising that D3 and Langer went for a legal choke-slam rather than for serious discussion.  Nevertheless, it is disappointing that these research organizations place so little value on the truth.  Thus, there really must be an outside examination of the micro data from ABC’s public opinion polling in Iraq.

I requested the data from Mathew Warshaw of D3 Systems.  He directed me to ABC News.  But, as we know, ABC News ignored my data request.  I also tried Gary Langer, who ignored me at first but finally wrote back on my latest attempt.

This is what I wrote to Langer.

Gary,

This is an opportune moment to renew my data request for the surveys you conducted in Iraq using D3 Systems and KA Research Limited.  You did not reply to my last request.

You abdigate your responsibility to the truth and violate principles of transparency by hiding your data and trying to shut down discussion of your work.

Mike Spagat

This is his reply.

Jeez, you really know how to sweet talk a guy, don’t you?

Extra points for “abdigate.”

OK, I accept full responsibility for misspelling abdicate…..abdicate, abdicate, abdicate, abdigate  gah! dammit….

I’m less apologetic about not being sweeter about my request.  Maybe being sweet is better than not being sweet but, in the end, he should live up to his responsibilities whether or not people talk to him sweetly.

Strangely this isn’t the end of the story but you’ll have to come back next Sunday for more.

Accounting for the Yazidis Killed by ISIS

This article, by Eva Huson, should interest readers of this blog.

In short, we learn that the locations of mass graves of Yazidi victims of ISIS are known.  Yet these graves are not getting excavated because of sovereignty disputes between the Iraqi government, the Kurdish Regional Government [KRG] and the PKK (which is a Kurdish rival to the KRG in the area).

The article also gives us this disturbing paragraph:

The KRG has had trouble conducting exhumations unilaterally. Last year, Human Rights Watch reported that a KRG research team overseen by Mustafa had made an unauthorized exhumation, transferring 65 bodies to a mortuary in the Kurdish city of Dohuk. Mustafa admitted to the human rights organization that the excavation was “not so professional.”

I won’t pretend the issues are easy.  The problem goes beyond the thorny dispute over the establishment of a Kurdish state.  This is because the area in question was outside KRG territory until recently.  The war against ISIS drew both the KRG and the PKK into the area.  Now they want to stay and, of course, Baghdad wants them out along with ISIS.  The fighting groups, none of them Yazidis, fear that coming to terms on a mass-grave excavation might compromise their territorial claims.  The article provides no reason to hope for a solution that will help the Yazidis.

So everything is stuck for now and for the foreseeable future – no excavations.

The survivors would like to get as much closure as they can get as soon as they can get it, so waiting clearly hurts them.  Moreover, waiting creates opportunities for interested parties to tamper with the evidence.  I presume there are other ways that the potential knowledge from an excavation also declines with the passage of time.  However, I am not an expert on this question and would love to hear from a forensic scientist.

 

I want to mention one last thing before signing off.

A few weeks ago the results of a survey of Yazidi survivors were published.  This is a serious effort to quantify the number of Yazidis killed and kidnapped by ISIS.  The survey also provides estimates on things such as the demographics of the victims and how they were killed.

I will probably have a close look at these estimates in an upcoming post but for now I just want to make sure my readers know about it.

 

Secret Data Sunday – ABC News (in the US) Stonewalls over their Dubious Iraq Public Opinion Polls

Below is an email that I sent to Kerry Smith, the Senior Vice President for Editorial Quality at ABC News, back in November of 2016.

She did not reply.

 

Dear Ms. Smith,

I am a professor of economics specialized in the quantitative analysis of armed conflict.  I have a big body of work focused on data quality issues that arise during data collection in conflict zones, especially survey data.

Back in 2011 I wrote a paper with Steven Koczela, now a prominent pollster with MassINC Polling, that uncovered substantial evidence of fabricated data in polls fielded in Iraq by D3 Systems.  We sent our paper to various interested parties for comments, including Mathew Warshaw of D3 Systems and Gary Langer who had just moved from ABC to found Langer Associates.  We included Mr. Langer in the circulation list because ABC news had used D3 Systems for a series of polls in Iraq that now required urgent re-evaluation.

D3, backed by Langer Associates, responded by threatening to sue me and Mr. Koczela.  See this, this and this.  My university has supported me against this censorship attempt but, unfortunately, Mr. Koczela felt that he could not defend himself and signed an agreement to keep his mouth shut about this particular piece of work.  (This is why only my name appears on the first link above.)  Eventually, the legal threat disappeared when I wrote to Mr. Warshaw asking him to explain what, specifically, he objected to in our analysis.  He did not reply.

To his credit Mr. Koczela continued working on this issue, unearthing a large number of datasets for opinion polls conducted in Iraq by D3 Systems and other polling companies.  These have provided remarkably strong evidence of data fabrication already.  For example, see this eye-popping analysis.

Many of the D3 Iraq surveys that I now have were conducted for the US State Department.  Mr. Koczela made the State Department aware of the problem at some point and they hired Fritz Scheuren, a former president of the American Statistical Association, to investigate.  His analysis confirmed the fabrication problem using an analysis rather different from mine.  Unfortunately, Dr. Scheuren signed a nondisclosure agreement but I believe he would confirm in general terms the main gist of this work and he could also give you an authoritative opinion on my analysis.  (scheuren@aol.com)

Notice that after the Huffington Post article Langer Associates did post a response to my 2011 paper.   This is, however, exceptionally weak as I explain in these articles.  Langer Associates have not addressed the new evidence that has emerged since Mr Koczela’s FOIA either.

I emailed Mr. Langer for the data from the ABC Iraq polls but he did not reply.  I asked Mr. Warshaw for the same data and he referred me to ABC news.  I am now requesting the data from you.

 At the risk of belabouring the obvious, I note that people with strong intellectual cases to make do not start by threatening to sue and finish by withholding their data.

Most importantly, ABC needs to take action to correct the historical record of the Iraq war.  These polling numbers are all over the web sites of ABC news and its partner organizations in these polls.  This work must be retracted.

It is, of course, your journalistic obligation to correct the historical record but, at the same time, I think it’s to your advantage to do so.  Fixing this problem would demonstrate a strong commitment to quality and accuracy.  I doubt you would even lose your Emmy Award.  Surely you won’t be punished for pursuing the truth wherever it leads.  I will do anything I can to help in this regard.

I suggest that we meet to discuss these issues further.  I would be happy to fly to New York at my own expense for this purpose.  Alternatively, we could talk by phone, skype or some other technology.

Sincerely,

 

Professor Michael Spagat

Head of Department

Department of Economics

Royal Holloway College

University of London

Egham, Surrey TW20 0EX

United Kingdom

m.spagat@rhul.ac.uk

+44 1784 414001 (W)

+44 1784 439534 (F)

 

Blog:  https://mikespagat.wordpress.com/

War, Numbers and Human Losses: The Truth Counts

What can you do with the Peru Data?

Somebody asked a fair question in the comments surrounding the release of the Peru dataset: what can you do with it?

That is a very big question that I can’t fully address in a blog post.  Still, I’ll try to offer a few useful thoughts.  Perhaps some readers will jump in with better ideas.  Also, I’d be delighted to hear from anyone who downloads the data and does something interesting with it.

Here’s some background.

First of all, it is event data.  This means that each line in the spreadsheet is a discrete occurrence, such as a battle or a massacre.  There are a bunch of pieces of information about each event such as the date, location, number of people killed, violent actors involved, type of event, etc.

The methodology documents posted on the conflict data page give a fair amount of detail on what is in the data and what the criteria are.  It also could be useful to read this data description for the Colombia conflict database (which is also posted on the conflict data page).  Of course, they are different conflicts and different databases but the methodologies are very similar.
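To make the one-row-per-event layout concrete, here is a minimal Python sketch.  The column names (date, region, actor, event_type, killed) and the two sample rows are invented for illustration; they are not taken from the Peru dataset, whose actual fields are described in the methodology documents.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date

# Hypothetical column names and made-up rows, for illustration only.
SAMPLE_CSV = """date,region,actor,event_type,killed
1984-03-12,Region A,Actor X,battle,4
1984-03-30,Region B,Actor Y,massacre,17
"""


@dataclass
class Event:
    day: date
    region: str
    actor: str
    event_type: str
    killed: int


def load_events(text):
    """Parse CSV text into a list of Event records, one per row."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        Event(
            day=date.fromisoformat(row["date"]),
            region=row["region"],
            actor=row["actor"],
            event_type=row["event_type"],
            killed=int(row["killed"]),
        )
        for row in reader
    ]


events = load_events(SAMPLE_CSV)
print(len(events), "events;", sum(e.killed for e in events), "killed in total")
```

With a real download you would swap the sample text for the contents of the file and rename the fields to match its header row.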

This paper by David Fielding and Anja Shortland used the Peru data to demonstrate escalation cycles (my phrase, not the authors’) in the conflict:

We show that an increase in civilian abuse by one side was strongly associated with subsequent increases in abuse by the other. In this type of war, foreign intervention could substantially reduce the impact on civilians of a sudden rise in conflict intensity, by moderating the resulting ‘cycle of violence’.

I’m afraid that the published version of their paper is behind a paywall but it should be possible to get hold of it if you really want to.

I believe that Fielding and Shortland didn’t use the event character of the data specifically, instead aggregating the events into monthly time series.  However, in this paper we worked entirely with the individual events, concentrating on their sizes and timings:
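For readers who want to see what aggregating events into a monthly series involves, here is a rough Python sketch.  It is not Fielding and Shortland’s actual procedure, which I have not reproduced; the (date, killed) pairs are made up and the function name is my own.

```python
from collections import defaultdict
from datetime import date

# Made-up (date, number killed) pairs standing in for event rows.
events = [
    (date(1984, 3, 12), 4),
    (date(1984, 3, 30), 17),
    (date(1984, 5, 2), 1),
]


def monthly_series(events):
    """Collapse events into {(year, month): (event count, total killed)}."""
    totals = defaultdict(lambda: [0, 0])
    for day, killed in events:
        key = (day.year, day.month)
        totals[key][0] += 1        # number of events in the month
        totals[key][1] += killed   # deaths in the month
    return {key: tuple(value) for key, value in sorted(totals.items())}


series = monthly_series(events)
for (year, month), (count, killed) in series.items():
    print(f"{year}-{month:02d}: {count} events, {killed} killed")
```

Note that months with no events simply do not appear in the result, so anyone doing time-series work would need to fill those in with zeros first.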

Many collective human activities, including violence, have been shown to exhibit universal patterns [1–19]. The size distributions of casualties both in whole wars from 1816 to 1980 and terrorist attacks have separately been shown to follow approximate power-law distributions [6, 7, 9, 10]. However, the possibility of universal patterns ranging across wars in the size distribution or timing of within-conflict events has barely been explored. Here we show that the sizes and timing of violent events within different insurgent conflicts exhibit remarkable similarities. We propose a unified model of human insurgency that reproduces these commonalities, and explains conflict-specific variations quantitatively in terms of underlying rules of engagement. Our model treats each insurgent population as an ecology of dynamically evolving, self-organized groups following common decision-making processes. Our model is consistent with several recent hypotheses about modern insurgency [18–20], is robust to many generalizations [21], and establishes a quantitative connection between human insurgency, global terrorism [10] and ecology [13–17, 22, 23]. Its similarity to financial market models [24–26] provides a surprising link between violent and non-violent forms of human behaviour.

The Peru dataset was one of many we used in that article, which was about patterns in the size distributions and timings of events that appear in war after war, not just the war in Peru.
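If you want to try a size-distribution exercise yourself, the sketch below shows one standard way to estimate a power-law exponent: the maximum-likelihood estimator for a continuous power law above a chosen cutoff.  This is a textbook estimator, not necessarily the procedure used in the article, and the event sizes here are synthetic draws rather than real data.  The cutoff `xmin` and the function name are my own choices for illustration.

```python
import math
import random


def powerlaw_alpha(sizes, xmin):
    """Maximum-likelihood exponent for a continuous power law on sizes >= xmin."""
    tail = [x for x in sizes if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        raise ValueError("need at least one observation strictly above xmin")
    return 1.0 + len(tail) / log_sum


# Synthetic sizes drawn from a known power law by inverse-transform sampling,
# so that the estimate can be compared against the true exponent.
rng = random.Random(0)
true_alpha, xmin = 2.5, 1.0
synthetic = [
    xmin * (1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0))
    for _ in range(5000)
]

estimate = powerlaw_alpha(synthetic, xmin)
print(f"true exponent {true_alpha}, estimated {estimate:.2f}")
```

Real casualty counts are integers, so a careful analysis would use a discrete version of the estimator and would also have to choose the cutoff sensibly; this sketch ignores both complications.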

The reader’s comment also asked about possible projects for undergraduates.  I’m not sure how to answer this question without knowing more about what kinds of undergraduates we’re talking about and what kinds of skills they have.  But students could certainly do various data manipulation exercises such as breaking down the data by region, perpetrator or type of event.
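As a starting point for that kind of exercise, here is a small Python sketch that totals deaths by region, by perpetrator and by type of event.  The rows and field names are invented for illustration and would need to be replaced with the real columns from the dataset.

```python
from collections import Counter

# Made-up rows with hypothetical field names, for illustration only.
events = [
    {"region": "Region A", "actor": "Actor X", "event_type": "battle", "killed": 4},
    {"region": "Region B", "actor": "Actor Y", "event_type": "massacre", "killed": 17},
    {"region": "Region A", "actor": "Actor Y", "event_type": "battle", "killed": 2},
]


def breakdown(events, field):
    """Return total killed for each distinct value of `field`."""
    totals = Counter()
    for event in events:
        totals[event[field]] += event["killed"]
    return totals


for field in ("region", "actor", "event_type"):
    print(field, dict(breakdown(events, field)))
```

The same function works for any categorical column, so students could extend it to year, department or whatever else the data contain.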

I hope that this post was useful.  I would be happy to respond to further questions.

 

 

Secret Data Sunday – International Rescue Committee Edition

I haven’t posted for a while on this subject so here’s some background.

The International Rescue Committee (IRC) did a series of surveys in the Democratic Republic of Congo (DRC).  The final installment summed up the IRC findings as follows:

Based on the results of the five IRC studies, we now estimate that 5.4 million excess deaths have occurred between August 1998 and April 2007. An estimated 2.1 million of those deaths have occurred since the formal end of war in 2002.

The IRC’s estimate of 5.4 million excess deaths received massive publicity, some of it critical, but journalists and scholars have mostly taken the IRC claim at face value.  The IRC work had substantial methodological flaws that were exposed in detail in the Human Security Report and you should definitely have a look if you haven’t seen this critique. But I won’t rehash all these issues in the present blog post.  Instead, I will just discuss data.

One of the main clouds hanging over the IRC work is the fact that three other surveys find child mortality rates to be steadily falling during the period when the IRC claims there was a massive spike in these rates.  (See this post and this post for more information.)  In particular, there are two DHS surveys and a MICS survey that strongly contradict the IRC claims.

And guess what?

The DHS and MICS data are publicly available but the IRC hides its data.

As always, I don’t draw the conclusion of data hiding lightly but, rather, I’ve tried pretty hard to persuade the relevant actors to come clean.

Frankly, I don’t think I’m under any obligation to make all these efforts.  I haven’t sent any emails to the DHS or MICS people because there’s no need to bother, given that their data are free for the taking.  But the IRC hasn’t posted their data so I resorted to emails.

I wrote multiple times over many months with no success to Ben Coghlan of the Burnet Institute in Australia.  He led the last two rounds of the IRC research, including an academic publication in the Lancet, so he was a sensible starting point.

In the end, it would have been better if Coghlan had just done a Taleb and told me to “fuck off” straight away rather than stringing me along.  First he asked what I wanted to do with the data.  I feel that this is not an appropriate question since data access shouldn’t really depend on the requester’s plans.  But I told him that I wanted to get to the bottom of why the IRC data were so inconsistent with the other data.  After prompting, he said he needed to delay because he was just finishing his PhD.  I made the obvious reply, pointing out that even while completing a PhD he should still be able to spare ten minutes to send a dataset.  On my next prompt he replied by asking me, rather disingenuously I thought, how my project was getting on.  I replied that I hadn’t been able to get out of the starting blocks because he hadn’t sent me any data.  I gave up after two more prompts.

Next I tried Jeannie Annan, the Senior Director of Research and Evaluation at the IRC.  She replied that she didn’t have the data and that I should try … Ben Coghlan and Les Roberts, the latter having led the early rounds of the surveys.

I knew that Les Roberts would never cough up the data (too long a story for this blog post) but wrote him anyway.  He didn’t reply.

I wrote back to Jeannie Annan saying that both Coghlan and Roberts were uncooperative but that, ultimately, this is IRC work and that the IRC needs to take responsibility for it. In my view:

  1. The IRC should have the data if they stand behind their work.
  2. If the IRC doesn’t have the data then they should insist that Roberts and Coghlan hand it over.
  3. If Roberts and Coghlan refuse to provide them with the data then the IRC should retract the work.

She didn’t reply.

Here’s where this unfortunate situation stands.

The IRC estimate of 5.4 million excess deaths in the DRC exerts a big influence on the conflict field and on the perceptions of the general public.  It is widely, but erroneously, believed that this DRC conflict has been the deadliest since World War 2.  The IRC estimate survives largely as conventional wisdom, despite the critique of the Human Security Report.

The IRC and the academics involved keep their data well hidden, choking off further discussion.

PS – Note that this is not only a tale of an NGO that doesn’t uphold scientific standards – there are also academics involved.  I say this because last week at least one person commented that, although Taleb’s behavior is appalling, he’s not really an academic.