The Bloody Lens – Thursday Evening Event in London!

I’m participating in an event in London this Thursday evening.  Please come!  It should be really interesting and there are still a few (free) tickets left.

Below is the event description.

You can sign up here.

(Note there is a room change.  It’s now in room 104 of Senate House rather than the Court Room.  There will be signs up.)


Bloody Lens



Spewing Rancid Effluvia at Iraq Body Count – Part 1

This post follows up on this one.  However, rather than calling it “A Debate about Excess Deaths – Part 2” I went with the above title, which is more descriptive of what’s actually going on here.

In fact, it’s bizarre that the Iraq Body Count (IBC)  database has been  dragooned into a debate about excess deaths.  IBC exclusively records violent deaths.  The concept of excess deaths, on the other hand, was created to account for the possibility that war violence can lead, indirectly, to non-violent deaths.  So the IBC database is not going to be particularly relevant to a debate about excess deaths.

To understand why we’re here you have to recall the following sequence of events.

  1. Hagopian et al. publish a paper claiming half a million excess deaths in Iraq.
  2.  Stijn van Weezel and I publish a critique saying that this number is greatly exaggerated.
  3.  Hagopian et al. publish a comeback claiming they are right and we are wrong.  (NEWS FLASH – their reply is now actually published.  I wasn’t aware of this when I wrote my previous blog post.)
  4.  Stijn and I will publish a rejoinder.  (We’ve already signed off on page proofs but the paper isn’t out yet.)

I will blog our rejoinder (event 4) when it appears.  Now I just want to address some points that, due to space constraints, Stijn and I were forced to omit from our paper.

One of the main arguments Hagopian et al. use to defend their excess death estimate is the very model of a modern ad hominem attack.  I am a co-author on the critique paper but I am discredited because I have worked with IBC, which is itself discredited (so the claim goes) – therefore, the excess death statistics of Hagopian et al. are correct.  With a bit more research Hagopian et al. might have bolstered their logic by pointing out that I support Crystal Palace in Premiership football but the Pride of South London is now teetering on the brink of relegation – thus, they are right and I am wrong about Iraq.

For the excess deaths debate the above paragraph should be enough.  However, Hagopian et al. sling so much rancid effluvia at IBC that I feel I have to correct the record.

This post is a start.

Hagopian et al. write:

Spagat has published extensively using the data of Iraq Body Count, a passive media-based measure of 2003 Iraq war mortality…This method has been discredited, however, as it understates mortality (Ahmed, 2015; Burkle & Garfield, 2013; Carpenter et al. 2013; Siegler et al., 2008)  As evidence, an important finding in our work is that small arms fire contributed substantially to mortality (63%); these events rarely make the sort of headlines tracked by the Iraq Body Count.

It’s hard to find any true statement or respectable citation in the above excerpt.  But you have to start somewhere so I’m going to go with the very end.

Notice, first of all, the weaselly wording – there are five co-authors but none of them has bothered to learn what percentage of deaths in the IBC database is actually attributed to gunfire.  They just venture, incorrectly, that IBC only tracks headlines, and that gunfire events rarely make it into these.

How do we quantify “rare”?  Maybe 10%?  That seems way too high for “rare”.  Maybe 1% or 0.1%?  I’m not sure.

In reality, IBC assigns gunfire to 54% of the deaths in its database during the period covered by the Hagopian et al. survey (March 2003 through June 2011).  And this number understates the full IBC percentage because IBC has a separate category of “executions” which are overwhelmingly gun deaths, although I can’t quickly separate gun executions from non-gun executions.

On top of that the Hagopian et al. survey (known as the UCIMS) has two separate modules: one is household-based and the other is sibling-based.  (In the former people are asked about deaths within their households and in the latter people are asked about deaths of siblings.)  These two modules lead to separate estimates based on different techniques.  And what is the sibling-based UCIMS estimate for the percentage of gunfire deaths?  Errr… 54%, same as IBC.

So Hagopian et al. serve up the gunfire percentage as a prime defect of the IBC database when, in fact, IBC and the UCIMS are very much compatible on this metric.  Indeed, the preponderance of gun deaths has been a prime talking point for IBC since shortly after the invasion phase of the war (when air strikes predominated).  So the Hagopian et al. insight is an old one.

You’d think that Hagopian et al. would be pleased by confirmation from IBC and would be happy to cite this agreement.  Instead, sadly, they manufacture a falsehood about IBC – that it rarely records gun deaths when the truth is that most deaths in the IBC database are gun deaths.  They then swipe at IBC from atop their fictitious creation.

And this point about gun deaths is just a tiny drop in the sea of slime Hagopian et al. sling at IBC.  I’ll return soon for more cleansing.

A Debate about Excess War Deaths: Part I

I just got page proofs for a new paper on excess deaths in Iraq that I’ve written with Stijn van Weezel. This new paper is actually a rejoinder to a reply to an earlier paper I wrote with Stijn which was, in turn, a critique of a still earlier paper.  In short, Stijn and I have been in an ongoing discussion about excess deaths in Iraq.

So now is a good time to bring my blog readers into the loop on all this new stuff.  Moreover, we are pressed for space in our soon-to-be-published rejoinder so we promise to extend the material onto my blog.  This post is the beginning of the promised extension.

Today I’ll set the table by describing the following sequence of publications.

  1. The starting point is this paper by Hagopian et al. which concludes:

Beyond expected rates, most mortality increases in Iraq can be attributed to direct violence, but about a third are attributable to indirect causes (such as from failures of health, sanitation, transportation, communication, and other systems). Approximately a half million deaths in Iraq could be attributable to the war.

I blogged on this estimate a while back.  Back then my point was simply to show how Hagopian et al. start with a data-based central estimate surrounded by massive uncertainty and then seize on one excuse after another to inflate their central estimate and airbrush the uncertainty away.  They wind up with a much higher central estimate than their data can sustain, which they then treat as a conservative lower bound.  (The above quote was just a way station along this inflationary journey, delivered in an academic journal that imposed some, but not sufficient, restraint.)

2.  Stijn and I publish a critique of the Hagopian et al. paper.

We focus mostly on the weakness of the case for a large number of non-violent excess deaths in the Iraq war, although we do touch on the inflationary dynamics mentioned above.

Before turning to the main highlights of our critique paper let’s quickly review the concept of excess deaths as it pertains to the Hagopian et al. Iraq estimates.  Their main claim boils down to saying that the during-war death rate in Iraq is higher than the pre-war death rate there.  They then assume that this increase is caused by the war.

There are a few problems with this train of thought.

a. The causal claim commits a known logical error called the “after this, therefore because of this” fallacy (post hoc ergo propter hoc).  An example would be arguing that “my alarm clock going off causes the sun to rise.”

That said, the notion that the outbreak of war causes all observed changes in death rates afterward is sufficiently plausible that we shouldn’t just dismiss the idea because logic doesn’t automatically imply it.

b.  The only reason for invoking the excess-deaths concept in the first place is the idea that war violence might lead indirectly to non-violent deaths that wouldn’t have occurred without the war.  To address this possibility we should ask whether the during-war non-violent death rate is higher than pre-war non-violent death rate.  Hagopian et al. confound this comparison of like with like by tossing during-war violent deaths into this mix.  Thus, they compare during-war violent plus non-violent deaths with pre-war non-violent deaths.
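To make the like-with-like comparison concrete, here is a minimal Python sketch.  All the rates and exposure figures below are hypothetical, purely for illustration; they are not the numbers from either paper.

```python
# Illustrative sketch of the like-with-like comparison described above.
# All numbers are hypothetical, for illustration only.

def excess_deaths(rate_during, rate_pre, person_years):
    """Excess deaths implied by a change in a death rate.

    Rates are deaths per 1,000 person-years; person_years is total
    exposure during the war period.
    """
    return (rate_during - rate_pre) * person_years / 1000.0

person_years = 50_000_000  # hypothetical during-war exposure

# Confounded comparison: during-war ALL deaths vs pre-war NON-VIOLENT deaths
all_during, nonviolent_pre = 6.0, 5.0
confounded = excess_deaths(all_during, nonviolent_pre, person_years)

# Like-with-like: during-war non-violent deaths vs pre-war non-violent deaths
nonviolent_during = 5.2
like_with_like = excess_deaths(nonviolent_during, nonviolent_pre, person_years)

print(confounded)      # mixes violent deaths into the comparison
print(like_with_like)  # isolates the non-violent change the concept is meant to capture
```

The point of the sketch is structural: tossing violent deaths into the during-war side of the comparison mechanically inflates the apparent number of indirect, non-violent excess deaths.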

Stijn and I perform appropriate comparisons of non-violent death rates.  You can look at the numbers yourself by popping open the paper.  But the general picture is easy enough to understand without looking.  Our central estimates (under various scenarios) for non-violent deaths are always positive but the uncertainty intervals surrounding these estimates are extremely wide and dip far below zero.  Thus, evidence that there are very many, if any, non-violent excess deaths is extremely weak despite the grandiose claims of Hagopian et al.

In our determination to uncover any possible evidence of excess non-violent deaths we also perform a “differences-in-differences” analysis.  The idea here is that if violence leads indirectly to non-violent deaths then we’d expect non-violent death rates to jump up more in relatively violent zones than they do in relatively peaceful zones.  In other words, if violence leads indirectly to non-violent deaths in Iraq then there should be a positive spatial correlation between violence and increases in non-violent death rates.  We find no such thing.
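For readers who want the mechanics, the differences-in-differences logic can be sketched in a few lines of Python.  The rates below are hypothetical, for illustration only.

```python
# Minimal differences-in-differences sketch under the logic above:
# if violence indirectly raises non-violent mortality, the change in
# non-violent death rates should be larger in violent zones than in
# peaceful zones.  All rates are hypothetical (deaths per 1,000).

def did_estimate(violent_pre, violent_during, peaceful_pre, peaceful_during):
    """Difference-in-differences on non-violent death rates."""
    change_violent = violent_during - violent_pre
    change_peaceful = peaceful_during - peaceful_pre
    return change_violent - change_peaceful

# Hypothetical non-violent rates: (pre-war, during-war) for each zone type
estimate = did_estimate(5.0, 5.3, 5.0, 5.2)
print(estimate)  # clearly positive would suggest an indirect effect; near zero would not
```

A positive estimate would be the spatial fingerprint of indirect, non-violent excess deaths; as noted above, we find no such thing in the actual data.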

There is more in the paper and I would be delighted to respond to questions about it.  But, for now, I’ll move on.

3.  Next, Hagopian et al. respond.

I assume that, soon enough, you’ll be able to see their response together with our rejoinder side by side in the journal so I won’t go into detail here.  Still, I want to note two things.

First, the Hagopian et al. reply does not address our main point about the separation of violent deaths from non-violent deaths which is described in section 2 above.

Second, Hagopian et al. spill considerable ink on ad hominem attacks.  The main one takes the form of saying that I have worked with Iraq Body Count (IBC) and the IBC dataset is bad – therefore nobody should trust anything I say.  Stijn and I don’t actually mention IBC in our critique paper so IBC data quality is entirely irrelevant to our argument.  Indeed, Hagopian et al. don’t even try to link IBC data quality with any of our substantive arguments.  Yet, I fear that much of the mud they sling at IBC will stick so I’ll try to clean some of it off in the follow-up blog posts.

4.  Finally, there is our rejoinder.

Again, I don’t want to attempt too much prior to publication.  However, as already mentioned above, I will do a few further blog posts on material that we couldn’t cover within the space we had.  These will be mainly, possibly exclusively, about the IBC database which Hagopian et al. attack very unreasonably in their reply.

OK, I’ve set the table.  More later.




Open the Door to all the Hidden Election Polling Data

The UK House of Lords has issued a call for evidence on the effects of political polling and digital media on politics.  Submissions are due next week so maybe someone out there wants to dash something off… or maybe someone would be so kind as to give me feedback on my proposal.  Below I give a draft.

Comments welcome!

Note that everything I say applies equally to political polling in the US and around the globe but, quite reasonably, the Lords ask about British polling so the proposal is written about British polling.

(OK, the proposal is about election polling, not war.  But this post is very much in keeping with the open data theme of the blog so I believe it will be of general interest to my readers.)



The Proposal[1]

I have one specific suggestion that could, if implemented, substantially improve political life in the UK: require collectors of political polling data to release their detailed micro datasets into the public domain.

A Preliminary Clarification

Some readers may think, wrongly, that pollsters generally do provide detailed micro datasets already.  Occasionally they do.  But normally they just publish summary tables, while withholding the interview-by-interview results (anonymized, of course).  Researchers need such detailed data to make valid estimates.

The Argument

Let me develop this idea in steps.

Political pollsters face two main challenges.  First, they cannot draw well-behaved random samples of voters for their polls.  This is mainly because most people selected for interviews refuse to participate.  Moreover, the political views of the refusers differ systematically from those of the participants.  Second, it is difficult to predict which poll participants will turn out to vote.  Yet good election prediction relies on good turnout prediction.

These two challenges dictate that political polling datasets cannot simply interpret themselves.  Rather, pollsters must use their knowledge, experience, intuition, wisdom and other wiles to model their way out of the shortcomings of their data.  There now exists a growing array of techniques that can be deployed to address political polling challenges.  But good applications of these techniques embody substantial elements of professional judgment, about which experts disagree.
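To illustrate how such judgment calls move the numbers, here is a toy Python sketch in which two defensible turnout models produce different projections from the same six hypothetical respondents.  Everything below is made up for illustration; real likely-voter models are far more elaborate.

```python
# Toy illustration: two defensible turnout models, same raw poll,
# different projections.  All respondent data are hypothetical.

respondents = [
    # (candidate, self-reported likelihood of voting, voted in last election)
    ("A", 0.9, True), ("A", 0.5, False), ("A", 0.4, False),
    ("B", 0.9, True), ("B", 0.8, True), ("B", 0.6, False),
]

def projection(weight_fn):
    """Candidate A's projected share among weighted likely voters."""
    total = sum(weight_fn(r) for r in respondents)
    a = sum(weight_fn(r) for r in respondents if r[0] == "A")
    return a / total

# Model 1: weight each respondent by self-reported likelihood of voting
share_selfreport = projection(lambda r: r[1])

# Model 2: count only past voters, equally weighted
share_pastvote = projection(lambda r: 1.0 if r[2] else 0.0)

print(round(share_selfreport, 3), round(share_pastvote, 3))
```

Both models are reasonable, yet candidate A’s projected share differs noticeably between them – which is exactly why multiple independent analyses of the same micro data are so valuable.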

This New York Times article leaves little doubt about the point of the last paragraph.  The NYT gave the detailed micro data from one of their proprietary Trump-Clinton polls to four analytical teams and asked for projections.  The results ranged from Clinton +4 to Trump +1.[2]  These are all valid estimates made by serious professionals.  Yet they differ quite substantively because the teams differ in some of their key judgments.

The key point is that for the foreseeable future there will not be one correct analytical technique that, if applied properly, will always lead to a correct treatment of new polling data.  Rather, there will be a useful range of valid analyses that can be made from any political polling dataset.

Presently we are robbed of all but one analysis of most political polling datasets that are collected in the UK.  This is because polling data are held privately and never released into the public domain.  This data black hole wastes opportunities in two distinct directions.  First, we cannot learn as much as possible about the state of public opinion during elections.  Second, by limiting the range of experimentation that is applied to each dataset we retard the development process for improving our analytical techniques.

An Important Caveat

Much political polling data are collected by private companies that must make a profit on their investment.  These organizations might feel threatened by this open data proposal.  However, these concerns can easily be addressed by allowing an appropriate interval of time for data collectors to monopolize their datasets.  This could work much in the way that patents provide creative incentives by giving inventors a window of time to reap high rewards before their inventions can be copied by competitors.  The only difference here is that these monopolization intervals for pollsters should be much shorter than patent terms, probably only two weeks or so.

Parliament Should Defend the Public Interest

There is a strong public interest in making full use of political polling data.  Yet even public organizations like the BBC collect political polling data (although not in the 2017 election), write up general summaries and then consign their detailed micro data to oblivion.  If public organizations cannot be convinced to do a better job of serving the public interest then they should be forced to do so.  Even private companies should be forced, by legislation if necessary, to place their political polling data into the public domain after they have been allowed a decent interval designed to feed their bottom lines.

I do not argue that all private survey data should be released to the public.  There must be a public interest test that has to be satisfied before public release can be mandated.  This test would not be satisfied for most privately collected survey data.  But election polling does meet this public interest standard and should be claimed as a public resource to benefit everyone in the UK.


[1] I urge the committee to consult with the leadership of the Royal Statistical Society on this question.  I have not coordinated my submission with them but I believe that they would back it.

[2] The official NYT estimate was Clinton +1.

Fabrication in Survey Data: A Sustainable Ecosystem

Here is a presentation I gave a few weeks ago on fabrication in survey data.

It includes some staple material from the blog but, mainly, I set off in a new direction – trying to explain why survey data get fabricated in the first place.

While writing the presentation I realized that these conditions are similar to those that led to the Grenfell Tower fire.  I only hint at these connections in the presentation but I plan to pursue this angle in the future.