A Debate about Excess War Deaths: Part I

I just got page proofs for a new paper on excess deaths in Iraq that I’ve written with Stijn van Weezel. This new paper is a rejoinder to a reply to an earlier paper I wrote with Stijn, which was, in turn, a critique of a still earlier paper. In short, Stijn and I have been in an ongoing exchange about excess deaths in Iraq.

So now is a good time to bring my blog readers into the loop on all of this. Moreover, we were pressed for space in our soon-to-be-published rejoinder, so we promised to extend the material here on the blog. This post is the beginning of that promised extension.

Today I’ll set the table by describing the following sequence of publications.

1. The starting point is this paper by Hagopian et al., which concludes:

Beyond expected rates, most mortality increases in Iraq can be attributed to direct violence, but about a third are attributable to indirect causes (such as from failures of health, sanitation, transportation, communication, and other systems). Approximately a half million deaths in Iraq could be attributable to the war.

I blogged on this estimate a while back. Back then my point was simply to show how Hagopian et al. start with a data-based central estimate surrounded by massive uncertainty and then seize on one excuse after another to inflate their central estimate and airbrush the uncertainty away. They wind up with a much higher central estimate than their data can sustain, which they then treat as a conservative lower bound. (The above quote was just a way station along this inflationary journey, delivered in an academic journal that imposed some, but not sufficient, restraint.)

2.  Stijn and I publish a critique of the Hagopian et al. paper.

We focus mostly on the weakness of the case for a large number of non-violent excess deaths in the Iraq war, although we do touch on the inflationary dynamics mentioned above.

Before turning to the main highlights of our critique, let’s quickly review the concept of excess deaths as it pertains to the Hagopian et al. Iraq estimates. Their main claim boils down to saying that the during-war death rate in Iraq is higher than the pre-war death rate there. They then assume that this increase is caused by the war.

There are a few problems with this train of thought.

a. The causal claim commits a known logical error called the “after this, therefore because of this” fallacy (post hoc ergo propter hoc). An example would be arguing that “my alarm clock going off causes the sun to rise.”

That said, the notion that the outbreak of war caused the observed increase in death rates is plausible enough that we shouldn’t dismiss it just because the logic doesn’t automatically follow.

b. The only reason for invoking the excess-deaths concept in the first place is the idea that war violence might lead indirectly to non-violent deaths that wouldn’t have occurred without the war. To address this possibility we should ask whether the during-war non-violent death rate is higher than the pre-war non-violent death rate. Hagopian et al. confound this like-with-like comparison by tossing during-war violent deaths into the mix. That is, they compare during-war violent-plus-non-violent deaths with pre-war non-violent deaths.

Stijn and I perform the appropriate comparisons of non-violent death rates. You can look at the numbers yourself by popping open the paper, but the general picture is easy enough to grasp without doing so. Our central estimates (under various scenarios) for excess non-violent deaths are always positive, but the uncertainty intervals surrounding these estimates are extremely wide and dip far below zero. Thus, the evidence that there are very many non-violent excess deaths, if there are any at all, is extremely weak despite the grandiose claims of Hagopian et al.
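To make the shape of this argument concrete, here is a minimal sketch with entirely made-up numbers (not the actual survey data) of how a positive central estimate can come packaged with a bootstrap uncertainty interval that spans zero:

```python
# A minimal bootstrap sketch of the like-with-like comparison,
# using invented cluster-level data -- illustration only.
import random

random.seed(1)

# Invented cluster-level non-violent death rates (per 1,000 person-years).
pre_war    = [max(0.0, random.gauss(5.0, 5.0)) for _ in range(100)]
during_war = [max(0.0, random.gauss(5.5, 5.0)) for _ in range(100)]

def mean(xs):
    return sum(xs) / len(xs)

def bootstrap_excess(pre, dur, reps=10_000):
    """Bootstrap the like-with-like difference:
    (during-war non-violent rate) - (pre-war non-violent rate)."""
    diffs = []
    for _ in range(reps):
        pre_bs = [random.choice(pre) for _ in pre]
        dur_bs = [random.choice(dur) for _ in dur]
        diffs.append(mean(dur_bs) - mean(pre_bs))
    return sorted(diffs)

diffs = bootstrap_excess(pre_war, during_war)
lo = diffs[int(0.025 * len(diffs))]
hi = diffs[int(0.975 * len(diffs))]
print(f"central estimate: {mean(diffs):+.2f}")
print(f"95% interval:     ({lo:+.2f}, {hi:+.2f})")
# With data this noisy the interval easily spans zero, so even a
# positive central estimate is weak evidence of excess non-violent deaths.
```

Real survey data would also require cluster weights and design effects, but the basic shape of the problem is the same: the interval, not the point estimate, carries the evidential weight.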

In our determination to uncover any possible evidence of excess non-violent deaths, we also perform a “differences-in-differences” analysis. The idea is that if violence leads indirectly to non-violent deaths, then non-violent death rates should jump up more in relatively violent zones than in relatively peaceful ones. In other words, there should be a positive spatial correlation between violence and increases in non-violent death rates. We find no such thing.
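Again for concreteness, here is a minimal sketch of this spatial-correlation check, with invented governorate-level numbers rather than our actual data:

```python
# A minimal sketch of the differences-in-differences idea,
# using invented governorate-level data -- illustration only.
import random

random.seed(2)

# Invented data: violence intensity per zone, and the change in the
# non-violent death rate (during-war minus pre-war) per zone.
n_zones = 18  # Iraq has 18 governorates
violence = [random.uniform(0.0, 10.0) for _ in range(n_zones)]
delta_nonviolent = [random.gauss(0.0, 2.0) for _ in range(n_zones)]

def ols_slope(x, y):
    """Least-squares slope of y on x: cov(x, y) / var(x)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

print(f"slope: {ols_slope(violence, delta_nonviolent):+.3f}")
# Under the indirect-deaths hypothesis this slope should be clearly
# positive; because the change here is pure noise, it hovers near zero,
# i.e., no spatial correlation between violence and increases in
# non-violent death rates.
```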

There is more in the paper and I would be delighted to respond to questions about it.  But, for now, I’ll move on.

3.  Next, Hagopian et al. respond.

I assume that, soon enough, you’ll be able to see their response together with our rejoinder side by side in the journal so I won’t go into detail here.  Still, I want to note two things.

First, the Hagopian et al. reply does not address our main point about separating violent deaths from non-violent deaths, described in point 2 above.

Second, Hagopian et al. spill considerable ink on ad hominem attacks. The main one takes the form of saying that I have worked with Iraq Body Count (IBC) and the IBC dataset is bad – therefore nobody should trust anything I say. Stijn and I don’t actually mention IBC in our critique paper, so IBC data quality is entirely irrelevant to our argument. Indeed, Hagopian et al. don’t even try to link IBC data quality to any of our substantive arguments. Yet I fear that much of the mud they sling at IBC will stick, so I’ll try to clean some of it off in the follow-up blog posts.

4.  Finally, there is our rejoinder.

Again, I don’t want to attempt too much prior to publication.  However, as already mentioned above, I will do a few further blog posts on material that we couldn’t cover within the space we had.  These will be mainly, possibly exclusively, about the IBC database which Hagopian et al. attack very unreasonably in their reply.

OK, I’ve set the table. More later.

Open the Door to all the Hidden Election Polling Data

The UK House of Lords has issued a call for evidence on the effects of political polling and digital media on politics. Submissions are due next week, so maybe someone out there wants to dash something off… or maybe someone would be so kind as to give me feedback on my proposal. Below I give a draft.

Comments welcome!

Note that everything I say applies equally to political polling in the US and around the globe but, quite reasonably, the Lords ask about British polling, so the proposal is written in those terms.

(OK, the proposal is about election polling, not war.  But this post is very much in keeping with the open data theme of the blog so I believe it will be of general interest to my readers.)


The Proposal[1]

I have one specific suggestion that could, if implemented, substantially improve political life in the UK: require collectors of political polling data to release their detailed micro datasets into the public domain.

A Preliminary Clarification

Some readers may think, wrongly, that pollsters generally provide detailed micro datasets already. Occasionally they do. But normally they publish only summary tables while withholding the interview-by-interview results (which would, of course, be anonymized before release). Researchers need such detailed data to make valid estimates.

The Argument

Let me develop this idea in steps.

Political pollsters face two main challenges.  First, they cannot draw well-behaved random samples of voters for their polls.  This is mainly because most people selected for interviews refuse to participate.  Moreover, the political views of the refusers differ systematically from those of the participants.  Second, it is difficult to predict which poll participants will turn out to vote.  Yet good election prediction relies on good turnout prediction.

These two challenges mean that political polling datasets cannot simply interpret themselves. Rather, pollsters must use their knowledge, experience, intuition, wisdom and other wiles to model their way out of the shortcomings of their data. There is now a growing array of techniques for addressing these challenges, but good applications of them embody substantial elements of professional judgment, about which experts disagree.
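For concreteness, here is a minimal sketch of one standard technique from that array, post-stratification weighting. Everything in it (the age bands, the population shares, the toy responses) is invented for illustration and is not any particular pollster’s method:

```python
# A minimal post-stratification sketch with invented data.
# Toy sample: each respondent has an age band and a vote intention.
sample = [
    {"age": "18-34", "vote": "A"}, {"age": "18-34", "vote": "B"},
    {"age": "35-54", "vote": "A"}, {"age": "35-54", "vote": "A"},
    {"age": "55+",   "vote": "B"}, {"age": "55+",   "vote": "B"},
    {"age": "55+",   "vote": "B"}, {"age": "55+",   "vote": "A"},
]
# Assumed population shares (in practice taken from a census).
population_share = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

# Each respondent's weight = population share / sample share of their
# age band, so the weighted sample matches the known age distribution.
counts = {}
for r in sample:
    counts[r["age"]] = counts.get(r["age"], 0) + 1
weight = {g: population_share[g] / (n / len(sample)) for g, n in counts.items()}

raw      = sum(r["vote"] == "A" for r in sample) / len(sample)
weighted = (sum(weight[r["age"]] for r in sample if r["vote"] == "A")
            / sum(weight[r["age"]] for r in sample))
print(f"raw A share:      {raw:.1%}")      # 50.0%
print(f"weighted A share: {weighted:.1%}")  # 58.8%: over-sampled 55+ down-weighted
```

Even in this toy the judgment calls are visible: which variables to weight on, and where the population shares come from.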

This New York Times article leaves little doubt on this point. The NYT gave the detailed micro data from one of their proprietary Trump-Clinton polls to four analytical teams and asked each for a projection. The results ranged from Clinton +4 to Trump +1.[2] These are all valid estimates made by serious professionals. Yet they differ substantially because the teams differ in some of their key judgments.

The key point is that for the foreseeable future there will not be one correct analytical technique that, if applied properly, will always lead to a correct treatment of new polling data.  Rather, there will be a useful range of valid analyses that can be made from any political polling dataset.
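To see how such judgment calls move the numbers, here is one more minimal sketch: the same invented respondents run through two invented turnout models, the kind of choice on which serious analysts routinely differ:

```python
# A minimal sketch of how turnout judgment moves the topline.
# Toy respondents: (vote intention, self-reported enthusiasm 0-10).
# The data and both turnout models are invented for illustration.
respondents = [
    ("A", 7), ("A", 7), ("A", 7), ("A", 7),
    ("B", 9), ("B", 9), ("B", 5), ("B", 5),
]

def topline_for_a(turnout_prob):
    """A's share among predicted voters under a given turnout model."""
    w_a   = sum(turnout_prob(e) for v, e in respondents if v == "A")
    w_all = sum(turnout_prob(e) for _, e in respondents)
    return w_a / w_all

# Model 1: probability of voting proportional to enthusiasm.
# Model 2: a hard cutoff -- only respondents with enthusiasm >= 7 vote.
print(f"model 1: A share = {topline_for_a(lambda e: e / 10):.1%}")    # 50.0%
print(f"model 2: A share = {topline_for_a(lambda e: float(e >= 7)):.1%}")  # 66.7%
# Same raw data, two defensible turnout models, two very different races.
```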

Presently we are robbed of all but one analysis of most political polling datasets collected in the UK, because the data are held privately and never released into the public domain. This data black hole wastes opportunities in two distinct directions. First, we cannot learn as much as we otherwise could about the state of public opinion during elections. Second, by limiting the range of experimentation applied to each dataset, we retard the development of better analytical techniques.

An Important Caveat

Much political polling data are collected by private companies that must make a profit on their investment, and these organizations might feel threatened by this open data proposal. However, their concerns can easily be addressed by allowing data collectors an appropriate interval of time during which they monopolize their datasets. This would work much as patents do: inventors get a window of time to reap high rewards before their inventions can be copied by competitors. The only difference is that the monopolization intervals for pollsters should be much shorter than patent terms, probably only two weeks or so.

Parliament Should Defend the Public Interest

There is a strong public interest in making full use of political polling data. Yet even public organizations like the BBC collect political polling data (although not in the 2017 election), write up general summaries and then consign their detailed micro data to oblivion. If public organizations cannot be convinced to do a better job of serving the public interest, then they should be forced to do so. Even private companies should be forced, by legislation if necessary, to place their political polling data into the public domain after they have been allowed a decent interval designed to feed their bottom lines.

I do not argue that all private survey data should be released to the public. A public interest test would have to be satisfied before public release could be mandated, and most privately collected survey data would not satisfy it. But election polling does meet this standard and should be claimed as a public resource to benefit everyone in the UK.

 

[1] I urge the committee to consult with the leadership of the Royal Statistical Society on this question.  I have not coordinated my submission with them but I believe that they would back it.

[2] The official NYT estimate was Clinton +1.

Fabrication in Survey Data: A Sustainable Ecosystem

Here is a presentation I gave a few weeks ago on fabrication in survey data.

It includes some staple material from the blog but, mainly, I set off in a new direction – trying to explain why survey data get fabricated in the first place.

While writing the presentation I realized that the conditions sustaining fabrication are similar to those that led to the Grenfell Tower fire. I only hint at this connection in the presentation, but I plan to pursue the angle in the future.