A Debate about Excess War Deaths: Part I

I just got page proofs for a new paper on excess deaths in Iraq that I’ve written with Stijn van Weezel. This new paper is actually a rejoinder to a reply to an earlier paper I wrote with Stijn which was, in turn, a critique of a still earlier paper.  In short, Stijn and I have been in an ongoing discussion about excess deaths in Iraq.

So now is a good time to bring my blog readers into the loop on all this new stuff.  Moreover, we are pressed for space in our soon-to-be-published rejoinder, so in it we promise to extend the material here on my blog.  This post is the beginning of the promised extension.

Today I’ll set the table by describing the following sequence of publications.

  1. The starting point is this paper by Hagopian et al. which concludes:

Beyond expected rates, most mortality increases in Iraq can be attributed to direct violence, but about a third are attributable to indirect causes (such as from failures of health, sanitation, transportation, communication, and other systems). Approximately a half million deaths in Iraq could be attributable to the war.

I blogged on this estimate a while back.  Back then my point was simply to show how Hagopian et al. start with a data-based central estimate surrounded by massive uncertainty and then seize on one excuse after another to inflate their central estimate and airbrush the uncertainty away.  They wind up with a much higher central estimate than their data can sustain, which they then treat as a conservative lower bound.  (The above quote was just a way station along this inflationary journey, delivered in an academic journal that imposed some, but not sufficient, restraint.)

2.  Stijn and I publish a critique of the Hagopian et al. paper.

We focus mostly on the weakness of the case for a large number of non-violent excess deaths in the Iraq war, although we do touch on the inflationary dynamics mentioned above.

Before turning to the main highlights of our critique paper let’s quickly review the concept of excess deaths as it pertains to the Hagopian et al. Iraq estimates.  Their main claim boils down to saying that the during-war death rate in Iraq is higher than the pre-war death rate there.  They then assume that this increase is caused by the war.

There are a few problems with this train of thought.

a. The causal claim commits a known logical error, the “after this, therefore because of this” (post hoc ergo propter hoc) fallacy.  An example would be arguing that “my alarm clock going off causes the sun to rise.”

That said, the notion that the outbreak of war causes all observed changes in death rates afterward is sufficiently plausible that we shouldn’t just dismiss the idea because logic doesn’t automatically imply it.

b.  The only reason for invoking the excess-deaths concept in the first place is the idea that war violence might lead indirectly to non-violent deaths that wouldn’t have occurred without the war.  To address this possibility we should ask whether the during-war non-violent death rate is higher than the pre-war non-violent death rate.  Hagopian et al. confound this like-with-like comparison by tossing during-war violent deaths into the mix.  Thus, they compare during-war violent plus non-violent deaths with pre-war non-violent deaths.
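To make the like-with-like point concrete, here is a minimal numerical sketch.  The rates below are invented purely for illustration; they are not Hagopian et al.’s numbers.

```python
# Invented, purely illustrative rates (deaths per 1,000 person-years).
pre_war_nonviolent = 5.0   # non-violent death rate before the war
war_nonviolent     = 5.5   # non-violent death rate during the war
war_violent        = 3.0   # violent death rate during the war

# Confounded comparison: during-war violent + non-violent vs. pre-war non-violent.
confounded_excess = (war_nonviolent + war_violent) - pre_war_nonviolent  # 3.5

# Like-with-like comparison: during-war non-violent vs. pre-war non-violent.
nonviolent_excess = war_nonviolent - pre_war_nonviolent                  # 0.5

print(confounded_excess, nonviolent_excess)
```

With these made-up numbers the confounded comparison produces an “excess” seven times larger than the genuine non-violent excess, simply because the violent deaths have been tossed into the mix.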

Stijn and I perform appropriate comparisons of non-violent death rates.  You can look at the numbers yourself by popping open the paper.  But the general picture is easy enough to understand without looking.  Our central estimates (under various scenarios) for non-violent deaths are always positive, but the uncertainty intervals surrounding these estimates are extremely wide and dip far below zero.  Thus, evidence that there are very many, if any, non-violent excess deaths is extremely weak despite the grandiose claims of Hagopian et al.

In our determination to uncover any possible evidence of excess non-violent deaths we also perform a “differences-in-differences” analysis.  The idea here is that if violence leads indirectly to non-violent deaths then we’d expect non-violent death rates to jump up more in relatively violent zones than they do in relatively peaceful zones.  In other words, if violence leads indirectly to non-violent deaths in Iraq then there should be a positive spatial correlation between violence and increases in non-violent death rates.  We find no such thing.
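For readers who want to see the logic in code form, here is a minimal sketch of the idea using invented governorate-level data; the real analysis, with the actual Iraq data, is in the paper.

```python
import numpy as np

# Invented data for illustration only: one row per governorate (Iraq has 18).
rng = np.random.default_rng(0)
violent_rate = rng.uniform(0, 10, size=18)          # during-war violent deaths per 1,000
nonviolent_change = rng.normal(0, 1, size=18)       # during-war minus pre-war non-violent rate

# If violence indirectly drives non-violent deaths, this correlation should be positive.
print(np.corrcoef(violent_rate, nonviolent_change)[0, 1])

# Difference-in-differences framing: compare the average change in non-violent
# death rates between the more violent and the less violent governorates.
more_violent = violent_rate > np.median(violent_rate)
did = nonviolent_change[more_violent].mean() - nonviolent_change[~more_violent].mean()
print(did)
```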

There is more in the paper and I would be delighted to respond to questions about it.  But, for now, I’ll move on.

3.  Next, Hagopian et al. respond.

I assume that, soon enough, you’ll be able to see their response together with our rejoinder side by side in the journal so I won’t go into detail here.  Still, I want to note two things.

First, the Hagopian et al. reply does not address our main point about the separation of violent deaths from non-violent deaths which is described in section 2 above.

Second, Hagopian et al. spill considerable ink on ad hominem attacks.  The main one takes the form of saying that I have worked with Iraq Body Count (IBC) and the IBC dataset is bad – therefore nobody should trust anything I say.  Stijn and I don’t actually mention IBC in our critique paper, so IBC data quality is entirely irrelevant to our argument.  Indeed, Hagopian et al. don’t even try to link IBC data quality with any of our substantive arguments.  Yet, I fear that much of the mud they sling at IBC will stick, so I’ll try to clean some of it off in the follow-up blog posts.

4.  Finally, there is our rejoinder.

Again, I don’t want to attempt too much prior to publication.  However, as already mentioned above, I will do a few further blog posts on material that we couldn’t cover within the space we had.  These will be mainly, possibly exclusively, about the IBC database which Hagopian et al. attack very unreasonably in their reply.

OK, I’ve set the table.  More later.


Secret Data Sunday – Iraq Family Health Survey

The WHO-sponsored Iraq Family Health Survey (IFHS) led to a nice publication in the New England Journal of Medicine that came complete with an editorial puff piece extolling its virtues.  According to the NEJM website this publication has generated 60 citations and we’re still counting.   If you cast a net wider than just medical publications then the  citation count must run well into the hundreds.

But the IFHS virtues don’t stop there.  The NEJM paper, and the accompanying report, are well written and supply plenty of good methodological information about the survey.  The authors are pretty up front about the limitations of their work, notably that they had to skip interviews in some areas due to security concerns.  Moreover, the IFHS is an important survey not least because its estimate of 150,000 violent deaths discredited the Burnham et al. estimate of 600,000 violent deaths for almost exactly the same time period.  (The Burnham et al. survey hid its methodology and was afflicted by serious ethical and data integrity problems.)

I have cited the IFHS multiple times in my own work and generally believe in it.  At the same time, the IFHS people did several questionable things with their analysis that I would like to correct, or at least investigate, by reanalyzing the IFHS data.

But here’s the rub.  The WHO has not released the IFHS dataset.

I and other people have requested it many times.  The field work was conducted way back in 2006.  So what is the WHO waiting on?

I’ll leave a description of my unrealized reanalysis to a future post. This is because my plans just don’t matter for the issue at hand; the IFHS data should be in the public domain whether or not I have a good plan for analyzing them.  (See this post on how the International Rescue Committee hides its DRC data in which I make the same point.)

There is an interesting link between the IFHS and the Iraq Child and Maternal Mortality Survey, another important dataset that is also unavailable.  The main point of contact for both surveys is Mohamed Ali of the WHO.  Regarding the IFHS, Mohamed seemed to tell me in an email that only the Iraqi government is empowered to release the dataset.  If so, this suggests a new (at least for me) and disturbing problem:

Apparently, the WHO uses public money to sponsor surveys but then sells out the general public by ceding their data distribution rights to local governments, in this case to Iraq.  

This practice of allowing governments that benefit from UN-sponsored research to withhold data from the public that pays for the research is unacceptable.  It’s great that the WHO sponsors survey research in needy countries, but open data should be a precondition for this service.


How Many People were Killed in the Libyan Conflict – Some field work that raises more questions than it answers

Hana Salama asked me for an opinion on this article. I had missed it but it is, potentially, interesting to me so I am happy to oblige her.

I’ve now absorbed it but find myself even more puzzled than I was after reading that Syria survey I blogged on a few weeks back.  Again, it looks like some people did some useful field work but the write up is so bad that it’s hard to know exactly what they did.  In fact, the Libya work is more opaque than the Syria work to the point where I wonder what, if anything, was actually done.

For orientation here is the core of the abstract:

Methods

A systematic cross-sectional field survey and non-structured search was carried out over fourteen provinces in six Libyan regions, representing the primary sites of the armed conflict between February 2011 and February 2012. Thirty-five percent of the total area of Libya and 62.4% of the Libyan population were involved in the study. The mortality and injury rates were determined and the number of displaced people was calculated during the conflict period.

Results

A total of 21,490 (0.5%) persons were killed, 19,700 (0.47%) injured and 435,000 (10.33%) displaced. The overall mortality rate was found to be 5.1 per 1000 per year (95% CI 4.1–7.4) and injury rate was found to be 4.7 per 1000 per year (95% CI 3.9–7.2) but varied by both region and time, reaching peak rates by July–August 2011.

I’m not sure but I think the researchers (hereafter Daw et al.) tried to count war deaths (plus injuries and displacement numbers) rather than trying to statistically estimate these numbers.  (See this paper on the distinction.)

Actually, I read the whole paper thinking that Daw et al. drew a random sample and did statistical estimation but then I changed my mind.  I got my initial impression at the beginning because they say

This epidemiological community-based study was guided by previously published studies and guidelines.

They then cite the (horrible) Roberts et al. (2004) Iraq survey as providing a framework for their research (see this and follow the links).   Since Roberts et al. was a sample survey I figured that Daw et al. was also a sample survey.  They then go on to say that

Face to face interviews were carried out with at least one member of each affected family….

This also seemed to point in the direction of a sample survey conducted on a bunch of randomly selected households.  (With this method you pick a bunch of households at random, find out how many people lived and died in each one and then extrapolate a national death rate from the in-sample death data.)
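A minimal sketch of that kind of extrapolation, with invented numbers (nothing below comes from the Libya paper), might look like this:

```python
# Invented numbers; nothing here comes from Daw et al.
sampled_households   = 2_000       # randomly selected households interviewed
people_per_household = 7           # average household size observed in the sample
deaths_in_sample     = 30          # war deaths reported by those households
covered_population   = 4_000_000   # hypothetical population of the covered area

sample_population = sampled_households * people_per_household   # 14,000 people
death_rate = deaths_in_sample / sample_population                # deaths per person
estimated_deaths = death_rate * covered_population
print(round(estimated_deaths))                                   # roughly 8,600 with these made-up inputs
```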

But then I realized that the above quote continues with

…listed in the registry of the Ministry of Housing and Planning

Hmmmm….so they interviewed all affected families listed in the registry of some Ministry.  This registry cannot have been a registry of every family living in the areas covered by the survey because there are far more families there than could have been interviewed on this project.  (The areas covered contain around 4.2 million people according to Table 1 of the paper and  surely Daw et al. did not conduct hundreds of thousands of interviews.)
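As a quick consistency check, using only the figures already quoted above, the abstract’s percentages and rates do line up with a denominator of roughly 4.2 million people rather than the whole Libyan population:

```python
# Figures from the abstract quoted above; the 4.2 million is the Table 1 figure.
killed, injured, displaced = 21_490, 19_700, 435_000
covered_population = 4_200_000

print(killed / covered_population)         # ~0.0051, i.e. the 0.5% in the abstract
print(injured / covered_population)        # ~0.0047, i.e. 0.47%
print(displaced / covered_population)      # ~0.104, close to the quoted 10.33%
print(1000 * killed / covered_population)  # ~5.1 per 1,000 over the one-year study period
```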

So I’m guessing that the interviews were just of people from families on an official list of victims; killed, injured or displaced.  This guess places a lot of emphasis on one interpretation of the words “listed” and “affected” but it does make some sense.

To be clear, even interviewing one representative from every affected family would have been a gargantuan task since Daw et al. identify around 40,000 casualties (killings plus injuries) and more than 400,000 displaced people.  So we would still be talking about tens of thousands of interviews.

To be honest, now I’m wondering if all these interviews really happened.  That’s an awful lot of interviews and they would have been conducted in the middle of a war.

So now I’m back to thinking that maybe it was a sample survey of a few thousand households.  But if so then the write up has the large flaw that there is no description whatsoever of how its sample was drawn (if, indeed, there was a sample).

Something is definitely wrong here.  I shouldn’t have to get out a Ouija board to divine the authors’ methodology.

The Syria survey discussed a few weeks ago seems to be in a different category.  For that one I have a lot of questions about what they did combined with doubts about whether their methods make sense.  But this Libya write-up seems weird to the point where I wonder whether they were actually out in the field at all.

Maybe an email to Dr. Daw will clear things up in a positive way.  With the Syria paper emailing the lead author got me nowhere but maybe here it will work.  I’m afraid that the best case scenario is that Daw et al. did some useful field work that was obscured by a poor write up and that there is a better paper waiting to get written.


The AAPOR Report on 2016 US Election Polling plus some Observations on Survey Measurement of War Deaths – Part 1

I’ve finally absorbed the report of the American Association for Public Opinion Research (AAPOR) on polling in the Trump-Clinton election.  So I’ll jot down my reactions in a series of posts  (see also this earlier post).   In keeping with the spirit of the blog I’ll also offer related thoughts on survey-based approaches to estimating numbers of war deaths.

I strongly recommend the AAPOR report.  It has many good insights and is highly readable.

That said, I’ll mostly criticize it here.

But before I proceed to the substance of the AAPOR report I want to draw your attention to the complete absence of an analogous document in the literature using household surveys to estimate war deaths.

There has been at least one notable success in survey-based war-death estimation and several notable failures (two of the biggest are here and here).  Yet there has not been any soul-searching within the community of practitioners in the conflict field that can be even remotely compared to the AAPOR document.  On the contrary, there is a sad history of epidemiologists militantly promoting discredited work as best practice.  See, for example, this paper, which concludes:

The use of established epidemiological methods is rare. This review illustrates the pressing need to promote sound epidemiologic approaches to determining mortality estimates and to establish guidelines for policy-makers, the media and the public on how to interpret these estimates.

The great triumph that drives the above conclusion is the notorious Burnham et al. (2006) study which overestimated the number of violent deaths in Iraq by at least a factor of 4 while endangering the lives of its interviewees.

Turning back to the AAPOR document, I want to underscore that AAPOR, to its credit, has produced a self-critical report and I’m benefiting here from the nice platform their committee has provided.

The report maintains a strong distinction between national polls and state polls.  Rather unfortunately though, the report sets up state pollsters as the poor cousins of the real national pollsters.

It is a persistent frustration within polling and the larger survey research community that the profession is judged based on how these often under-budgeted state polls perform relative to the election outcome.

Analogously, we might say that Democrats are frustrated by the judgments of the electoral college, which keeps handing the presidency over to Republicans despite Democratic victories in the popular vote.  Yes, I too am frustrated by this weird quirk of the American system.  But the electoral college is the way the US determines its presidency and we all have to accept this.  And just as it would be a mistake for Democrats to focus on winning the popular vote while downplaying the electoral college, it’s also a mistake for pollsters to focus on predicting the popular vote while leaving electoral college prediction as an afterthought.

The above quote is followed by something that is also pretty interesting:

The industry cannot realistically change how it is judged, but it can make an improvement to the polling landscape, at least in theory. AAPOR does not have the resources to finance a series of high quality state-level polls in presidential elections, but it might consider attempting to organize financing for such an effort. Errors in state polls like those observed in 2016 are not uncommon. With shrinking budgets at news outlets to finance polling, there is no reason to believe that this problem is going to fix itself. Collectively, well-resourced survey organizations might have enough common interest in financing some high quality state-level polls so as to reduce the likelihood of another black eye for the profession.

I have to think more about this but at first glance this thinking seems sort of like saying:

Look, for a while we’ve been down here in Ecuador selling space heaters and, realistically, that’s not gonna change (although we’re writing this report because our business is faltering).  But maybe next year space heater companies can donate  a few air conditioners to some needy people.  It’s naive to imagine that there will be any money in the air conditioner business in Ecuador but this charity might help us defend ourselves against the frustrating criticism that air conditioner companies are supplying a crappy product.

In other words, it’s clear that a key missing ingredient for better election prediction is more high-quality state polls.  So why is it obvious that the market will not reward more good state polls but it will reward less relevant national ones?

(Side note – I think there are high-quality state polls and I think that the AAPOR committee agrees with me on this.  It’s just that there aren’t enough good state polls and also the average quality level may be lower on state polls than it is on national ones.)

Maybe I’m missing something here.  Is there some good reason why news consumers will always want more national polls even though these are less informative than state polls are?

Maybe.

But maybe journalists should just do a better job of educating their audiences.  A media company could stress that presidential elections are decided state by state, not at the national level, and so this election season they will do their polling state by state, thereby providing a better product than that of their competitors who are only doing national polls.

In short, there should be a way to sell high quality information and I hope that the polling industry innovates to tailor their products more closely to market needs than they have done in recent years.


I’ve Done Something or Other and Say that 470,000 People were Killed in Syria – Would you Like to Interview Me?

Let’s go back to February of 2016 when the New York Times ran this headline:

Death Toll from War in Syria now 470,000, Group Finds

The headline is more conservative than a caption in the same article which reads:

At least [my emphasis] 470,000 Syrians have died as a result of the war, according to the Syrian Center for Policy Research.

This switch between the headline and the caption is consistent with a common pattern of converting an estimate, that might be either too high or too low, into a bare minimum.

Other respected outlets such as PBS and Time jumped onto the 470,000 bandwagon, while the Guardian claimed primacy in this story with an early exclusive that quotes the report’s author:

“We use very rigorous research methods and we are sure of this figure,” Rabie Nasser, the report’s author, told the Guardian. “Indirect deaths will be greater in the future, though most NGOs [non-governmental organisations] and the UN ignore them.

“We think that the UN documentation and informal estimation underestimated the casualties due to lack of access to information during the crisis,” he said.

Oddly, none of the news articles say anything about what this rigorous methodology is.  The Guardian refers to “counting” which I would normally interpret as saying that the Syrian Center for Policy Research (SCPR) has a list of 470,000 people killed but it is not at all clear that they really have such a list.

This report was the source for all the media attention.  The figure of 470,000 appears just once in the report, in a throwaway line in the conclusion:

 The armed conflict badly harmed human development in Syria where the fatalities in 2015 reached about 470,000 deaths, the life expectancy at birth estimated at 55.4 years, and the school age non-attendance rate projected at 45.2 per cent; consequently, the HDI of Syria is estimated to have lost 29.8 per cent of its HDI value in 2015 compared to 2010.

The only bit of the report that so much as hints at where the 470,000 number came from is this:

The report used results and methodology from a forthcoming SCPR report on the human development in Syria that is based on a comprehensive survey conducted in the mid of 2014 and covered all regions in Syria. The survey divided Syria into 698 studied regions and questionnaire three key informants, with specific criteria that guarantee inclusiveness and transparency, from each region. Moreover, the survey applied a strict system of monitoring and reviewing to ensure the correctness of responses. About 300 researchers, experts, and programmers participated in this survey.

This is nothing.

The hunger for scraps of information on the number of people killed in Syria is, apparently, so great that it is feasible to launch a bunch of news headlines just by saying you’ve looked into this question and come up with a number that is larger than what was previously thought.  (I strongly suspect that having a bigger number which you use to dump on any smaller numbers is a key part of getting noticed.)

That said, the above quote does promise a new report with more details, and eventually that report was released – but its methodological details are still woefully inadequate.  They divide Syria up, interview three key informants in each area and then, somehow, calculate the number of dead people based on these interviews.  I have no idea what this calculation looks like.  There is a bit of description of how the SCPR picked their key informants but, beyond that, the new report provides virtually no information relevant for evaluating the 470,000 figure.  The SCPR doesn’t even provide a copy of their questionnaire and I can hardly even guess at what it looks like.

One thing is clear though – they did not use the standard sample survey method for estimating the number of violent deaths.  Under this approach you pick a bunch of households at random, do interviews on the number of people who have lived and died in each one and extrapolate a national death rate based on death rates observed in your sample households.  If the SCPR had done something like this then at least I would’ve had a sense of where the 470,000 number came from, although I’d still want to know details.

I emailed Rabie Nasser asking for details but didn’t hear back.  Who knows.  Maybe my message went into his spam folder.  There are other people associated with this work and I’ll try to contact them and will report back if I hear something interesting.

I want to be clear.  I’m not saying that this work is useless for estimating the number of people killed in the Syrian war.  In fact, I suspect that the SCPR generated some really useful information on this question and on other issues as well.  But until they explain what they actually did I would just disregard the work, particularly the 470,000 figure.  I’m not saying that I think this number is too high or that it is too low.  I just think that it is floating in thin air without any methodological moorings to enable us to understand it.

Journalists should lay off press releases taking the form of “I did some unspecified research and here are my conclusions.”


New Paper on Accounting for Civilian War Casualties

Hello everybody.

The radio silence was much longer than intended but blog posts should start coming fast and furious now.  I’ve got a lot I want to get off my chest as soon as possible.

Let’s get the ball rolling with a new paper I have with Nicholas Jewell and Britta Jewell.  (Well, to be honest, it isn’t really a brand new paper but it’s newly accepted at a journal and we’re now putting it into the public domain.)

I dare say that this paper is a very readable introduction to civilian casualty recording and estimation, that is, to most of the subject matter of the blog.  I hope you will all have a look.

And, please, send in your comments.

More soon…..

PS – Here is an alternative link to the paper in case the first one doesn’t work for you.


Mismeasuring Deaths in Iraq: Addendum on Confidence Interval Calculations


In my last post I used a combination of bootstrapping and educated guesswork to find  confidence intervals for violent deaths in Iraq based on the data from the Roberts et al. survey.  (The need for guesswork arose because the authors have not been forthcoming with their data.)

Right after this went up a reader contacted me and asked whether the bottom of one of these confidence intervals can go below 0.

The short answer is “no” with the bootstrap method.  This technique can only take us down to 0 and no further.

Explanation

With bootstrapping we repeatedly resample, with replacement, from the list of 33 clusters.  Of course, none of these clusters experienced a negative number of violent deaths, so 0 is the smallest possible count we can get for violent deaths in any simulated sample.  (In truth, the possibility of pulling 33 0’s is more theoretical than real.  This didn’t happen in any of my 1,000 draws of 33.)
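For the curious, here is a minimal sketch of the bootstrap, using the same cluster allocation I assume for the walk-through below (18 clusters with 0 violent deaths, 7 with 1, 7 with 2, and the Fallujah cluster with 52):

```python
import numpy as np

# Assumed allocation of violent deaths across the 33 clusters (see below).
clusters = np.array([0]*18 + [1]*7 + [2]*7 + [52])

rng = np.random.default_rng(1)
boot_means = np.array([
    rng.choice(clusters, size=33, replace=True).mean()   # one resample of 33 clusters
    for _ in range(1_000)
])

# Percentile interval for the mean violent deaths per cluster.
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(lo, hi)   # the lower end can hit 0 but can never go negative
```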

Nevertheless, it turns out that if we employ the most common methods for calculating confidence intervals (not bootstrapping) then the bottom of the interval does dip below 0 when the dubious Fallujah cluster is included.

Here’s a step-by-step walk-through of the traditional method applied to the Roberts et al. data; a short code sketch reproducing the numbers follows the list.  (I will assume that violent deaths are allocated across the 33 clusters as 18 0’s, 7 1’s, 7 2’s and one 52.)

  1. Compute the mean number of violent deaths per cluster.  This is 2.2.  An indication that something is screwy here is the fact that the mean is bigger than the number of violent deaths in 32 out of the 33 clusters.  At the same time the mean is way below the number of violent deaths in the Fallujah cluster (52).  Note that without the Fallujah cluster the mean becomes 0.7, i.e., eliminating Fallujah cuts the mean by more than a factor of 3.
  2. Compute the sample standard deviation which is a measure of how strongly the number of violent deaths varies by cluster.  This is 9.0.  Note that if we eliminate the Fallujah cluster then the sample standard deviation plummets by more than a factor of 10, all the way down to 0.8.  This is just a quantitative expression of the obvious fact that the data are highly variable with Fallujah in there.  Note further that the big outlier observation affects the standard deviation more than it affects the mean.
  3. Adjust for sample size.  We do this by dividing the sample standard deviation by the square root of the sample size.  This gives us 1.6.  Here the idea is that you can tame the variation in the data by taking a large sample.  The larger the sample size the more you tame the data.  However, as we shall see, the Fallujah cluster makes it impossible to really tame the data with a sample of only 33 clusters.
  4. Unfortunately, the last step is mysterious unless you’ve put a fair amount of effort into studying statistics.  (This, alone, is a great reason to prefer bootstrapping which is very intuitive.)  Our 95% confidence interval for the mean number of violent deaths per cluster is, approximately, the average plus or minus 2 times 1.6, i.e., -1.0 to 5.4.  There’s the negative lower bound!
  5. We can translate from violent deaths per cluster to estimated violent deaths by multiplying by 33 and again by 3,000.  We end up with -100,000 to 530,000.  (I’ve been rounding at each step.  If, instead, I don’t round until the very end I get -90,000 to 530,000….this doesn’t really matter.)  Note that without Fallujah we get a confidence interval of 30,000 to 90,000, which is about what we got with bootstrapping.
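Here is the promised sketch of the five steps above.  The ×33×3,000 scaling in the last line follows step 5; everything else is just the standard formula.

```python
import numpy as np

# Same assumed allocation as above: 18 zeros, 7 ones, 7 twos and one 52.
clusters = np.array([0]*18 + [1]*7 + [2]*7 + [52])

mean = clusters.mean()                        # step 1: about 2.2
sd = clusters.std(ddof=1)                     # step 2: about 9.0
se = sd / np.sqrt(len(clusters))              # step 3: about 1.6

ci_per_cluster = (mean - 2*se, mean + 2*se)   # step 4: roughly (-1.0, 5.4)
ci_deaths = tuple(x * 33 * 3_000 for x in ci_per_cluster)   # step 5
print(ci_per_cluster)
print(ci_deaths)                              # roughly (-90,000, 530,000)
```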

Have we learned anything here other than that I respond to reader questions?

I don’t think we’ve learned much, if anything, about violent deaths in Iraq.  We already knew that the Roberts et al. data, especially the Fallujah observation, is questionable and maybe the above calculation reinforces this view a little bit.

But, mostly, we learn something about the standard method for calculating confidence intervals; when the data are wild this method can give incredible answers.  Of course, a negative number of violent deaths is not credible.

There is an intuitive reason why the standard method fails with the Roberts et al. data; it forces a symmetric estimate onto highly asymmetric data.  Remember we get 2.2 plus or minus 3.2 average violent deaths per cluster.  The plus or minus means that the confidence interval is symmetric.  The Fallujah observation forces a wide confidence interval which has to go just as wide on the down side as it is on the up side.  In some sense the method is saying that if it’s possible to find a cluster with 52 violent deaths then it also must be possible to find a cluster with around -52 violent deaths.  But, of course, no area of Iraq  experienced -52 violent deaths.  So you wind up with garbage.

Part of the story is also the small sample size.  With twice as many clusters, but the same sort of data, the lower limit would only go down to about 0.

It’s tempting to just say “garbage in, garbage out” and, up to a point, this is accurate.   But the bigger problem is that the usual method for calculating confidence intervals is not appropriate in this case.