Chilcot on Civilian Casualties: Part 5

This post continues my coverage of the three reports (one, two, three) written by UK government experts on the Roberts et al. 2004 article claiming that the 2003 invasion of Iraq caused a very large number of deaths.  According to the abstract of the paper:

We estimate that 98,000 more deaths than expected (8,000-194,000) happened after the invasion outside of Falluja and far more if the outlier Falluja cluster is included…Violent deaths were widespread, reported in 15 of 33 clusters, and were mainly attributed to coalition forces.  Most individuals reportedly killed by coalition forces were women and children.

Here’s some useful background.

Iraq Body Count (IBC) had already documented the violent deaths of nearly 20,000 civilians by the time the Roberts et al. paper was released.  So it was already clear that the war had caused a very large number of civilian deaths. The civilians chapter of the Chilcot Report does not suggest to me that this fact triggered deep concern within the UK government.  But the Roberts et al. paper produced a shock which I attribute mainly to its headline-grabbing figure of 100,000.

The 100,000 estimate is not directly comparable to IBC’s 20,000 count because 100,000 refers to excess deaths, i.e., violent plus non-violent deaths of civilians plus combatants beyond a baseline level, whereas IBC records only violent deaths of civilians.  There is also a phenomenally wide confidence interval of 8,000 to 194,000 surrounding the 100,000 estimate, which severely complicates any comparison with another source.

Despite all these ambiguities, media coverage tended to present the Roberts et al. results as reliably demonstrating, in the prestigious scientific journal The Lancet, that the war had caused 100,000 violent deaths of civilians.  This Guardian article is typical of much misleading media coverage.  There is no mention of a confidence interval, and the excess-death estimate is portrayed as a violent-death estimate which is then presented as civilians-only when, in fact, the estimate mixes combatants with civilians.  Such media attention further upped the ante on the 100,000 figure, making it still harder to ignore.

Roberts et al. conducted a “cluster survey”.  Specifically, they selected 33 locational points in Iraq and interviewed a bunch of close neighbours at each place.  Households located so close to one another are likely to have similar violence experiences.  So it’s probably more useful to view the sample as 33 data points, one for each cluster, rather than as roughly 1,000 data points, one for each household.

This is a tiny sample.

To get a handle on the sample-size problem consider some pertinent simulations I ran a few years back on some Iraq violence data. These show just how easy it is to overestimate violent deaths by factors of 2, 3 or more when you only have around 30 clusters. By the same token, surveys of this size can easily fail to detect a single violent death even when these surveys are conducted within very violent environments.


The problem with using a mere 33 clusters to measure war violence is intuitive.  Interviewers can easily stumble onto a few unusually violent hot spots and overestimate the average level of violence by a wide margin.   On the other hand, researchers can just as easily draw a qualitatively different kind of sample consisting of 33 peaceful islands.
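To make the intuition concrete, here is a toy simulation in Python. It is not the simulation on Iraq data mentioned above. Every parameter is a made-up assumption chosen only for illustration: 10,000 possible cluster sites, 5% of them hot spots with 8 violent deaths each, and no deaths anywhere else. The sketch repeatedly draws 33 sites at random and counts how often a survey finds no violent deaths at all and how often it overestimates the true average by a factor of 2 or more.

```python
import random


def simulate_surveys(n_sites=10_000, hot_share=0.05, deaths_per_hot_site=8,
                     n_clusters=33, n_surveys=2_000, seed=0):
    """Toy model: violence is concentrated in a few 'hot' sites (all values hypothetical)."""
    rng = random.Random(seed)
    n_hot = int(n_sites * hot_share)
    sites = [deaths_per_hot_site] * n_hot + [0] * (n_sites - n_hot)
    true_mean = sum(sites) / n_sites  # true deaths per site

    found_nothing = 0
    doubled_or_worse = 0
    for _ in range(n_surveys):
        sample = rng.sample(sites, n_clusters)  # one survey of 33 clusters
        estimate = sum(sample) / n_clusters
        if estimate == 0:
            found_nothing += 1
        if estimate >= 2 * true_mean:
            doubled_or_worse += 1
    return found_nothing / n_surveys, doubled_or_worse / n_surveys


share_zero, share_double = simulate_surveys()
print(f"surveys finding no violent deaths: {share_zero:.0%}")
print(f"surveys overestimating by 2x or more: {share_double:.0%}")
```

Under these invented settings a noticeable share of surveys lands on each kind of miss. The exact shares depend entirely on the assumed concentration of violence, so they should not be read as estimates for Iraq.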

The Roberts et al. survey seems to have landed on a super turbo charged version of this small sample issue.  They found a total of 21 violent deaths in 32 of their clusters, i.e., less than one death per cluster.  Yet they reported no fewer than 52 violent deaths in their 33rd cluster in Fallujah.

Such a sample yields estimates that are all over the place depending on your assumptions.  One standard calculation method (bootstrapping) leads to a central estimate of 210,000 violent deaths with a 95% confidence interval of around 40,000 to 600,000.  However, if you remove the Fallujah cluster the central estimate plummets to 60,000 with a 95% confidence interval of 40,000 to 80,000.  (I’ll give details on these calculations in a follow-up post.)
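A rough sketch of a cluster-level bootstrap shows where this kind of instability comes from. It is an illustration under loud assumptions, not the calculation behind the figures just quoted. The paper's per-cluster counts are not reproduced here, so the split of the 21 non-Fallujah deaths across 14 clusters is hypothetical (only the totals, 21 deaths in 32 clusters plus 52 in Fallujah, come from the text). The multiplier of 3,000 is the rough ratio of about 24 million Iraqis to about 8,000 people in the sample.

```python
import random

SCALE = 3_000  # assumption: ~24,000,000 people / ~8,000 people sampled

# Hypothetical split of the 21 non-Fallujah violent deaths over 32 clusters.
OTHER_CLUSTERS = [3, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1] + [0] * 18
FALLUJAH = 52  # violent deaths reported in the Fallujah cluster


def bootstrap_interval(clusters, n_boot=5_000, seed=0):
    """Resample whole clusters with replacement; return (low, central, high)."""
    rng = random.Random(seed)
    n = len(clusters)
    totals = sorted(
        SCALE * sum(rng.choices(clusters, k=n)) for _ in range(n_boot)
    )
    low = totals[int(0.025 * n_boot)]        # 2.5th percentile
    high = totals[int(0.975 * n_boot) - 1]   # 97.5th percentile
    return low, SCALE * sum(clusters), high


for label, data in [("with Fallujah", OTHER_CLUSTERS + [FALLUJAH]),
                    ("without Fallujah", OTHER_CLUSTERS)]:
    low, central, high = bootstrap_interval(data)
    print(f"{label}: central {central:,}, 95% interval {low:,} to {high:,}")
```

Whatever split is assumed for the other clusters, the upper end of the interval is driven by how many times the single Fallujah cluster happens to be drawn in a resample, which is why the interval balloons when that cluster is included.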

In short, there is no reliable way to create a stable estimate out of the Roberts et al. data. We would like to have an estimate that is robust to whether or not we include the extreme Fallujah outlier.  Alas, the usual methods are highly sensitive to whether the wild Fallujah observation is in or out.


Given this background I’m at a loss to explain how Sir Roy Anderson can describe the Roberts et al. methodology as “robust”.  In fact, he invokes the r-word in two successive sentences.  Yet extreme sensitivity to outliers is one of the main characteristics that earns estimates the label “non-robust”.

Sir Roy notices that the sample is small but goes nowhere from this starting point.  He seems unaware that war violence tends to cluster heavily at some locations.  Indeed, he did not even read the Roberts et al. paper carefully enough to discern that their sample displays this pattern in spades.

Sir Roy swings and misses in another, more subtle, way.  He points, rightly, to a key measurement problem with the Roberts et al. methodology – how do we know that households reporting deaths really did suffer these reported deaths?  He notes that Roberts et al. try to defuse this issue by checking death certificates.  However, they only check for death certificates in a small non-random sample of their reported deaths and 20% of these checks were failures.  So there is plenty of room to question the veracity of many of the reported deaths in the survey.

This is a good catch for Sir Roy but he doesn’t then ascend to the next level.  Suppose that out of the households that did not experience a violent death a mere 1% are recorded as having one anyway.  Since there must be more than 900 such households, this error rate would generate around 9 falsely reported deaths.  These false reports would then translate into about 27,000 estimated violent deaths.  Thus, a small rate of “false positives” can inflate the number of estimated deaths quite substantially, creating another non-robustness issue for the Roberts et al. methodology.

Someone might respond that we don’t have to worry about “false positives” because there will also be “false negatives”, i.e., households that experienced real deaths that somehow don’t get recorded.  However, this view is wrong because the situation is fundamentally asymmetric.  If roughly 50 households experienced violent deaths and 1% of these failed to report these deaths then we’d expect to miss only 0 or 1 real deaths this way.  So a 1% false negative rate will deflate an estimate by much less than a 1% false positive rate will inflate the same estimate.

(Alert readers will have noticed that I just described the base rate fallacy.  See the last slides of this presentation for more details.)
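The asymmetry is easy to check with a few lines of Python. This is only a sketch of the arithmetic in the two paragraphs above. It assumes one reported death per affected household and uses a multiplier of 3,000 estimated deaths per sampled death, which is the ratio implied by 9 false reports turning into about 27,000 estimated deaths.

```python
SCALE = 3_000                 # estimated deaths per death recorded in the sample
ERROR_RATE_PERCENT = 1        # same misreporting rate in both directions

households_without_death = 900   # "more than 900" households with no violent death
households_with_death = 50       # "roughly 50" households with a violent death

# Expected number of misreported households in each direction.
false_positives = households_without_death * ERROR_RATE_PERCENT / 100
false_negatives = households_with_death * ERROR_RATE_PERCENT / 100

inflation = false_positives * SCALE   # deaths added to the estimate
deflation = false_negatives * SCALE   # deaths removed from the estimate

print(f"false positives: {false_positives:g} -> estimate inflated by {inflation:,.0f}")
print(f"false negatives: {false_negatives:g} -> estimate deflated by {deflation:,.0f}")
```

The same 1% error rate is applied to a group that is about 18 times larger, so the upward push on the estimate is about 18 times the downward push.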

To summarize, Sir Roy wasted the small amount of effort he invested in his report.

Creon Butler at least had a serious go at evaluating the Roberts et al. paper, managing to notice some important new points that eluded Sir Roy.  I list the better ones here.  First on this positive side of the ledger is that Butler at least mentions the crucial Fallujah cluster.  Second, he correctly questions whether the sample is genuinely random.  Butler notes, in particular, that:

  1. The Fallujah field team did not follow the survey’s official randomization methodology when they  selected that cluster.
  2. Six of Iraq’s 18 governorates were excluded from the sample, although Butler thinks this was OK since they were randomly excluded.

Third, Butler draws attention to the preposterously wide confidence interval in the estimate for excess deaths – 8,000 to 194,000.  Fourth, Butler realizes, rightly, that the Roberts et al. figures for violent deaths suggest that hospitals should have received vastly more injured people than the figures of the Iraqi Ministry of Health (MoH) suggest they actually received.

Despite these strengths the Butler report is still weak.  As noted in post number 4 all three expert reports, including Butler’s, missed some central problems with the Roberts et al. paper.  Beyond that, Butler is strangely tolerant of the weaknesses he finds. Here are a few examples:

  1. He knows that the Fallujah field team violated the sampling protocols and then recorded a tremendous outlier observation that was then excluded from the main estimate published in the paper.  But it never seems to occur to him that such a serious data quality issue in one cluster could signal a deeper data quality problem affecting other clusters.
  2. He notices, but immediately shies away from, a weird aspect of the sampling scheme.  Twelve governorates are divided into six pairs with one governorate from each pair selected randomly for sampling and the other one excluded from the sample.  At a stretch we can view this as an acceptable way to claim national coverage for the survey while actually excluding 6 governorates from the sample.  But to do this legitimately you need to build this source of random variation into your confidence interval.  Roberts et al. don’t do this.  So even the gargantuan confidence interval of 8,000 to 194,000 is actually too narrow.
  3. Butler does a bad job of quantifying his point about injuries.  For example, he should have mentioned that the MoH recorded 15,517 injuries during the last 6 months covered by the survey.  Roberts et al. have something like 56 violent deaths during this period which translates into around 170,000 estimated violent deaths.  Assuming a rule of thumb of 3 injuries per death one could predict 500,000 injuries, a number which exceeds the MoH figure by more than a factor of 30.  Note, moreover, that people with serious injuries should almost always put in an appearance at a hospital.  So there is really something to explain here.  Yet Butler pretty much lets this discrepancy pass.
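The injuries comparison in the last point can be reproduced with a short Python sketch. The multiplier of 3,000 estimated deaths per sampled death is an assumption here (roughly 24 million Iraqis divided by roughly 8,000 people in the sample), and the variable names are illustrative only.

```python
MOH_INJURIES = 15_517         # injuries recorded by the MoH in the last 6 months of the survey
SAMPLE_VIOLENT_DEATHS = 56    # "something like 56" violent deaths in the sample for that period
SCALE = 3_000                 # assumption: ~24,000,000 people / ~8,000 people sampled
INJURIES_PER_DEATH = 3        # rule of thumb used in the text

estimated_deaths = SAMPLE_VIOLENT_DEATHS * SCALE
predicted_injuries = estimated_deaths * INJURIES_PER_DEATH
ratio = predicted_injuries / MOH_INJURIES

print(f"estimated violent deaths: {estimated_deaths:,}")
print(f"predicted injuries: {predicted_injuries:,}")
print(f"predicted injuries / MoH injuries: {ratio:.1f}")
```

The unrounded figures come out slightly different from the rounded ones in the text (168,000 rather than 170,000 deaths, 504,000 rather than 500,000 injuries), which does not change the size of the gap in any meaningful way.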

To summarize, in an era of grade inflation Creon Butler gets a gentleman’s pass.

Bill Kirkup of the Department of Health wrote the most perceptive UK government analysis although his paper is marred by one big error.  Here are some of his strong points:

  1.  He spots the absurdity of the confidence interval and grasps the magnitude of the problem – “A confidence interval this large makes the meaning of the estimate difficult to interpret. This point has been largely ignored in media reporting.”
  2. He is aware of what he calls the “patchy distribution of violence” in war and he realizes that this feature makes the survey’s 33 sampling points precious few.  He connects the reported results for Fallujah with this patchiness issue.  (You might say this is obvious but the other two experts missed it.)
  3. He identifies an annoying tendency for Roberts et al. to make detailed claims about types of deaths, e.g., the percent of all deaths accounted for by coalition air strikes, without providing numerical tables sufficiently detailed to flesh out these claims.  It appears that some such claims rely on data from the dubious Fallujah outlier, which is something we would like to know whenever this is the case.  But it is often hard to be sure of such dependence on Fallujah without more information.  This is information that the authors could easily have supplied but chose not to.  Such “reticence”, as Kirkup puts it, does not inspire confidence.
  4. He realizes that the arrival of a survey team into a neighbourhood will draw the attention of local dangerous individuals.  These violent thugs will pressure people to answer the survey questions in ways that further their agendas.  Such local dynamics decrease the reliability of the data and place the survey’s interviewees at risk.  (I blogged recently on this issue in a similar context.)

Kirkup’s big error is that, somehow, he estimates that only 23,000 of the 98,000 estimated excess deaths (outside Fallujah) were violent deaths.  But a very easy back-of-the-envelope calculation shows that such a low number can’t possibly be right.  (21 violent deaths in the sample, around 8,000 people in the sample in a country of around 24 million – 24,000,000 x 21/8,000 gives around 60,000 violent deaths).  This mistake messes up Kirkup’s report substantially.

Nevertheless, I still think that Kirkup delivered the best report because he alone grasps the fundamental low quality of the Roberts et al. paper.

Where does this leave us?

First, I’d like to soften my criticism of the government evaluators a little bit.  In this post and in the last one I’ve tried to impose a ground rule of evaluating the evaluators based only on information that was available to them when they did their reports.  (I’ll drop this straitjacket in the next post in this series.)  But it is hard to maintain this discipline and I’m sure that I’ve allowed myself to benefit in certain ways from some hindsight knowledge.  In addition, I’m sure these guys were under pressure to produce lots of stuff really fast so they couldn’t make every project into their best work.

That said, casualties of war are a vital issue.  So the UK government should have allowed its analysts the space, and perhaps the outside consultants, they needed to give their work on civilian casualties its due.  (Of course, this applies even more strongly to the US government which has avoided a Chilcot-type enquiry in the first place.)

Finally, I’d like to give a sense of what I think a good report would have looked like.  Here’s a provisional list of key points:

  1.  We already know that thousands of people are dying because of the Iraq war.
  2.  We should track these deaths closely and, more importantly, use the tracking data to figure out ways to save lives.  (I can’t find anything in the Chilcot Report to suggest that anyone in the government was thinking about this.)
  3. The Roberts et al. paper doesn’t change this picture qualitatively but it does suggest that people could be dying at far greater rates in the war than anyone has previously suggested.
  4. However, the Roberts et al. methodology is extremely weak and unreliable (see the technical appendix to this report) so we shouldn’t count on it except possibly on points that can be corroborated from other sources.
  5. Nevertheless, we should request the detailed data from this project and also from  Iraq Body Count and see whether we can learn something helpful from them.
  6. We should issue a public statement saying that we are not convinced by the Roberts et al. study at this moment but we have requested the data and are looking into it.  Meanwhile, we are very concerned about civilian casualties in Iraq and are working hard to reduce them.
  7. Point 6 should be reality, not just a public relations position.

7 thoughts on “Chilcot on Civilian Casualties: Part 5”

  1. Hi. I’d like to say thanks for a good analysis – I agree with nearly everything you say – and for your generally positive comments about my paper for the Department of Health. I’d like to defend my projected number of non-Fallujah violent deaths though.

    The back of the envelope check you propose doesn’t work, because of the method of calculation of excess deaths by Roberts et al. First, the people sampled reported deaths over more than a year, hence the need to calculate person-months in the paper, and second, the paper used a regression model to estimate deaths, not simple extrapolation. As an illustration, using your back of the envelope calculation would yield 426,000 total deaths and 288,000 excess deaths, which is well above those quoted by Roberts.

    Unfortunately I no longer have access to the detailed workings as I left the DH at the end of 2009, but an alternative rough calculation would be to take the proportion of excess non-Fallujah deaths attributed to violence (0.21) and apply it to the projected total excess deaths in the paper, 98,000, which would yield an estimated 20,400 violent deaths. That’s not quite the 23,000 I got from more detailed modelling, but it’s not that far off, given that the whole thing is constructed on the Roberts et al tissue-paper edifice anyway. In short, I don’t believe it’s an error, let alone a big one.

    It may be relevant to say that I was based in Baghdad in 2003 working on reconstruction and public health, and with first hand knowledge of conditions on the ground I agree with your assessment that the survey work could not have been undertaken in the way that the paper claims. That underpins the reference to bullying and intimidation, but I was keen to keep the assessment based mostly on the quantifiable aspects.


  2. Hello. It’s great that you’ve responded!

    I’m still puzzled by where your numbers come from and would love to get to the bottom of this. Let me try to get this ball rolling a little further by applying my back-of-the-envelope method to find my own way to an excess death estimate.

    To clarify a bit, I only calculate violent deaths above. Below I will calculate excess deaths using, essentially, the same method but with one modification to account for the fact that the pre-war period covered by the Roberts et al. survey is shorter than the during-war period.

    Your government report stressed, rightly, how Roberts et al. often fail to provide basic information that would be easy to supply and that would also be helpful. The current situation is a case in point. We wish to make an excess death calculation for the whole sample minus the Fallujah cluster. This requires us to subtract pre-war Fallujah deaths from total pre-war deaths and during-war Fallujah deaths from total during war deaths. However, so far as I can tell the paper does not supply pre-war Fallujah deaths. It does supply during-war violent deaths in Fallujah but, so far as I can tell, it does not supply during-war non-violent deaths in Fallujah.

    I will cope with this lack of information by assuming that the two missing numbers are equal to 0. I think we can be fairly confident that these assumptions are not really far from the truth. (Actually, some more information emerged later and I’ll use this in future blogging but for now I will ignore it.)

    Total pre-war deaths – 46

    Total pre-war deaths outside Fallujah – 46 (assuming 0 pre-war deaths in Fallujah)

    Total during-war deaths – 142

    Total during-war deaths outside Fallujah – 90 (assuming there were 0 during-war non-violent deaths in Fallujah)

    Length of pre-war period covered by the survey – 14.6 months

    Length of during-war period covered by the survey – 17.8 months

    I handle the differing period lengths by multiplying the number of during-war deaths by 14.6/17.8.

    Adjusted during-war deaths outside Fallujah – 74 (rounded to the nearest death)

    In-sample excess deaths 74 – 46 = 28

    Estimated excess deaths – 28 * 3,000 = 84,000

    So I get something close to the 98,000 that Roberts et al. get, not something close to 300,000 as you suggest my method would yield.

    My total number of deaths outside Fallujah would be 90 * 3,000 = 270,000.

    We can also calculate the percentage of excess deaths that were violent.

    During-war violent deaths outside Fallujah – 21

    Adjusted during-war violent deaths outside Fallujah – 21 * (14.6/17.8) = 17

    Percentage of excess deaths that were violent – 100 * 17/28 = 61%

    These calculations are certainly crude. It would be easy to make improvements. But I don’t think any improvements would make a huge difference.

    By the way, the most likely explanation for why my excess-death estimate is below the official one in Roberts et al. is that there probably were a few pre-war deaths in Fallujah but I’ve assumed there were none.
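The arithmetic above can be rerun with a minimal Python sketch. The variable names are illustrative, the two zero-death assumptions for Fallujah are the ones stated above, and the multiplier of 3,000 is the rough ratio of about 24 million people in Iraq to about 8,000 people in the sample.

```python
SCALE = 3_000                   # ~24,000,000 people / ~8,000 people sampled

pre_war_deaths = 46             # assumes 0 pre-war deaths in Fallujah
during_war_deaths = 142 - 52    # total minus Fallujah violent deaths = 90
during_war_violent = 21         # violent deaths outside Fallujah
pre_war_months = 14.6
during_war_months = 17.8

# Put the during-war counts on the same footing as the shorter pre-war period.
period_ratio = pre_war_months / during_war_months
adjusted_deaths = round(during_war_deaths * period_ratio)
adjusted_violent = round(during_war_violent * period_ratio)

excess_in_sample = adjusted_deaths - pre_war_deaths
estimated_excess = excess_in_sample * SCALE
violent_share_pct = round(100 * adjusted_violent / excess_in_sample)

print(f"adjusted during-war deaths: {adjusted_deaths}")
print(f"in-sample excess deaths: {excess_in_sample}")
print(f"estimated excess deaths: {estimated_excess:,}")
print(f"share of excess deaths that were violent: {violent_share_pct}%")
```

Changing the two Fallujah assumptions from 0 to a small positive number is a one-line edit, which makes it easy to see how sensitive the result is to them.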

    I quite value your on-the-ground observations. Please keep an eye on the blog over the next few weeks and feel free to jump in. I’ll present evidence suggesting that the field teams cut corners to the point where the sample can’t really be viewed as random in any reasonable sense.


    1. Thanks for the response. When I read it, I went back to try and reconstruct from scratch what I’d done in 2004, with mixed results. Most importantly, I have to say that I think that your approach to the distribution of violent/non-violent excess deaths is better than mine, so thank you for pointing this out. The modelling I used was based on crude mortality rates applied to the number of person-months of exposure, which are given in the paper for the before and after groups, and then back-filling to estimate the missing numbers such as the person-months of exposure inside and outside Fallujah. This should be a more accurate reflection of the Roberts projections and, combined with your correction gives an estimate of 59,000 excess violent deaths amongst the 98,000 estimated overall. I’m sorry that I missed this at the time, but grateful to you for pointing it out.

      What this does though, using the same approach and numbers, is produce some spectacularly unbelievable results for Fallujah itself. Thus the crude mortality rate is supposed to have been 196 per thousand per year, and that due to violence 192 per thousand per year. I’m not sure that even the Black Death achieved that sort of mortality rate. Taking the excess mortality rates again, that would make 163,000 excess deaths due to violence over the 17.8 month period in Fallujah. Using the (back-filled) Roberts estimated population of 573,000 (it’s more than Fallujah itself as it includes part of the rest of the Anbar governorate), that would be an additional 28% of the population dying as a result of violent deaths in less than 18 months. I think that alone is enough to render the whole study worthless.

      I’ll be interested to see your assessment of the conditions affecting the field teams, having agreed with what you’ve said so far on this.

