In October 2004 The Lancet published a paper by Roberts et al. that estimated the number of excess deaths for the first year and a half of the Iraq war, using data from a new survey they had just conducted. (Readers wanting a refresher course on the concept of excess deaths can go here.)
One of the best parts of the civilian casualties chapter of the Chilcot report is the front-row seat it provides for the (rather panicked) discussion that Roberts et al. provoked within the UK government. Here the real gold takes the form of links to three separate reviews of the paper provided by government experts: Sir Roy Anderson (first report), Creon Butler (second report) and Bill Kirkup, CBE (third report).
In the next several posts I will evaluate the evaluators. I start mainly with information that was available when they made their reports, but I will increasingly take advantage of hindsight.
For orientation I quote the “Interpretation” part of the Summary of Roberts et al.:
Making conservative assumptions, we think that about 100,000 excess deaths, or more have happened since the 2003 invasion of Iraq. Violence accounted for most of the excess deaths and airstrikes from coalition forces accounted for most violent deaths. We have shown that collection of public-health information is possible even during periods of extreme violence. Our results need further verification and should lead to changes to reduce non-combatant deaths from air strikes.
The UK government reaction focused exclusively, so far as I can tell, on the question of how to respond to the PR disaster ensuing from:
- The headline figure of 100,000 deaths, which was much bigger than any that had been seriously put forward before.
- The claim that the Coalition was directly responsible for most of the violence. (Of course, one could argue that the Coalition was ultimately responsible for all violence since it initiated the war in the first place but nobody in the government took such a position.)
Today I finish with two important points that none of the three experts noticed.
First, the field work for the survey could not have been conducted as claimed in the paper. The authors write that two teams conducted all the interviews between September 8 and September 20, i.e., in just 13 days. There were 33 clusters, each containing 30 households, so 990 interviews in all. This means that each team had to average nearly 40 interviews per day, often spread across more than a single sampling point (cluster). These interviews had to be done on top of travelling all over the country, on poor roads and through security checkpoints, to reach the 33 clusters in the first place.
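The workload arithmetic is easy to verify. A quick sketch, using only the figures reported in the paper (33 clusters of 30 households, two teams, 13 days):

```python
# Interview workload implied by the Roberts et al. field schedule,
# using the figures reported in the paper.
clusters = 33
households_per_cluster = 30
teams = 2
days = 13  # September 8 to September 20

total_interviews = clusters * households_per_cluster  # 990 interviews
per_team_per_day = total_interviews / (teams * days)

print(total_interviews)            # 990
print(round(per_team_per_day, 1))  # 38.1 interviews per team per day
```

That figure of roughly 38 interviews per team per day is before accounting for any travel time between clusters.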
To get a feel for the logistical challenge that faced the field teams consider this picture of the sample from a later, and much larger, survey – the Iraq Living Conditions Survey:
I know the resolution isn’t spectacular on the picture but I still hope that you can make out the blue dots. There are around 2,200 of them, one for each cluster of interviews in this survey.
Now imagine choosing 33 of these dots at random and trying to reach all of them with two teams in 13 days. Further imagine conducting 30 highly sensitive interviews (about deaths of family members) each time you make it to one of the blue dots. If a grieving parent asks you to stay for tea, do you tell them to just answer your questions because you need to move on instantly?
The best-case scenario is that the field teams cut corners with the cluster selection to render the logistics possible and then raced through the interviews at break-neck speed (no more than 10 minutes per interview). In other words, the hope is that the teams succeeded in taking bad measurements of a non-random sample (which the authors then treat as random). But, as Andrew Gelman reminds us, accurate measurement is hugely important.
The worst-case scenario is that the field teams simplified their logistical challenges by making up their data. Recall that data fabrication is widespread in surveys done in poor countries. Note also that the results of the study were meant to be released before the November 2 election in the US, and the field work was completed only on September 20; so slowing down the field work to improve quality was not an option.
Second, no expert picked up on the enormous gap between the information on death certificates reported in the Roberts et al. paper and the mortality information the Iraqi Ministry of Health (MoH) was releasing at the time. A crude back-of-the-envelope calculation reveals the immense size of this inconsistency:
1. The population of Iraq was, very roughly, 24 million and the number of people in the sample is reported as 7,868. So each in-sample death translates into about 3,000 estimated deaths (24,000,000/7,868 ≈ 3,050). Thus, the 73 in-sample violent deaths become an estimate of well over 200,000 violent deaths.
2. Iraq’s MoH reported 3,858 violent deaths between April 5, 2004 and October 5, 2004, in other words a bit fewer than 4,000 deaths backed by MoH death certificates. The MoH has no statistics prior to April 5, 2004 because its systems were in disarray before then (p. 191 of the Chilcot chapter).
3. Points 1 and 2 together imply that death certificates should have been available for only about 2% of violent deaths (4,000/200,000).
4. Yet Roberts et al. report that their field teams tried to confirm 78 of their recorded deaths by asking respondents to produce death certificates and that 63 of these attempts (81%) were successful.
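The back-of-the-envelope calculation above can be laid out in a few lines, using only the figures already quoted (the paper's sample size and violent-death count, and the MoH total):

```python
# Back-of-the-envelope check of the death-certificate gap,
# using the figures quoted in the text above.
population = 24_000_000              # rough population of Iraq
sample_size = 7_868                  # people in the survey sample
violent_deaths_in_sample = 73
moh_certified_violent = 3_858        # MoH figure, April 5 to October 5, 2004

scale = population / sample_size     # each in-sample death ~ this many estimated deaths
estimated_violent_deaths = violent_deaths_in_sample * scale

# If only ~4,000 violent deaths carry MoH certificates, the rate at which
# respondents could produce one should be tiny, not the 81% reported.
expected_certificate_rate = moh_certified_violent / estimated_violent_deaths

print(round(scale))                               # 3050
print(round(expected_certificate_rate * 100, 1))  # 1.7 (percent)
```

Even granting generous rounding, the expected confirmation rate comes out around 2%, which is the figure used in point 3 above.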
The paper makes clear that the selection of the 78 cases wasn’t random and it could be that death certificate coverage is better for non-violent deaths than it is for violent deaths.
There is a yawning, humongous gap between 2% and 81%, and something has to give.
Here are the only possible resolutions I can think of:
1. The MoH issued vastly more (i.e., 50 times more) death certificates for violent deaths than it has admitted to issuing. This seems far-fetched in the extreme.
2. The field teams for Roberts et al. fabricated their death certificate confirmation figures. This seems likely, especially since the paper reports:
Interviewers were initially reluctant to ask to see death certificates because this might have implied they did not believe the respondents, perhaps triggering violence. Thus, a compromise was reached for which interviewers would attempt to confirm at least two deaths per cluster.
Compromises that pressure interviewers to risk their lives are not promising and can easily lead to data fabrication.
3. The survey picked up too many violent deaths. I think this is true, and we will return to this possibility in a follow-up post, but I don’t think it can be the main explanation for the death certificate gap.
OK, that’s enough for today.
In the next post I’ll turn to what the expert reports actually said, rather than what they didn’t say.