Secret Data Sunday – The Iraq Child and Maternal Mortality Survey

Many readers of the blog know that there was a major cock-up over child mortality figures for Iraq.  In fact, exaggerated child mortality figures have been used to justify the 2003 invasion of Iraq, both prospectively and retrospectively.

Here I won’t repeat the basics one more time, although anyone unfamiliar with this debacle should click on the above link which, in turn, offers further links providing more details.

Today I just inject one new point into this discussion – the dataset for the UNICEF survey that wildly overestimated Iraq’s child mortality rates is not available.  (To be clear, estimates from this dataset are available but the underlying data you need to audit the survey are hidden.)

The hidden survey is called the Iraq Child and Maternal Mortality Survey  (ICMMS).  This graph (which you can enlarge on your screen) reveals the ICMMS as way out of line with no fewer than four subsequent surveys, all debunking the stratospheric ICMMS child mortality estimates.  The datasets for three of the four contradicting surveys are publicly available and open to scrutiny (I will return to the fourth of the contradicting surveys in a future blog post.)

But the ICMMS dataset is nowhere to be found – and I’ve looked for it.

For starters, I emailed UNICEF but couldn’t find anyone there who had it or was willing to share it.

I also requested the dataset multiple times from Mohamed Ali, the consulting statistician on the survey, who is now at the World Health Organization (WHO).

At one point Mohamed directed me to the acting head of the WHO office in Iraq, who blew me off before I had a chance to request the data from him.  But, then, you have to wonder what the current head of the WHO office in Iraq has to do with a 1990s UNICEF survey, anyway.

I persisted with Mohamed who then told me that if he still has the data it would be somewhere on some floppy disk.  This nostalgic reminder of an old technology is kind of cute but doesn’t let him off the hook for the dataset which I never received on a floppy disk or otherwise.

There is a rather interesting further wrinkle on this saga of futility.  The ICMMS dataset was heavily criticized in research commissioned for the UN’s oil for food report:

It is clear, however, that widely quoted claims made in 1995 of 500,000 deaths of children under 5 as a result of sanctions were far too high;

John Blacker, Mohamed Ali and Gareth Jones then responded to this criticism with a 2007 academic article defending the ICMMS dataset:

A response to criticism of our estimates of under-5 mortality in Iraq, 1980-98.

Abstract

According to estimates published in this journal, the number of deaths of children under 5 in Iraq in the period 1991-98 resulting from the Gulf War of 1991 and the subsequent imposition of sanctions by the United Nations was between 400,000 and 500,000. These estimates have since been held to be implausibly high by a working group set up by an Independent Inquiry Committee appointed by the United Nations Secretary-General. We believe the working group’s own estimates are seriously flawed and cannot be regarded as a credible challenge to our own. To obtain their estimates, they reject as unreliable the evidence of the 1999 Iraq Child and Maternal Mortality Survey–despite clear evidence of its internal coherence and supporting evidence from another, independent survey. They prefer to rely on the 1987 and 1997 censuses and on data obtained in a format that had elsewhere been rejected as unreliable 30 years earlier.

 

For the record, the Blacker, Ali and Jones article is weak and unconvincing and I may make it the subject of a future blog post.  But today I just concentrate on the (non)availability of the ICMMS dataset so I won’t wander off into a critique of their article.

Thinking purely in terms of data availability, the 2007 article raises some interesting questions.  Was Mohamed Ali still working off of floppy disks in 2007 when he published this article?  Surely he must have copied the dataset onto a hard disk to do the analysis.  And what about his co-authors?  They must have the dataset too, no?

Unfortunately, John Blacker has passed away but Gareth Jones is still around so I emailed him asking for the ICMMS dataset which he had defended so gamely.

He replied that he never had the dataset.

Let that point sink in for a moment.   Jones co-authored an article in an academic journal, the only point of which was to defend the quality of a dataset.  Yet, he never saw the dataset that he defended?  Sorry but this doesn’t work for me.  As far as I’m concerned when you write an article that is solely about the quality of a dataset then you need to at least take a little peek at the dataset itself.

I see two possibilities here and can’t decide which is worse.  Either these guys are pretending that they don’t have a dataset that they actually do have because they don’t want to make it public or they have been defending the integrity of a dataset they don’t even have.  Either way, they should stop the charade and declare that the ICMMS was just a big fat mistake.

I have known for a long time that the ICMMS was crap but the myth it generated lives on.  It is time for the principal defenders of this sorry survey to officially flush it down the toilet.


New Paper on Accounting for Civilian War Casualties

Hello everybody.

The radio silence was much longer than intended but blog posts should start coming fast and furious now.  I’ve got a lot I want to get off my chest as soon as possible.

Let’s get the ball rolling with a new paper I have with Nicholas Jewell and Britta Jewell.  (Well, to be honest, it isn’t really a brand new paper but it’s newly accepted at a journal and we’re now putting it into the public domain.)

I dare say that this paper is a very readable introduction to civilian casualty recording and estimation, that is, to most of the subject matter of the blog.  I hope you will all have a look.

And, please, send in your comments.

More soon…..

PS – Here is an alternative link to the paper in case the first one doesn’t work for you.

 

Data Dump Friday – Part 2, some Iraq Polling Data

Happy Friday.

I have just posted here three new things.

  1. A list of all the data sets that Steve Koczela obtained from the State Department through his successful FOIA application.
  2. An Iraq poll from April 2006 fielded by the Iraq Center for Research and Strategic Studies (ICRSS).  [Note – this organization seems to be defunct.  Perhaps someone out there knows something about this?]
  3. An Iraq poll also from April 2006 and asking the same questions as the ICRSS poll but fielded by the notorious combination of D3 Systems and KA Research Limited (KARL).

We already saw a head to head comparison of these two polls that left no doubt that much of the D3/KA data were fabricated (see also this post).

More next week!

Special Journal Issue on Fabrication in Survey Research

The Statistical Journal of the IAOS has just released a new issue with a bunch of articles on fabrication in survey research, a subject of great interest for the blog.

Unfortunately, most of the articles are behind a paywall but, thankfully, the overview by Steve Koczela and Fritz Scheuren is open access.  It’s a beautiful piece – short, sweet, wise and accurate.  Please read it.

Here are my comments.

Way back in 1945 the legendary Leo Crespi stressed the importance of what he called “the cheater problem.”  Although he did this in the flagship survey research journal, Public Opinion Quarterly, the topic has never become mainstream in the profession.  Many survey researchers seem to view the topic of fabrication as not really appropriate for polite company, akin to discussing the sexual history of a bride at her wedding.  Of course, this semi-taboo is convenient for cheaters.  Maria Konnikova has a great new book about confidence artists.  Much in the book is relevant to the subject of fabrication in survey research but one point really stands out for me; a key reason why the same cons and the same con artists move seamlessly from mark to mark is that each victim is too embarrassed to publicize his/her victimization.

Discussions of fabrication that have occurred over the years have almost always focused on what is known as curbstoning, i.e., a single interviewer making up data. (The term comes from an image of a guy sitting on a street curb filling out his forms.)  But this is just one type of cheating, and one of the great contributions of Koczela and Scheuren’s journal issue, and of the impressive series of prior conferences, is that they have substantially expanded the scope of the survey fabrication field.  Now we discuss fabrication by supervisors, principal investigators and the leaders of survey companies.  We now know that hundreds of public opinion surveys, especially surveys conducted in poor countries, are contaminated by widespread duplication and near duplication of single observations.  (This journal issue publishes the key paper on duplication.)
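As an aside for readers who want to see what a basic duplication check looks like, here is a minimal sketch of the idea.  This is my own illustrative code, not the published method, and the answer vectors and the 0.85 cutoff are invented for the example: compare every pair of respondents and flag pairs whose answers match almost completely.

```python
import itertools

def flag_near_duplicates(responses, threshold=0.85):
    """Flag respondent pairs whose answer vectors are suspiciously similar.

    `responses` is a list of equal-length answer vectors, one per respondent.
    The 0.85 cutoff is purely illustrative, not a published standard.
    """
    flagged = []
    for (i, a), (j, b) in itertools.combinations(enumerate(responses), 2):
        share = sum(x == y for x, y in zip(a, b)) / len(a)  # share of identical answers
        if share >= threshold:
            flagged.append((i, j, share))
    return flagged

# Toy data: respondent 2 is an exact copy of respondent 0.
answers = [
    [1, 3, 2, 2, 5, 1],
    [2, 3, 1, 4, 5, 2],
    [1, 3, 2, 2, 5, 1],
]
print(flag_near_duplicates(answers))  # [(0, 2, 1.0)]
```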

Let me quote a bit from the to-do list of Koczela and Scheuren.

It does not only happen to small research organizations with fewer resources, as was previously believed [12].  Recent instances involve the biggest and most [respected] names in the survey research business, academia and the US Government.

This is certainly true, but I would add that reticence about naming names is crippling.  Yes, it’s helpful to know that there are many dubious surveys out there, but pointing to which ones they are would help far more.

An acknowledgement by the research community that data fabrication is a common threat, particularly in remote and dangerous survey environments would allow the community to be cooperative and proactive in preventing, identifying and mitigating the effects of fabrication.

This comment about remote and dangerous survey environments fits perfectly with my critiques of Iraq surveys including this one.

Given the perceived stakes, these discussion[s] often result in legal threats or even legal action of various types.

Ummm….yes.

…the problem of fabrication is fundamentally one of co-evolution.  The more detection and prevention methods evolve, the more fabricators may evolve to stay ahead.  And to the extent we discover and confirm fabrication, we will never know whether we found it all, or caught only the weakest of the pack.  With these truths in mind, more work is needed in developing and testing statistical methods of fabrication detection.  This is made more difficult by the lack of training datasets, a problem prolonged by a general unwillingness to openly discuss data fabrication.

Again, I couldn’t agree more.

Technical countermeasures during fielding are less useful in harder to survey areas, which also happen to be the areas where the incentive to fabricate data is the highest. Many of the recent advances in field quality control processes focus on areas where technical measures such as computer audio recording, GPS, and other mechanisms can be used [6,13].

In remote and dangerous areas, where temptation to fabricate is the highest, technical countermeasures are often sparse [9]. And perversely, these are often the most closely watched international polls, since they often represent the hotspots of American interest and activity. Robbins and Kuriakose show a heavy skew in the presence of duplicate cases in non-OECD countries, potentially a troubling indicator. These polls conducted in remote areas often have direct bearing on policy for the US and other countries. To get a sense of the impact of the polls, a brief review of the recently released Iraq Inquiry, the so-called Chilcot report, contains dozens of documents that refer, in most cases uncritically, to the impact and importance of polls.

To be honest, Koczela and Scheuren do such a great job with their short essay that I’m struggling to add value here.  What they write above is hugely pertinent to all the work I’ve done on surveys in Iraq.

By the way, a response I sometimes get to my critiques of the notorious Burnham et al. survey of deaths in the Iraq war (see, for example, here, here and here) is that it is unreasonable to expect perfection for a survey operating in such a difficult environment.  Fair enough.  But then you have to concede that we cannot expect high-quality results from such a survey either.  If I were to walk in off the street and take Harvard’s PhD qualification exam in physics (I’m assuming they have such a thing….) it would be unreasonable to expect me to do well.  I just haven’t prepared for such an exam.  Fine, but that doesn’t somehow make me an authority on physics.  It just gives me a perfect excuse for not being such an authority.

Finally, Koczela and Scheuren provide a mass of resources that researchers can use to bring themselves to the frontier of the survey fabrication field.  Anyone interested in this subject needs to take a look at these resources.

Mismeasuring Deaths in Iraq: Addendum on Confidence Interval Calculations


In my last post I used a combination of bootstrapping and educated guesswork to find  confidence intervals for violent deaths in Iraq based on the data from the Roberts et al. survey.  (The need for guesswork arose because the authors have not been forthcoming with their data.)

Right after this went up a reader contacted me and asked whether the bottom of one of these confidence intervals can go below 0.

The short answer is “no” with the bootstrap method.  This technique can only take us down to 0 and no further.

Explanation

With bootstrapping we randomly select from a list of 33 clusters.  Of course, none of these clusters experienced a negative number of violent deaths. So 0 is the smallest possible count we can get for violent deaths in any simulation sample.  (In truth, the possibility of pulling 33 0’s is more theoretical than real.  This didn’t happen in any of my 1,000 draws of 33.)
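To put a rough number on “more theoretical than real”: assuming, as in the walk-through below, that 18 of the 33 clusters recorded zero violent deaths, an all-zero resample requires every one of the 33 draws to land on a zero-death cluster, which happens with probability (18/33)^33 ≈ 2 × 10^-9, or about two chances in a billion per resample.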

Nevertheless, it turns out that if we employ the most common methods for calculating confidence intervals (not bootstrapping) then the bottom of the interval does dip below 0 when the dubious Fallujah cluster is included.

Here’s a step-by-step walk-through of the traditional method applied to the Roberts et al. data.  (I will assume that violent deaths are allocated across the 33 clusters as 18 0’s, 7 1’s, 7 2’s and 1 52.  A short code sketch after the list reproduces the arithmetic.)

  1. Compute the mean number of violent deaths per cluster.  This is 2.2.  An indication that something is screwy here is the fact that the mean is bigger than the number of violent deaths in 32 out of the 33 clusters.  At the same time the mean is way below the number of violent deaths in the Fallujah cluster (52).  Note that without the Fallujah cluster the mean becomes 0.7, i.e., eliminating Fallujah cuts the mean by more than a factor of 3.
  2. Compute the sample standard deviation which is a measure of how strongly the number of violent deaths varies by cluster.  This is 9.0.  Note that if we eliminate the Fallujah cluster then the sample standard deviation plummets by more than a factor of 10, all the way down to 0.8.  This is just a quantitative expression of the obvious fact that the data are highly variable with Fallujah in there.  Note further that the big outlier observation affects the standard deviation more than it affects the mean.
  3. Adjust for sample size.  We do this by dividing the sample standard deviation by the square root of the sample size.  This gives us 1.6.  Here the idea is that you can tame the variation in the data by taking a large sample.  The larger the sample size the more you tame the data.  However, as we shall see, the Fallujah cluster makes it impossible to really tame the data with a sample of only 33 clusters.
  4. Unfortunately, the last step is mysterious unless you’ve put a fair amount of effort into studying statistics.  (This, alone, is a great reason to prefer bootstrapping which is very intuitive.)  Our 95% confidence interval for the mean number of violent deaths per cluster is, approximately, the average plus or minus 2 times 1.6, i.e., -1.0 to 5.4.  There’s the negative lower bound!
  5. We can translate from violent deaths per cluster to estimated violent deaths by multiplying by 33 and again by 3,000.  We end up with -100,000 to 530,000.  (I’ve been rounding at each step.  If, instead I don’t round until the very end I get -90,000 to 530,000….this doesn’t really matter.)  Note that without Fallujah we get a confidence interval of 30,000 to 90,000 which is about what we got with bootstrapping.
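For anyone who wants to check these numbers, here is a minimal sketch of the textbook calculation under the cluster allocation assumed above (the multiplier of 2 is the usual rough stand-in for the 1.96 of a normal-theory 95% interval):

```python
import statistics

# Assumed allocation of violent deaths across the 33 clusters:
# 18 zeros, 7 ones, 7 twos and the Fallujah cluster with 52.
clusters = [0] * 18 + [1] * 7 + [2] * 7 + [52]

mean = statistics.mean(clusters)           # ~2.2 violent deaths per cluster (step 1)
sd = statistics.stdev(clusters)            # ~9.0, the sample standard deviation (step 2)
se = sd / len(clusters) ** 0.5             # ~1.6 after adjusting for sample size (step 3)

low, high = mean - 2 * se, mean + 2 * se   # roughly -1.0 to 5.4 per cluster (step 4)
scale = len(clusters) * 3000               # 33 clusters, ~3,000 estimated deaths per sampled death
print(int(round(low * scale, -4)), int(round(high * scale, -4)))
# -90000 530000, i.e. the end-of-calculation rounding mentioned in step 5
```

Dropping the 52 from the `clusters` list and re-running the same lines gives the no-Fallujah interval of roughly 30,000 to 90,000 quoted in step 5.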

Have we learned anything here other than that I respond to reader questions?

I don’t think we’ve learned much, if anything, about violent deaths in Iraq.  We already knew that the Roberts et al. data, especially the Fallujah observation, are questionable and maybe the above calculation reinforces this view a little bit.

But, mostly, we learn something about the standard method for calculating confidence intervals; when the data are wild this method can give incredible answers.  Of course, a negative number of violent deaths is not credible.

There is an intuitive reason why the standard method fails with the Roberts et al. data; it forces a symmetric estimate onto highly asymmetric data.  Remember we get 2.2 plus or minus 3.2 average violent deaths per cluster.  The plus or minus means that the confidence interval is symmetric.  The Fallujah observation forces a wide confidence interval which has to go just as wide on the down side as it is on the up side.  In some sense the method is saying that if it’s possible to find a cluster with 52 violent deaths then it also must be possible to find a cluster with around -52 violent deaths.  But, of course, no area of Iraq  experienced -52 violent deaths.  So you wind up with garbage.

Part of the story is also the small sample size. With twice as many clusters, but the same sort of data, the lower limit would only go down to about 0.

It’s tempting to just say “garbage in, garbage out” and, up to a point, this is accurate.   But the bigger problem is that the usual method for calculating confidence intervals is not appropriate in this case.

Mismeasuring War Deaths in Iraq: Confidence Interval Calculations

We return again to the Roberts et al. paper.

In part 5 of my postings on the Chilcot Report I promised to discuss the calculations of confidence intervals underlying these claims:

One standard calculation method (bootstrapping) leads to a central estimate of 210,000 violent deaths with a 95% confidence interval of around 40,000 to 600,000.  However, if you remove the Fallujah cluster the central estimate plummets to 60,000 with a 95% confidence interval of 40,000 to 80,000.  (I’ll give details on these calculations in a follow-up post.)

I have to start with some caveats.

Caveat 1:  No household data – failure to account for this makes confidence intervals too narrow

As we know the authors of the paper have not released a proper dataset.  To do this right I would need to have violent deaths by household but the authors are holding this information back.  Thus, I have to operate at the cluster level.  This shortcut suppresses household-level variation which, in turn, constricts the widths of the confidence intervals I calculate.  It’s possible to get a handle on the sizes of these  effects but I won’t go there in this blog post.

Caveat 2: Violent deaths are not broken down by cluster – confidence intervals depend on how I resolve this ambiguity

Roberts et al. don’t provide us with all the information we need to proceed optimally at the cluster level either since they don’t tell us the number of violent deaths in each of their 33 clusters.  All they say in the paper (unless I’ve missed something) is that the Fallujah cluster had 52 violent deaths and the other 32 clusters combined had 21 violent deaths spread across 14 clusters.  I believe this is the best you can do although maybe a clever reader can mine the partial striptease to extract a few more scraps of information on how the 21 non-Fallujah violent deaths are allocated across clusters.

This ambiguity leaves many possibilities.  Maybe 13 clusters had one violent death and one cluster had the remaining eight.  Or maybe ten clusters had one death, three clusters had two deaths and the last cluster had 5 violent deaths.  Etc.

To keep things simple I’ll consider just four scenarios.  The first is that there are 18 clusters with 0 deaths, 7 clusters with 1 death, 7 clusters with 2 deaths and the Fallujah cluster with 52 deaths.  The second is that there are 18 clusters with 0 deaths, 13 clusters with 1 death, 1 cluster with 8 deaths and the Fallujah cluster with 52 deaths.  The third and fourth scenarios are the same as the first and second except that the latter two toss out the Fallujah cluster.

Caveat 3: There is a first stage to the sampling procedures that tosses out 6 governorates – failure to account for this makes the confidence intervals too narrow.  (I already alluded to this issue in this post.)

I quote from the paper:

During September, 2004, many roads were not under the control of the Government of Iraq or coalition forces. Local police checkpoints were perceived by team members as target identification screens for rebel groups.  To lessen risk to investigators, we sought to minimise travel distance and the number of Governorates to visit, while still sampling from all regions of the country. We did this by clumping pairs of Governorates. Pairs were adjacent Governorates that the Iraqi study team members believed to have had similar levels of violence and economic status during the preceding 3 years.

Roberts et al. randomly selected one governorate from each pair, visited only the selected governorates and ignored the non-selected ones.  So, for example, Karbala and Najaf were a pair.  In the event Karbala was selected and the field teams never visited Najaf.  In this way Dehuk, Arbil, Tamin, Najaf, Qadisiyah and Basrah were all eliminated.

This is not OK.

The problem is that the random selection of 6 governorates out of 12 introduces variation into the measurement system that should be, but isn’t, built into the confidence intervals calculated by Roberts et al.  This problem makes all the confidence intervals in the paper too narrow.

It’s worth understanding this point well so I offer an example.

Suppose I want to know the average height of students in a primary school consisting of  grades 1 through 8.  I get my estimate by taking a random sample of 30 students and averaging their heights.  If I repeat the exercise by taking another sample of 30 I’ll get a different estimate of average student height.  Any confidence interval for this sampling procedure will be based on modelling how these 30-student averages vary across different random samples.

Now suppose that I decide to save effort by streamlining my sampling procedure.  Rather than taking a simple random sample of 30 students from the whole school I first choose a grade at random and then randomly select 30 students from this grade.  This is an attractive procedure because now I don’t have to traipse around the whole school measuring only one or two students from each class.  Now I may be able to draw my sample from just one or two classrooms. This procedure is even unbiased, i.e., I get the right answer on average.

But the streamlined procedure produces much more variation than the original one does.  If, at the first stage, I happen to select the 8th grade then my estimate for the school’s average height will be much higher than the actual average height.  If, on the other hand, I select the 1st grade then my estimate will be much lower than the actual average. These two outcomes balance each other out (the unbiasedness property).  But the variation in the estimates across repeated samples will be much higher under the streamlined procedure than it will be under the original one.  A proper confidence interval for the streamlined procedure will need to be wider than a proper confidence interval for the original procedure will be.
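A quick simulation makes the point concrete.  The numbers below are invented purely for illustration (eight grades of 50 pupils, average heights rising with grade, a 6 cm spread within each grade); the point is that the grade-first procedure is centred in the same place as the simple random sample but varies far more from one repetition to the next.

```python
import random
import statistics

random.seed(1)

# Hypothetical school: 8 grades of 50 pupils, mean height rising with grade (cm).
grade_means = [115, 121, 127, 133, 139, 145, 151, 157]   # invented for illustration
school = [random.gauss(m, 6) for m in grade_means for _ in range(50)]
by_grade = [school[i * 50:(i + 1) * 50] for i in range(8)]

def simple_estimate():
    # Original procedure: 30 pupils sampled from the whole school.
    return statistics.mean(random.sample(school, 30))

def streamlined_estimate():
    # Streamlined procedure: pick one grade at random, then 30 pupils from it.
    grade = random.choice(by_grade)
    return statistics.mean(random.sample(grade, 30))

simple = [simple_estimate() for _ in range(2000)]
streamlined = [streamlined_estimate() for _ in range(2000)]

print(round(statistics.mean(simple)), round(statistics.mean(streamlined)))          # similar centres (unbiasedness)
print(round(statistics.stdev(simple), 1), round(statistics.stdev(streamlined), 1))  # streamlined spread is several times larger
```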

Analogously, the confidence intervals of Roberts et al. need to account for their first-stage randomization over governorates.  Since they don’t do this all their confidence intervals are too narrow.

Unfortunately, this problem is thornier than it may appear to be at first glance.  The only way to correct the error is to incorporate information about what would have happened in the excluded governorates if they had actually been selected.  But since these governorates were not selected the survey itself supplies no useful information to fill this gap.  We could potentially address this issue by importing information from outside the system but I won’t do this today.  So I, like Roberts et al., will just ignore this problem which means that my confidence intervals, like theirs, will be too narrow.

OK, enough with the caveats.  I just need to make one more observation and we’re ready to roll.

Buried deep within the paper there is an assumption that the 33 clusters are “exchangeable”. This technical term is actually crucial.  In essence, it means that each cluster can potentially represent any area of Iraq. So if, for example, there was a cluster in Missan with 2 violent deaths then if we resample we can easily find a cluster in Diala just like it, in particular having 2 violent deaths.   Of course, this exchangeability assumption implies that there is nothing special about the fact that the cluster with 52 violent deaths turned out to be in Fallujah.  Exchangeability implies that if we sample again we might well find a cluster with 52 deaths in Baghdad or Sulaymaniya.  Exchangeability seems pretty far fetched when we think in terms of the Fallujah cluster but if we leave this aside the assumption is strong but, perhaps, not out of line with many assumptions researchers tend to make in statistical work.

We can now implement an easy computer procedure to calculate confidence intervals:

  1. Select 1,000 samples, each one containing 33 clusters.  These samples of 33 clusters are chosen at random (with replacement) from the list of 33 clusters given above (18 0’s, 7 1’s, 7 2’s and 1 52).  Thus, an individual sample can turn out to have 33 0’s or 33 52’s although both of these outcomes are very unlikely (particularly the second one.)
  2. Estimate the number of violent deaths for each of the 1,000 samples.  As I noted in a previous post we can do this in a rough and ready way by multiplying the total number of deaths in the sample by 3,000.
  3. Order these 1,000 estimates from smallest to largest.
  4. The lower bound of the 95% confidence interval is the estimate in position 25 on the list.  The upper bound is the estimate in position 975.  The central estimate is the estimate at position 500.
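Here is a minimal sketch of that procedure in code, using the first cluster allocation from Caveat 2; swapping in the second allocation, or dropping the 52, reproduces the variants discussed below.

```python
import random

random.seed(1)

# Assumed allocation of violent deaths across the 33 clusters (scenario 1 above).
clusters = [0] * 18 + [1] * 7 + [2] * 7 + [52]

estimates = []
for _ in range(1000):
    resample = random.choices(clusters, k=len(clusters))   # 33 draws with replacement
    estimates.append(sum(resample) * 3000)                 # ~3,000 estimated deaths per sampled death

estimates.sort()
low, centre, high = estimates[24], estimates[499], estimates[974]   # positions 25, 500 and 975
print(low, centre, high)   # roughly 40,000 / 220,000 / 550,000 once rounded to the nearest 10,000
```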

Following these procedures I get a confidence interval of 40,000 to 550,000 with a central estimate of 220,000.  (I’ve rounded all numbers to the nearest 10,000 as it seems ridiculous to have more precision than that.)  Notice that these numbers are slightly different from the ones at the top of the post because I took 1,000 samples this time and only 100 last time.  So these numbers supersede the earlier ones.

We can do the same thing without the Fallujah cluster.  Now we take samples of 32 from a list with 18 0’s, 7 1’s and 7 2’s.  This time I get a central estimate of 60,000 violent deaths with a 95% confidence interval of 40,000 to 90,000.

Next I briefly address caveat 2 above by reallocating the 21 violent deaths that are spread over 14 clusters in an indeterminate way.  Suppose now that we have 13 clusters with one violent death and 1 cluster with 8 violent deaths.  Now the estimate that includes the Fallujah cluster becomes 210,000 with a confidence interval of 30,000 to 550,000.  Without Fallujah I get an estimate of 60,000 with a range of 30,000 to 120,000.

Caveats 1 and 3 mean that these intervals should be stretched further by an unknown amount.

Here are some general conclusions.

  1. The central estimate for the number of violent deaths depends hugely on whether Fallujah is in or out.  This is no surprise.
  2. The bottoms of the confidence intervals do not depend very much on whether Fallujah is in or out.  This may be surprising at first glance but not upon reflection.  The sampling simulations that include Fallujah have just a 1/33 chance of picking Fallujah on each draw.  Many of these simulations will not choose Fallujah in any of their 33 tries, and these will be the low-end estimates.  So at the low end it is almost as if Fallujah never happened.  (A quick calculation after this list puts a number on this.)  These sampling outcomes correspond with reality.  In three subsequent surveys of Iraq nothing like that Fallujah cluster ever appeared again. It really seems to have been an anomaly.
  3. The high-end estimates are massively higher when Fallujah is included than they are when it isn’t.  Again, this makes sense since some of the simulations will pick the Fallujah cluster two or three times.
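To put a number on point 2: the chance that a given resample of 33 clusters never picks Fallujah is (32/33)^33 ≈ 0.36, so roughly a third of the 1,000 simulated samples contain no Fallujah cluster at all, and it is these samples that populate the bottom of the interval.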

 

Mismeasuring War Deaths in Iraq: The Partial Striptease

I now continue the discussion of the Roberts et al. paper that I started in my series on the Chilcot Report.  This is a tangent from Chilcot so I’ll hold this post and its follow-ups outside of that series.

Les Roberts never released a proper data set for his survey.  Worse, the authors are sketchy on important details in the paper, leaving us to guess on some key issues.  For example, in his report on Roberts et al. to the UK government Bill Kirkup wrote:

The authors provide a reasonable amount of detail on their figures in most of the paper.  They do, however, become noticeably reticent when it comes to the breakdown of deaths into violent and non-violent, and the breakdown of violent deaths into those attributed to the coalition and those due to terrorism or criminal acts, particularly taking into account the ‘Fallujah problem’…

Roberts et al. claim that “air strikes from coalition forces accounted for most violent deaths” but Kirkup points out that without the dubious Fallujah cluster it’s possible that the coalition accounted for less than half of the survey’s violent deaths.

Kirkup’s suspicion turns out to be correct.

However, you need to look at this email from Les Roberts to a blog to settle the issue.  It turns out that, outside Fallujah, coalition air strikes account for 6 of the 21 violent deaths, with 4 further deaths attributed to the coalition using other weapons.

My primary point here is about data openness rather than about coalition air strikes.  Roberts et al. should just show their data rather than dribbling it out in bits and bobs across the blogosphere.

Roberts gives another little top up here.  (I give that link only to document my source.  I recommend against ploughing through this Gish Gallop by Les Roberts.)  Buried deep inside a lot of nonsense Roberts writes:

The Lancet estimate [i.e. Roberts et al.], for example, assumes that no violent deaths have occurred in Anbar Province; that it is fair to subtract out the pre-invasion violence rate; and that the 5 deaths in our data induced by a US military vehicles are not “violent deaths.”

Hmmm…..5 deaths caused by US military vehicles.

Recall that each death in the sample yields around 3,000 estimated deaths. This translates into 15,000 estimated deaths caused by US military vehicles – nearly 30 per day for a year and a half.  There have, unfortunately, been a number of Iraqis killed by US military vehicles.  Iraq Body Count (IBC) has 110 such deaths in its database during the period covered by the Roberts et al. survey.  I’m sure that IBC hasn’t captured all deaths in vehicle accidents but, nevertheless, the 15,000 figure looks pretty crazy.
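As a quick check on that back-of-the-envelope figure (taking the recall period to be roughly a year and a half, or about 540 days): 5 sampled deaths × ~3,000 ≈ 15,000 estimated deaths, and 15,000 ÷ 540 ≈ 28 deaths per day, against the 110 such deaths IBC recorded over the entire period.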

Again I come back to my main point – please just give us a proper dataset rather than a partial striptease.  Meanwhile, I can’t help thinking Roberts et al. are holding back on the data because it contains more embarrassments that we don’t yet know about.

PS – After providing the above quote I feel obligated to debunk it further.

  1. Roberts writes that his estimate omits deaths in Anbar Province (which contains Fallujah).  But many claims in his paper are only true if you include Anbar (Fallujah).  Indeed, this very blog post opened with one such claim.  We see that Fallujah is in for the purpose of saying that most violent deaths were caused by coalition airstrikes but Fallujah is out when it’s time to talk about how conservative the estimate is because it omits Fallujah.  Call this the “Fallujah Shell Game”.  (See the comments of Josh Dougherty here.)
  2. Roberts suggests that he bent over backwards to be fair by omitting pre-invasion violent deaths from his estimate.  But, first of all, there was only one such death so it hardly makes a difference whether this one is in or out.  Second, it’s hard to understand what the case would be for blaming a pre-invasion death on the invasion.