# Mismeasuring Deaths in Iraq: Addendum on Confidence Interval Calculations

In my last post I used a combination of bootstrapping and educated guesswork to find  confidence intervals for violent deaths in Iraq based on the data from the Roberts et al. survey.  (The need for guesswork arose because the authors have not been forthcoming with their data.)

Right after this went up a reader contacted me and asked whether the bottom of one of these confidence intervals can go below 0.

The short answer is “no” with the bootstrap method.  This technique can only take us down to 0 and no further.

Explanation

With bootstrapping we randomly select from a list of 33 clusters.  Of course, none of these clusters experienced a negative number of violent deaths. So 0 is the smallest possible count we can get for violent deaths in any simulation sample.  (In truth, the possibility of pulling 33 0’s is more theoretical than real.  This didn’t happen in any of my 1,000 draws of 33.)

Nevertheless, it turns out that if we employ the most common methods for calculating confidence intervals (not bootstrapping) then the bottom of the interval does dip below 0 when the dubious Fallujah cluster is included.

Here’s a step by step walk-through of the traditional method applied to the Roberts et al. data.  (I will assume that violent deaths are allocated across the 33 clusters as 18 0’s, 7 1’s, 7 2’s and 1 52.)

1. Compute the mean number of violent deaths per cluster.  This is 2.2.  An indication that something is screwy here is the fact that the mean is bigger than the number of violent deaths in 32 out of the 33 clusters.  At the same time the mean is way below the number of violent deaths in the Fallujah cluster (52).  Note that without the Fallujah cluster the mean becomes 0.7, i.e., eliminating Fallujah cuts the mean by more than a factor of 3.
2. Compute the sample standard deviation which is a measure of how strongly the number of violent deaths varies by cluster.  This is 9.0.  Note that if we eliminate the Fallujah cluster then the sample standard deviation plummets by more than a factor of 10, all the way down to 0.8.  This is just a quantitative expression of the obvious fact that the data are highly variable with Fallujah in there.  Note further that the big outlier observation affects the standard deviation more than it affects the mean.
3. Adjust for sample size.  We do this by dividing the sample standard deviation by the square root of the sample size.  This gives us 1.6.  Here the idea is that you can tame the variation in the data by taking a large sample.  The larger the sample size the more you tame the data.  However, as we shall see, the Fallujah cluster makes it impossible to really tame the data with a sample of only 33 clusters.
4. Unfortunately, the last step is mysterious unless you’ve put a fair amount of effort into studying statistics.  (This, alone, is a great reason to prefer bootstrapping which is very intuitive.)  Our 95% confidence interval for the mean number of violent deaths per cluster is, approximately, the average plus or minus 2 times 1.6, i.e., -1.0 to 5.4.  There’s the negative lower bound!
5. We can translate from violent deaths per cluster to estimated violent deaths by multiplying by 33 and again by 3,000.  We end up with -100,000 to 530,000.  (I’ve been rounding at each step.  If, instead I don’t round until the very end I get -90,000 to 530,000….this doesn’t really matter.)  Note that without Fallujah we get a confidence interval of 30,000 to 90,000 which is about what we got with bootstrapping.

Have we learned anything here other than that I respond to reader questions?

I don’t think we’ve learned much, if anything, about violent deaths in Iraq.  We already knew that the Roberts et al. data, especially the Fallujah observation, is questionable and maybe the above calculation reinforces this view a little bit.

But, mostly, we learn something about the standard method for calculating confidence intervals; when the data are wild this method can give incredible answers.  Of course, a negative number of violent deaths is not credible.

There is an intuitive reason why the standard method fails with the Roberts et al. data; it forces a symmetric estimate onto highly asymmetric data.  Remember we get 2.2 plus or minus 3.2 average violent deaths per cluster.  The plus or minus means that the confidence interval is symmetric.  The Fallujah observation forces a wide confidence interval which has to go just as wide on the down side as it is on the up side.  In some sense the method is saying that if it’s possible to find a cluster with 52 violent deaths then it also must be possible to find a cluster with around -52 violent deaths.  But, of course, no area of Iraq  experienced -52 violent deaths.  So you wind up with garbage.

Part of the story is also the small sample size. With twice as many cluster, but the same sort of data, the lower limit would only go down to about 0.

It’s tempting to just say “garbage in, garbage out” and, up to a point, this is accurate.   But the bigger problem is that the usual method for calculating confidence intervals is not appropriate in this case.