Secret Data Sunday – Iraq Family Health Survey

The WHO-sponsored Iraq Family Health Survey (IFHS) led to a nice publication in the New England Journal of Medicine that came complete with an editorial puff piece extolling its virtues.  According to the NEJM website this publication has generated 60 citations and we’re still counting.   If you cast a net wider than just medical publications then the citation count must run well into the hundreds.

But the IFHS virtues don’t stop there.  The NEJM paper, and the accompanying report, are well written and supply plenty of good methodological information about the survey.  The authors are pretty up front about the limitations of their work, notably that they had to skip interviews in some areas due to security concerns.  Moreover, the IFHS is an important survey not least because its estimate of 150,000 violent deaths discredited the Burnham et al. estimate of 600,000 violent deaths for almost exactly the same time period.  (The Burnham et al. survey hid its methodology and was afflicted by serious ethical and data integrity problems.)

I have cited the IFHS multiple times in my own work and generally believe in it.  At the same time, the IFHS people did several questionable things with their analysis that I would like to correct, or at least investigate, by reanalyzing the IFHS data.

But here’s the rub.  The WHO has not released the IFHS dataset.

I and other people have requested it many times.  The field work was conducted way back in 2006.  So what is the WHO waiting on?

I’ll leave a description of my unrealized reanalysis to a future post. This is because my plans just don’t matter for the issue at hand; the IFHS data should be in the public domain whether or not I have a good plan for analyzing them.  (See this post on how the International Rescue Committee hides its DRC data in which I make the same point.)

There is an interesting link between the IFHS and the Iraq Child and Maternal Mortality Survey, another important dataset that is also unavailable.  The main point of contact for both surveys is Mohamed Ali of the WHO.  Regarding the IFHS, Mohamed seemed to tell me in an email that only the Iraqi government is empowered to release the dataset.  If so, this suggests a new (at least for me) and disturbing problem:

Apparently, the WHO uses public money to sponsor surveys but then sells out the general public by ceding their data distribution rights to local governments, in this case to Iraq.  

This practice of allowing governments that benefit from UN-sponsored research to withhold data from the public that pays for the research is unacceptable.  It’s great that the WHO sponsors survey research in needy countries but open data should be a precondition for this service.



How Many People were Killed in the Libyan Conflict – Some field work that raises more questions than it answers

Hana Salama asked me for an opinion on this article. I had missed it but it is, potentially, interesting to me so I am happy to oblige her.

I’ve now absorbed it but find myself even more puzzled than I was after reading that Syria survey I blogged on a few weeks back.  Again, it looks like some people did some useful field work but the write up is so bad that it’s hard to know exactly what they did.  In fact, the Libya work is more opaque than the Syria work to the point where I wonder what, if anything, was actually done.

For orientation here is the core of the abstract:


A systematic cross-sectional field survey and non-structured search was carried out over fourteen provinces in six Libyan regions, representing the primary sites of the armed conflict between February 2011 and February 2012. Thirty-five percent of the total area of Libya and 62.4% of the Libyan population were involved in the study. The mortality and injury rates were determined and the number of displaced people was calculated during the conflict period.


A total of 21,490 (0.5%) persons were killed, 19,700 (0.47%) injured and 435,000 (10.33%) displaced. The overall mortality rate was found to be 5.1 per 1000 per year (95% CI 4.1–7.4) and injury rate was found to be 4.7 per 1000 per year (95% CI 3.9–7.2) but varied by both region and time, reaching peak rates by July–August 2011.

I’m not sure but I think the researchers (hereafter Daw et al.) tried to count war deaths (plus injuries and displacement numbers) rather than trying to statistically estimate these numbers.  (See this paper on the distinction.)

Actually, I read the whole paper thinking that Daw et al. drew a random sample and did statistical estimation but then I changed my mind.  I got my initial impression at the beginning because they say

This epidemiological community-based study was guided by previously published studies and guidelines.

They then cite the (horrible) Roberts et al. (2004) Iraq survey as providing a framework for their research (see this and follow the links).   Since Roberts et al. was a sample survey I figured that Daw et al. was also a sample survey.  They then go on to say that

Face to face interviews were carried out with at least one member of each affected family….

This also seemed to point in the direction of a sample survey conducted on a bunch of randomly selected households.  (With this method you pick a bunch of households at random, find out how many people lived and died in each one and then extrapolate a national death rate from the in-sample death data.)

But then I realized that the above quote continues with

…listed in the registry of the Ministry of Housing and Planning

Hmmmm….so they interviewed all affected families listed in the registry of some Ministry.  This registry cannot have been a registry of every family living in the areas covered by the survey because there are far more families there than could have been interviewed on this project.  (The areas covered contain around 4.2 million people according to Table 1 of the paper and  surely Daw et al. did not conduct hundreds of thousands of interviews.)

So I’m guessing that the interviews were just of people from families on an official list of victims; killed, injured or displaced.  This guess places a lot of emphasis on one interpretation of the words “listed” and “affected” but it does make some sense.

To be clear, even interviewing one representative from every affected family would have been a gargantuan task since Daw et al. identify around 40,000 casualties (killings plus injuries) and more than 400,000 displaced people.  So we would still be talking about tens of thousands of interviews.

To be honest, now I’m wondering if all these interviews really happened.  That’s an awful lot of interviews and they would have been conducted in the middle of a war.

So now I’m back to thinking that maybe it was a sample survey of a few thousand households.  But if so then the write up has the large flaw that there is no description whatsoever of how its sample was drawn (if, indeed, there was a sample).

Something is definitely wrong here.  I shouldn’t have to get out a Ouija board to divine the authors’ methodology.

The Syria survey discussed a few weeks ago seems to be in a different category.  For that one I have a lot of questions about what they did combined with doubts about whether their methods make sense.  But this Libya write-up seems weird to the point where I wonder whether they were actually out in the field at all.

Maybe an email to Dr. Daw will clear things up in a positive way.  With the Syria paper emailing the lead author got me nowhere but maybe here it will work.  I’m afraid that the best-case scenario is that Daw et al. did some useful field work that was obscured by a poor write-up and that there is a better paper waiting to get written.




The AAPOR Report on 2016 US Election Polling plus some Observations on Survey Measurement of War Deaths – Part 1

I’ve finally absorbed the report of the American Association for Public Opinion Research (AAPOR) on polling in the Trump-Clinton election.  So I’ll jot down my reactions in a series of posts  (see also this earlier post).   In keeping with the spirit of the blog I’ll also offer related thoughts on survey-based approaches to estimating numbers of war deaths.

I strongly recommend the AAPOR report.  It has many good insights and is highly readable.

That said, I’ll mostly criticize it here.

But before I proceed to the substance of the AAPOR report I want to draw your attention to the complete absence of an analogous document in the literature using household surveys to estimate war deaths.

There has been at least one notable success in survey-based war-death estimation and several notable failures.  (Two of the biggest are here and here.)  Yet there has not been any soul searching within the community of practitioners in the conflict field that can be even remotely compared to the AAPOR document.  On the contrary, there is a sad history of epidemiologists militantly promoting discredited work as best practice.  See, for example, this paper which concludes:

The use of established epidemiological methods is rare. This review illustrates the pressing need to promote sound epidemiologic approaches to determining mortality estimates and to establish guidelines for policy-makers, the media and the public on how to interpret these estimates.

The great triumph that drives the above conclusion is the notorious Burnham et al. (2006) study which overestimated the number of violent deaths in Iraq by at least a factor of 4 while endangering the lives of its interviewees.

Turning back to the AAPOR document, I want to underscore that AAPOR, to its credit, has produced a self-critical report and I’m benefiting here from the nice platform their committee has provided.

The report maintains a strong distinction between national polls and state polls.  Rather unfortunately though, the report sets up state pollsters as the poor cousins of the real national pollsters.

It is a persistent frustration within polling and the larger survey research community that the profession is judged based on how these often under-budgeted state polls perform relative to the election outcome.

Analogously, we might say that Democrats are frustrated by the judgments of the electoral college which keeps handing the presidency over to Republicans despite Democrat victories in popular votes.  Yes, I too am frustrated by this weird tic of the American system.  But the electoral college is the way the US determines its presidency and we all have to accept this.   And just as it would be a mistake for Democrats to focus on winning the popular vote while downplaying the electoral college, it’s also a mistake for pollsters to focus on predicting the popular vote while leaving electoral college prediction as an afterthought.

The above quote is followed by something that is also pretty interesting:

The industry cannot realistically change how it is judged, but it can make an improvement to the polling landscape, at least in theory. AAPOR does not have the resources to finance a series of high quality state-level polls in presidential elections, but it might consider attempting to organize financing for such an effort. Errors in state polls like those observed in 2016 are not uncommon. With shrinking budgets at news outlets to finance polling, there is no reason to believe that this problem is going to fix itself. Collectively, well-resourced survey organizations might have enough common interest in financing some high quality state-level polls so as to reduce the likelihood of another black eye for the profession.

I have to think more about this but at first glance this thinking seems sort of like saying:

Look, for a while we’ve been down here in Ecuador selling space heaters and, realistically, that’s not gonna change (although we’re writing this report because our business is faltering).  But maybe next year space heater companies can donate  a few air conditioners to some needy people.  It’s naive to imagine that there will be any money in the air conditioner business in Ecuador but this charity might help us defend ourselves against the frustrating criticism that air conditioner companies are supplying a crappy product.

In other words, it’s clear that a key missing ingredient for better election prediction is more high-quality state polls.  So why is it obvious that the market will not reward more good state polls but it will reward less relevant national ones?

(Side note – I think there are high-quality state polls and I think that the AAPOR committee agrees with me on this.  It’s just that there aren’t enough good state polls and also the average quality level may be lower on state polls than it is on national ones.)

Maybe I’m missing something here.  Is there some good reason why news consumers will always want more national polls even though these are less informative than state polls are?


But maybe journalists should just do a better job of educating their audiences.  A media company could stress that presidential elections are decided state by state, not at the national level, and so this election season they will do their polling state by state, thereby providing a better product than that of their competitors who are only doing national polls.

In short, there should be a way to sell high quality information and I hope that the polling industry innovates to tailor their products more closely to market needs than they have done in recent years.


I’ve Done Something or Other and Say that 470,000 People were Killed in Syria – Would you Like to Interview Me?

Let’s go back to February of 2016 when the New York Times ran this headline:

Death Toll from War in Syria now 470,000, Group Finds

The headline is more conservative than a caption in the same article which reads:

At least [my emphasis] 470,000 Syrians have died as a result of the war, according to the Syrian Center for Policy Research.

This switch between the headline and the caption is consistent with a common pattern of converting an estimate, that might be either too high or too low, into a bare minimum.

Other respected outlets such as PBS and Time jumped onto the 470,000 bandwagon with the Guardian claiming primacy in this story with an early exclusive that quotes the report’s author:

“We use very rigorous research methods and we are sure of this figure,” Rabie Nasser, the report’s author, told the Guardian. “Indirect deaths will be greater in the future, though most NGOs [non-governmental organisations] and the UN ignore them.

“We think that the UN documentation and informal estimation underestimated the casualties due to lack of access to information during the crisis,” he said.

Oddly, none of the news articles say anything about what this rigorous methodology is.  The Guardian refers to “counting” which I would normally interpret as saying that the Syrian Center for Policy Research (SCPR) has a list of 470,000 people killed but it is not at all clear that they really have such a list.

This report was the source for all the media attention.  The figure of 470,000 appears just once in the report, in a throwaway line in the conclusion:

 The armed conflict badly harmed human development in Syria where the fatalities in 2015 reached about 470,000 deaths, the life expectancy at birth estimated at 55.4 years, and the school age non-attendance rate projected at 45.2 per cent; consequently, the HDI of Syria is estimated to have lost 29.8 per cent of its HDI value in 2015 compared to 2010.

The only bit of the report that so much as hints at where the 470,000 number came from is this:

The report used results and methodology from a forthcoming SCPR report on the human development in Syria that is based on a comprehensive survey conducted in the mid of 2014 and covered all regions in Syria. The survey divided Syria into 698 studied regions and questionnaire three key informants, with specific criteria that guarantee inclusiveness and transparency, from each region. Moreover, the survey applied a strict system of monitoring and reviewing to ensure the correctness of responses. About 300 researchers, experts, and programmers participated in this survey.

This is nothing.

The hunger for scraps of information on the number of people killed in Syria is, apparently, so great that it is feasible to launch a bunch of news headlines just by saying you’ve looked into this question and come up with a number that is larger than what was previously thought.  (I strongly suspect that having a bigger number which you use to dump on any smaller numbers is a key part of getting noticed.)

That said, the above quote does promise a new report with more details and eventually a new report was released – but the details in the new report on methodology are still woefully inadequate.  They divide Syria up, interview three key informants in each area and then, somehow, calculate the number of dead people based on these interviews.  I have no idea what this calculation looks like.  There is a bit of description on how SCPR picked their key informants but, beyond that, the new report provides virtually no information relevant for evaluating the 470,000 figure.  The SCPR doesn’t even provide a copy of their questionnaire and I can hardly even guess at what it looks like.

One thing is clear though – they did not use the standard sample survey method for estimating the number of violent deaths.  Under this approach you pick a bunch of households at random, do interviews on the number of people who have lived and died in each one and extrapolate a national death rate based on death rates observed in your sample households.  If the SCPR had done something like this then at least I would’ve had a sense of where the 470,000 number came from, although I’d still want to know details.
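The extrapolation step of that standard method is simple enough to sketch in a few lines of code.  Every number below (frame size, sample size, death probability, national population) is invented for illustration; this is emphatically not SCPR data or their method:

```python
import random

random.seed(1)

# Hypothetical sampling frame: 100,000 households, each with a known
# number of members and of war-period deaths (all numbers invented).
frame = [{"members": random.randint(2, 9),
          "deaths": 1 if random.random() < 0.01 else 0}
         for _ in range(100_000)]

# Pick a bunch of households at random...
sample = random.sample(frame, 2_000)

# ...compute the in-sample death rate (deaths per person)...
persons = sum(h["members"] for h in sample)
deaths = sum(h["deaths"] for h in sample)
death_rate = deaths / persons

# ...and extrapolate to a hypothetical national population.
national_estimate = death_rate * 20_000_000
```

The point is only that with a random household sample the path from interviews to a national figure is transparent; with three key informants per region it is anybody’s guess.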

I emailed Rabie Nasser asking for details but didn’t hear back.  Who knows.  Maybe my message went into his spam folder.  There are other people associated with this work and I’ll try to contact them and will report back if I hear something interesting.

I want to be clear.  I’m not saying that this work is useless for estimating the number of people killed in the Syrian war.  In fact, I suspect that the SCPR generated some really useful information on this question and on other issues as well.  But until they explain what they actually did I would just disregard the work, particularly the 470,000 figure.  I’m not saying that I think this number is too high or that it is too low.  I just think that it is floating in thin air without any methodological moorings to enable us to understand it.

Journalists should lay off press releases taking the form of “I did some unspecified research and here are my conclusions.”


New Paper on Accounting for Civilian War Casualties

Hello everybody.

The radio silence was much longer than intended but blog posts should start coming fast and furious now.  I’ve got a lot I want to get off my chest as soon as possible.

Let’s get the ball rolling with a new paper I have with Nicholas Jewell and Britta Jewell.  (Well, to be honest, it isn’t really a brand new paper but it’s newly accepted at a journal and we’re now putting it into the public domain.)

I dare say that this paper is a very readable introduction to civilian casualty recording and estimation, that is, to most of the subject matter of the blog.  I hope you will all have a look.

And, please, send in your comments.

More soon…..

PS – Here is an alternative link to the paper in case the first one doesn’t work for you.


Mismeasuring Deaths in Iraq: Addendum on Confidence Interval Calculations


In my last post I used a combination of bootstrapping and educated guesswork to find  confidence intervals for violent deaths in Iraq based on the data from the Roberts et al. survey.  (The need for guesswork arose because the authors have not been forthcoming with their data.)

Right after this went up a reader contacted me and asked whether the bottom of one of these confidence intervals can go below 0.

The short answer is “no” with the bootstrap method.  This technique can only take us down to 0 and no further.


With bootstrapping we randomly select from a list of 33 clusters.  Of course, none of these clusters experienced a negative number of violent deaths. So 0 is the smallest possible count we can get for violent deaths in any simulation sample.  (In truth, the possibility of pulling 33 0’s is more theoretical than real.  This didn’t happen in any of my 1,000 draws of 33.)
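A quick simulation makes the zero floor concrete.  It uses the cluster allocation I assume below (18 zeros, 7 ones, 7 twos and the 52-death Fallujah cluster), which is my guess, not the authors’ released data:

```python
import random

random.seed(0)

# Assumed cluster-level violent-death counts for the 33 clusters.
clusters = [0]*18 + [1]*7 + [2]*7 + [52]

# 1,000 bootstrap resamples of 33 clusters, drawn with replacement.
totals = [sum(random.choices(clusters, k=33)) for _ in range(1000)]

# Every resampled total is a sum of non-negative counts,
# so no simulated violent-death count can ever fall below zero.
print(min(totals) >= 0)  # True
```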

Nevertheless, it turns out that if we employ the most common methods for calculating confidence intervals (not bootstrapping) then the bottom of the interval does dip below 0 when the dubious Fallujah cluster is included.

Here’s a step by step walk-through of the traditional method applied to the Roberts et al. data.  (I will assume that violent deaths are allocated across the 33 clusters as 18 0’s, 7 1’s, 7 2’s and 1 52.)

  1. Compute the mean number of violent deaths per cluster.  This is 2.2.  An indication that something is screwy here is the fact that the mean is bigger than the number of violent deaths in 32 out of the 33 clusters.  At the same time the mean is way below the number of violent deaths in the Fallujah cluster (52).  Note that without the Fallujah cluster the mean becomes 0.7, i.e., eliminating Fallujah cuts the mean by more than a factor of 3.
  2. Compute the sample standard deviation which is a measure of how strongly the number of violent deaths varies by cluster.  This is 9.0.  Note that if we eliminate the Fallujah cluster then the sample standard deviation plummets by more than a factor of 10, all the way down to 0.8.  This is just a quantitative expression of the obvious fact that the data are highly variable with Fallujah in there.  Note further that the big outlier observation affects the standard deviation more than it affects the mean.
  3. Adjust for sample size.  We do this by dividing the sample standard deviation by the square root of the sample size.  This gives us 1.6.  Here the idea is that you can tame the variation in the data by taking a large sample.  The larger the sample size the more you tame the data.  However, as we shall see, the Fallujah cluster makes it impossible to really tame the data with a sample of only 33 clusters.
  4. Unfortunately, the last step is mysterious unless you’ve put a fair amount of effort into studying statistics.  (This, alone, is a great reason to prefer bootstrapping which is very intuitive.)  Our 95% confidence interval for the mean number of violent deaths per cluster is, approximately, the average plus or minus 2 times 1.6, i.e., -1.0 to 5.4.  There’s the negative lower bound!
  5. We can translate from violent deaths per cluster to estimated violent deaths by multiplying by 33 and again by 3,000.  We end up with -100,000 to 530,000.  (I’ve been rounding at each step.  If, instead I don’t round until the very end I get -90,000 to 530,000….this doesn’t really matter.)  Note that without Fallujah we get a confidence interval of 30,000 to 90,000 which is about what we got with bootstrapping.
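The five steps above can be reproduced in a few lines, again using my assumed allocation of the 33 cluster counts:

```python
from math import sqrt
from statistics import mean, stdev

# Assumed allocation: 18 zeros, 7 ones, 7 twos and the Fallujah 52.
clusters = [0]*18 + [1]*7 + [2]*7 + [52]

m = mean(clusters)             # step 1: about 2.2 deaths per cluster
s = stdev(clusters)            # step 2: about 9.0
se = s / sqrt(len(clusters))   # step 3: about 1.6

lo, hi = m - 2*se, m + 2*se    # step 4: roughly -1.0 to 5.4

# Step 5: scale deaths-per-cluster up to national violent deaths.
scale = 33 * 3000
print(round(lo * scale), round(hi * scale))  # roughly -90,000 to 530,000
```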

Have we learned anything here other than that I respond to reader questions?

I don’t think we’ve learned much, if anything, about violent deaths in Iraq.  We already knew that the Roberts et al. data, especially the Fallujah observation, is questionable and maybe the above calculation reinforces this view a little bit.

But, mostly, we learn something about the standard method for calculating confidence intervals; when the data are wild this method can give incredible answers.  Of course, a negative number of violent deaths is not credible.

There is an intuitive reason why the standard method fails with the Roberts et al. data; it forces a symmetric estimate onto highly asymmetric data.  Remember we get 2.2 plus or minus 3.2 average violent deaths per cluster.  The plus or minus means that the confidence interval is symmetric.  The Fallujah observation forces a wide confidence interval which has to go just as wide on the down side as it is on the up side.  In some sense the method is saying that if it’s possible to find a cluster with 52 violent deaths then it also must be possible to find a cluster with around -52 violent deaths.  But, of course, no area of Iraq  experienced -52 violent deaths.  So you wind up with garbage.

Part of the story is also the small sample size. With twice as many clusters, but the same sort of data, the lower limit would only go down to about 0.
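A quick check of that claim: simply duplicating the 33 assumed cluster counts, to mimic a sample twice as large with the same sort of data, lifts the lower limit from about -0.9 deaths per cluster to essentially zero:

```python
from math import sqrt
from statistics import mean, stdev

clusters = [0]*18 + [1]*7 + [2]*7 + [52]  # assumed allocation
doubled = clusters * 2                    # same data, twice the sample

lower33 = mean(clusters) - 2 * stdev(clusters) / sqrt(len(clusters))
lower66 = mean(doubled) - 2 * stdev(doubled) / sqrt(len(doubled))
print(round(lower33, 2), round(lower66, 2))  # -0.91 and 0.02
```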

It’s tempting to just say “garbage in, garbage out” and, up to a point, this is accurate.   But the bigger problem is that the usual method for calculating confidence intervals is not appropriate in this case.

Mismeasuring War Deaths in Iraq: Confidence Interval Calculations

We return again to the Roberts et al. paper.

In part 5 of my postings on the Chilcot Report I promised to discuss the calculations of confidence intervals underlying these claims:

One standard calculation method (bootstrapping) leads to a central estimate of 210,000 violent deaths with a 95% confidence interval of around 40,000 to 600,000.  However, if you remove the Fallujah cluster the central estimate plummets to 60,000 with a 95% confidence interval of 40,000 to 80,000.  (I’ll give details on these calculations in a follow-up post.)

I have to start with some caveats.

Caveat 1:  No household data – failure to account for this makes confidence intervals too narrow

As we know the authors of the paper have not released a proper dataset.  To do this right I would need to have violent deaths by household but the authors are holding this information back.  Thus, I have to operate at the cluster level.  This shortcut suppresses household-level variation which, in turn, constricts the widths of the confidence intervals I calculate.  It’s possible to get a handle on the sizes of these  effects but I won’t go there in this blog post.

Caveat 2: Violent deaths are not broken down by cluster – confidence intervals depend on how I resolve this ambiguity

Roberts et al. don’t provide us with all the information we need to proceed optimally at the cluster level either since they don’t tell us the number of violent deaths in each of their 33 clusters.  All they say in the paper (unless I’ve missed something) is that the Fallujah cluster had 52 violent deaths and the other 32 clusters combined had 21 violent deaths spread across 14 clusters.  I believe this is the best you can do although maybe a clever reader can mine the partial striptease to extract a few more scraps of information on how the 21 non-Fallujah violent deaths are allocated across clusters.

This ambiguity leaves many possibilities.  Maybe 13 clusters had one violent death and one cluster had the remaining eight.  Or maybe ten clusters had one death, three clusters had two deaths and the last cluster had five violent deaths.  Etc.

To keep things simple I’ll consider just four scenarios.  The first is that there are 18 clusters with 0 deaths, 7 clusters with 1 death, 7 clusters with 2 deaths and the Fallujah cluster with 52 deaths.  The second is that there are 18 clusters with 0 deaths, 13 clusters with 1 death, 1 cluster with 8 deaths and the Fallujah cluster with 52 deaths.  The third and fourth scenarios are the same as the first and second except that the latter toss out the Fallujah cluster.

Caveat 3: There is a first stage to the sampling procedures that tosses out 6 governorates – failure to account for this makes the confidence intervals too narrow.  (I already alluded to this issue in this post.)

I quote from the paper:

During September, 2004, many roads were not under the control of the Government of Iraq or coalition forces. Local police checkpoints were perceived by team members as target identification screens for rebel groups.  To lessen risk to investigators, we sought to minimise travel distance and the number of Governorates to visit, while still sampling from all regions of the country. We did this by clumping pairs of Governorates. Pairs were adjacent Governorates that the Iraqi study team members believed to have had similar levels of violence and economic status during the preceding 3 years.

Roberts et al. randomly selected one governorate from each pair, visited only the selected governorates and ignored the non-selected ones.  So, for example, Karbala and Najaf were a pair.  In the event Karbala was selected and the field teams never visited Najaf.  In this way Dehuk, Arbil, Tamin, Najaf, Qadisiyah and Basrah were all eliminated.

This is not OK.

The problem is that the random selection of 6 governorates out of 12 introduces variation into the measurement system that should be, but isn’t, built into the confidence intervals calculated by Roberts et al.  This problem makes all the confidence intervals in the paper too narrow.

It’s worth understanding this point well so I offer an example.

Suppose I want to know the average height of students in a primary school consisting of  grades 1 through 8.  I get my estimate by taking a random sample of 30 students and averaging their heights.  If I repeat the exercise by taking another sample of 30 I’ll get a different estimate of average student height.  Any confidence interval for this sampling procedure will be based on modelling how these 30-student averages vary across different random samples.

Now suppose that I decide to save effort by streamlining my sampling procedure.  Rather than taking a simple random sample of 30 students from the whole school I first choose a grade at random and then randomly select 30 students from this grade.  This is an attractive procedure because now I don’t have to traipse around the whole school measuring only one or two students from each class.  Now I may be able to draw my sample from just one or two classrooms. This procedure is even unbiased, i.e., I get the right answer on average.

But the streamlined procedure produces much more variation than the original one does.  If, at the first stage, I happen to select the 8th grade then my estimate for the school’s average height will be much higher than the actual average height.  If, on the other hand, I select the 1st grade then my estimate will be much lower than the actual average. These two outcomes balance each other out (the unbiasedness property).  But the variation in the estimates across repeated samples will be much higher under the streamlined procedure than it will be under the original one.  A proper confidence interval for the streamlined procedure will need to be wider than a proper confidence interval for the original procedure will be.
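A small simulation makes the point concrete.  The school, grade sizes and heights below are all invented; only the comparison between the two sampling procedures matters:

```python
import random
from statistics import mean, pstdev

random.seed(0)

# Invented school: 8 grades of 60 students, mean height rising with grade.
school = {g: [random.gauss(115 + 7*g, 6) for _ in range(60)]
          for g in range(1, 9)}
students = [h for grade in school.values() for h in grade]

def simple():
    # Simple random sample of 30 students from the whole school.
    return mean(random.sample(students, 30))

def streamlined():
    # First pick one grade at random, then 30 students from that grade.
    grade = random.choice(list(school))
    return mean(random.sample(school[grade], 30))

simple_ests = [simple() for _ in range(2000)]
stream_ests = [streamlined() for _ in range(2000)]

# Both procedures are (nearly) unbiased, but the streamlined
# estimates vary far more across repeated samples.
print(round(pstdev(simple_ests), 1), round(pstdev(stream_ests), 1))
```

The streamlined estimates spread out several times more widely than the simple ones, which is exactly the extra variation that a proper confidence interval would have to absorb.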

Analogously, the confidence intervals of Roberts et al. need to account for their first-stage randomization over governorates.  Since they don’t do this all their confidence intervals are too narrow.

Unfortunately, this problem is thornier than it may appear to be at first glance.  The only way to correct the error is to incorporate information about what would have happened in the excluded governorates if they had actually been selected.  But since these governorates were not selected the survey itself supplies no useful information to fill this gap.  We could potentially address this issue by importing information from outside the system but I won’t do this today.  So I, like Roberts et al., will just ignore this problem which means that my confidence intervals, like theirs, will be too narrow.

OK, enough with the caveats.  I just need to make one more observation and we’re ready to roll.

Buried deep within the paper there is an assumption that the 33 clusters are “exchangeable”. This technical term is actually crucial.  In essence, it means that each cluster can potentially represent any area of Iraq. So if, for example, there was a cluster in Missan with 2 violent deaths then if we resample we can easily find a cluster in Diala just like it, in particular having 2 violent deaths.   Of course, this exchangeability assumption implies that there is nothing special about the fact that the cluster with 52 violent deaths turned out to be in Fallujah.  Exchangeability implies that if we sample again we might well find a cluster with 52 deaths in Baghdad or Sulaymaniya.  Exchangeability seems pretty far-fetched when we think in terms of the Fallujah cluster but if we leave this aside the assumption is strong but, perhaps, not out of line with many assumptions researchers tend to make in statistical work.

We can now implement an easy computer procedure to calculate confidence intervals:

  1. Select 1,000 samples, each one containing 33 clusters.  These samples of 33 clusters are chosen at random (with replacement) from the list of 33 clusters given above (18 0’s, 7 1’s, 7 2’s and 1 52).  Thus, an individual sample can turn out to have 33 0’s or 33 52’s although both of these outcomes are very unlikely (particularly the second one.)
  2. Estimate the number of violent deaths for each of the 1,000 samples.  As I noted in a previous post we can do this in a rough and ready way by multiplying the total number of deaths in the sample by 3,000.
  3. Order these 1,000 estimates from smallest to largest.
  4. The lower bound of the 95% confidence interval is the estimate in position 25 on the list.  The upper bound is the estimate in position 975.  The central estimate is the estimate at position 500.
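The four steps above, in code, using the first cluster allocation from caveat 2 (my assumption, not released data):

```python
import random

random.seed(42)

# First scenario from caveat 2: 18 zeros, 7 ones, 7 twos, one 52.
clusters = [0]*18 + [1]*7 + [2]*7 + [52]

# Step 1: 1,000 resamples of 33 clusters, drawn with replacement.
# Step 2: turn each resampled death total into a national estimate.
estimates = sorted(sum(random.choices(clusters, k=33)) * 3000
                   for _ in range(1000))

# Steps 3-4: read the interval off the ordered list.
lower = estimates[24]      # position 25
central = estimates[499]   # position 500
upper = estimates[974]     # position 975
print(lower, central, upper)
```

Rounded to the nearest 10,000, these land in the neighborhood of the 40,000 / 220,000 / 550,000 reported below; the exact values shift a little from run to run with the random seed.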

Following these procedures I get a confidence interval of 40,000 to 550,000 with a central estimate of 220,000.  (I’ve rounded all numbers to the nearest 10,000 as it seems ridiculous to have more precision than that.)  Notice that these numbers are slightly different from the ones at the top of the post because I took 1,000 samples this time and only 100 last time.  So these numbers supersede the earlier ones.

We can do the same thing without the Fallujah cluster.  Now we take samples of 32 from a list with 18 0’s, 7 1’s and 7 2’s.  This time I get a central estimate of 60,000 violent deaths with a 95% confidence interval of 40,000 to 90,000.

Next I briefly address caveat 2 above by reallocating the 21 violent deaths that are spread over 14 clusters in an indeterminate way.  Suppose now that we have 13 clusters with one violent death and 1 cluster with 8 violent deaths.  Now the estimate that includes the Fallujah cluster becomes 210,000 with a confidence interval of 30,000 to 550,000.  Without Fallujah I get an estimate of 60,000 with a range of 30,000 to 120,000.

Caveats 1 and 3 mean that these intervals should be stretched further by an unknown amount.

Here are some general conclusions.

  1. The central estimate for the number of violent deaths depends hugely on whether Fallujah is in or out.  This is no surprise.
  2. The bottoms of the confidence intervals do not depend very much on whether Fallujah is in or out.  This may be surprising at first glance but not upon reflection.  The sampling simulations that include Fallujah have just a 1/33 chance of picking Fallujah on each draw.  Many of these simulations will not choose Fallujah in any of their 33 tries.  These will be the low-end estimates.  So at the low end it is almost as if Fallujah never happened.  These sampling outcomes correspond with reality.  In three subsequent surveys of Iraq nothing like that Fallujah cluster ever appeared again. It really seems to have been an anomaly.
  3. The high-end estimates are massively higher when Fallujah is included than they are when it isn’t.  Again, this makes sense since some of the simulations will pick the Fallujah cluster two or three times.