The Perils and Pitfalls of Matching War Deaths Across Lists: Part 1

I argued in an earlier post that matching deaths across lists is a nontrivial exercise that involves a lot of judgement and, therefore, needs to be done transparently.  Here is the promised follow-up post, which I do jointly with Josh Dougherty of Iraq Body Count.  In fact, we’ll make this into another multi-part series, as there are many different sources and issues to explore. This is a large subject of growing importance to the conflict field, so we may also eventually convert some of this material into a journal article. Throughout this series we’ll draw heavily on Josh’s extensive experience matching violent deaths across sources for the Iraq conflict.

Today we’ll set the table with some preliminaries and offer basic findings, with more detailed exploration of the data to follow in future posts.

First, list matching for Iraq has involved a combination of event matching and victim matching.  Events are usually considered to be discrete violent incidents, such as suicide bombings, air attacks or targeted assassinations, and are typically defined by their location, date, size, type and other features.

The event matching aspect of the Iraq work means that it won’t always be directly relevant to pure victim-based matching efforts such as those underpinning the statistical work of Peru’s Truth and Reconciliation Commission (TRC), or the various efforts involving casualty lists covering the war in Syria.  We’ll talk more about pure victim-based matching in a future post.  However, matching events is ultimately still about matching deaths/victims, so the issues that arise are very similar and most of what we write here will be relevant to victim-based matching.

Second, we analyse a matching exercise from this paper by Carpenter, Fuller and Roberts (CFR) that attempts to match events from the Iraq war across two sources.  The CFR paper has been cited in some major journal articles.  In fact, Megan Price and Patrick Ball, the latter being the main author of the statistical report of the Peruvian TRC, relied heavily on CFR’s matching in some of their own papers. Yet CFR’s matching turns out to be very bad.

Third, we won’t address here the main matching exercise of 2,500 records carried out (again badly) in the CFR paper.  We cover, rather, a robustness check matching smaller samples that CFR present towards the end of their paper, which should be more easily digestible for readers.  A proper analysis of CFR’s main matching exercise is beyond the scope of this series, but we can say here that the kinds of problems affecting the robustness check generally carry over into the main matching exercise. Note, however, that CFR’s main matching is done by hand by human researchers, whereas the robustness check that we cover below is described as “computer-driven” and “non-subjective”. Still, both the human and computer approaches are essentially algorithmic, with very similar pre-determined parameters. The major difference is that in one case the algorithm is applied by hand, with more room for human judgement, while in the other it is apparently applied more strictly with the help of a machine. Indeed, CFR report that the two approaches “resulted in the same conclusions,” suggesting that their robustness check has succeeded and that we should feel more confident in their findings.

In this exercise, CFR match samples from two sources covering events that occurred in Karbala, Iraq, between 2004 and 2009.  The sources are Iraq Body Count (IBC) and the Iraq War Logs published by WikiLeaks in 2010, also known as the official SIGACTs database of the US military.  Here are the methods of IBC.

Unfortunately, we know of no formal statement of a data collection methodology for SIGACTs.  However, we do know that it is compiled by the US Department of Defense from the field reports of US and Coalition soldiers, Iraqi security forces and other Iraqi sources.  We can also learn about SIGACTs by inspecting the entries.  This one, for example, describes a “search and attack” operation in which Coalition Forces killed seven “Enemy” fighters in the Diyala governorate.  The entry displays SIGACTs’ standard data-entry fields, which include the date, time, GPS coordinates, event type, reporting unit and numbers killed and wounded. The casualty numbers are further divided into “Enemy”, “Friendly”, “Civilian” and “Host Nation” categories. Each record begins with a short headline and also contains a longer text description of the events. These descriptions tend to be rather jargon-filled but can be read fluently after some practice.
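For readers who have not inspected the logs, here is a minimal sketch of a SIGACTs record as a data structure. The field names are our own approximations of the published records, not the exact column labels used in the released files:

```python
from dataclasses import dataclass

# A rough sketch of the standard fields in a SIGACTs record, based on the
# published Iraq War Logs. Field names are our approximations, not the
# exact labels used in the released files.
@dataclass
class SigactsRecord:
    headline: str            # short title summarising the event
    date: str                # date of the incident
    time: str                # time of the incident
    coordinates: str         # GPS/grid coordinates of the location
    event_type: str          # e.g. "Enemy Action", "Explosive Hazard"
    reporting_unit: str      # the unit that filed the report
    # casualty counts, split by affiliation
    enemy_killed: int
    friendly_killed: int
    civilian_killed: int
    host_nation_killed: int
    enemy_wounded: int
    friendly_wounded: int
    civilian_wounded: int
    host_nation_wounded: int
    summary: str             # longer free-text description of the event
```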

We will show in the next post of this series that careful reading of the detailed text descriptions is essential for matching SIGACTs-recorded deaths against other sources correctly. The CFR work runs aground already at this data inspection stage because they worked only with a summary version of the data, published by The Guardian, which omits the detailed text descriptions. Note also that the above-cited Price and Ball paper, which closely follows the CFR lead, shares CFR’s cavalier approach to the SIGACTs data, writing incorrectly of its methodology:

SIGACTSs based on daily “Significant Activity Reports” which include “…known attacks on Coalition forces, Iraqi Security Forces, the civilian population, and infrastructure. It does not include criminal activity, nor does it include attacks initiated by Coalition or Iraqi Security Forces”

This is not true of the full SIGACTs database released in 2010, and instead comes third-hand from a globalsecurity.org description of some statistics on “Enemy-initiated attacks” that appeared in a 2008 US DoD report. Those data were derived from only selected portions of the SIGACTs database, and their description does not apply to the full dataset. A cursory glance at the full SIGACTs dataset would have quickly revealed that it includes criminal activity and attacks initiated by Coalition or Iraqi Security Forces.

Further background on the SIGACTs (Iraq War Logs) data can be found here and here.

CFR derive their Karbala sample, plus a separate Irbil one to which we will return later, by:

filtering the entire WL data set in the event description for the appearance of the words ‘‘Irbil’’ and ‘‘Karbala.’’

You should interpret “the entire WL data set” to mean the entire Civilian category, with at least 1 death, of the Guardian version of the SIGACTs dataset, i.e., the version that omits the detailed text descriptions of each record.  In this context, the above phrase “event description” can only refer to the headline of each record, as there is nothing else in the Guardian version of the dataset that could both approximate an “event description” and contain the word “Karbala”.
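As we read it, CFR’s sampling step then amounts to something like the following sketch. The file name and column names are our assumptions about the Guardian spreadsheet, not necessarily its actual labels:

```python
import pandas as pd

# Our reconstruction of CFR's reported sampling step: take the Guardian
# version of the SIGACTs data, keep records with at least one "Civilian"
# death, and keep those whose headline mentions "Karbala".
# "guardian_sigacts.csv", "title" and "civilian_killed" are hypothetical
# names, not necessarily those used in the actual Guardian file.
wl = pd.read_csv("guardian_sigacts.csv")

karbala = wl[
    (wl["civilian_killed"] >= 1)
    & wl["title"].str.contains("Karbala", case=False, na=False)
]

print(len(karbala), karbala["civilian_killed"].sum())
```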

The above filtering yields a sample of 50 records containing 558 deaths.  However, strangely, CFR report only 39 records in their results table.  It would seem that CFR had an additional, unreported, filtering stage that eliminated 11 records.  Or perhaps CFR simply made a mistake.  There is no way to know at present how or why this happened because CFR do not list their 39 Karbala records or their matching interpretations for each in their paper, and have ignored or refused past data requests.  Consequently, we will simply follow CFR’s reported sampling methodology, as it appears in their paper, and proceed with matching the 50 records it produces.

CFR’s reported matching algorithm applied to this sample contains three matching requirements:

  1. Event dates must be within one calendar day of each other.
  2. The numbers killed must be within ±30% of each other.
  3. Weapon types must match.
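In code, the algorithm might look something like the sketch below. This is our reconstruction from CFR’s description, not their actual implementation, and each test hides ambiguities that we return to shortly:

```python
from datetime import date

def cfr_match(wl_date: date, wl_deaths: int, wl_weapon: str,
              ibc_date: date, ibc_deaths: int, ibc_weapon: str) -> bool:
    """Our reconstruction of CFR's three matching requirements.

    A sketch based on the paper's description, not CFR's code.
    """
    # 1. Event dates must be within one calendar day of each other.
    if abs((wl_date - ibc_date).days) > 1:
        return False
    # 2. Death tolls must be within +/-30%. (Ambiguous: 30% of which
    #    source's count? Here we baseline on the SIGACTs count.)
    if abs(wl_deaths - ibc_deaths) > 0.3 * wl_deaths:
        return False
    # 3. Weapon types must match. (Ambiguous: the two sources use
    #    different category schemes, so some mapping is required.)
    return wl_weapon == ibc_weapon
```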

CFR report one main finding on Karbala alone (again, we will return to Irbil later):

the majority of events in WL [SIGACTs] are not in IBC and vice-versa.

Indeed, CFR’s results table claims that only 1 of their 39 SIGACTs records matches IBC on all three of their criteria. [Note that the first version of this post said that there were 2 matches rather than the correct number, which is 1.] They report only event statistics, not death statistics, but there is an obvious implication that IBC missed a high percentage of the deaths in the Karbala sample.

The problem is that their results are very wrong. When we compare each of the records in detail, the majority of records and the vast majority of deaths in the Karbala sample match with IBC. Specifically, 95% of deaths (533 out of 558) and 66% of records (33 out of 50) match with the IBC database.

However, when we apply CFR’s matching algorithm to those same records, only 24% of deaths (132 of 558) and 22% of records (11 of 50) match on all three criteria. We should note here that applying CFR’s algorithm is not as simple or straightforward as it might seem. Their three requirements all raise ambiguities that must be resolved by subjective judgement in practice, and the outcomes of these choices can move the final numbers around a bit.  We will discuss these issues in our next post, but any resolution of these ambiguities still leaves an enormous distance between CFR’s results and the truth.
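To make one such ambiguity concrete, consider the ±30% requirement applied to a hypothetical pair of records. The numbers are invented for illustration, and both readings below are our own, not necessarily CFR’s:

```python
# Suppose a SIGACTs record reports 10 dead and a candidate IBC record 14.
wl_deaths, ibc_deaths = 10, 14

# Reading A: tolerance measured against the SIGACTs count.
match_a = abs(wl_deaths - ibc_deaths) <= 0.3 * wl_deaths   # 4 <= 3.0 -> False

# Reading B: tolerance measured against the IBC count.
match_b = abs(wl_deaths - ibc_deaths) <= 0.3 * ibc_deaths  # 4 <= 4.2 -> True

print(match_a, match_b)  # the same pair fails under one reading
                         # of the rule and matches under the other
```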

It should be stressed that the CFR approach apparently seemed reasonable and reliable to the authors, journal referees and editors, and to other researchers, like Price and Ball, who build on CFR’s work. Yet their approach ultimately gets the data all wrong, and for reasons that become pretty clear when one examines the data in detail. Indeed, we find that CFR’s conclusions reflect defects in their methodology far more than they reflect holes in IBC’s coverage of conflict deaths in Karbala.

With this in mind, let’s circle back to the Peru debate which inspired the present series on matching. In the Peru discussion Daniel Manrique-Vallier and Patrick Ball (MVB) argue that some of Silvio Rendon’s point estimates for numbers of people killed in the Peruvian war are “impossible” because these point estimates fall below numbers obtained by merging and deduplicating deaths appearing on multiple lists. But the results we report here should shock anyone who previously thought that counts emerging from such list mergers can simply be taken at face value and treated uncritically as absolute minima. MVB’s matching is unlikely to be anywhere near as bad as CFR’s, but we still need to see the matching details before we can begin to talk seriously about minima.

Our next post will share the Karbala sample along with our case-by-case matching interpretations and dig into the details of how and why the CFR approach got things so wrong.
