The Perils and Pitfalls of Matching War Deaths Across Lists: Part 3

We now return, after a long hiatus, to our discussion of matching violent events across datasets.  Before diving into the details we want to remind readers about what we’re doing here and why it’s important.

  1. The immediate context is a discussion of major flaws in a paper by Carpenter, Fuller and Roberts (CFR) that claims very low overlap between violent events in the Iraq War recorded in the database of Iraq Body Count (IBC) and violent events recorded in the SIGACTs database maintained by the US military.  In the two previous posts and the present one we have focused on the CFR claim that only a single IBC record can be matched against SIGACTs Karbala records on all three of CFR’s matching criteria.  However, we show that this conclusion is based on a whole slew of errors.  Eliminate the errors and 2/3 of the 50 Karbala SIGACT records (of which CFR used only 39, corrected from 37) match and 95% of the deaths match.
  2. CFR used a mechanical algorithm to match IBC recorded events against SIGACTs recorded events. A recent paper proposed a computer algorithm to match events across datasets and produce integrated event-based datasets.  Such a system may work tolerably well for matching across two clean datasets with very similar inclusion criteria.  However, the present series illustrates the serious danger that casual matching by algorithm can easily go haywire.
  3. There have been a number of high profile estimates of war deaths using a method called multiple systems estimation (MSE). Indeed, we had a whole series on the blog analysing MSE estimates that were made for the Peruvian conflict. A critical step in these estimates is to match deaths across datasets to quantify overlap and lack of overlap between them.  This matching is usually of individual deaths, rather than events, but the issues we raise in the present series on event matching also apply to death-by-death matching.  They suggest that messy data encountered in the wild will be exceedingly difficult to match without substantial human input.

 

With this background in mind let’s continue zeroing in on the numerous errors that undermine the paper of Carpenter, Fuller and Roberts (CFR) that has often been taken to convey accurate information about the coverage, or lack thereof, by Iraq Body Count (IBC) of violent deaths in the Iraq war.

The last instalment of this series discussed four of eight error types we identified affecting CFR’s matching of deaths in the governorate of Karbala.  Those first four error types all stem from documentation errors in the SIGACTs dataset. Events that are coded with errors in one dataset will tend not to match the same events in a second dataset that is coded correctly.  We showed in Part 2 of this series that this problem rears its head often when matching the SIGACTs and IBC datasets.

We now progress to the next four error types we identified in the CFR matching approach. This time the problems stem not from errors in the documents themselves, but rather from misconceptions of CFR about how the data are structured.

CFR make numerous incorrect assumptions about what matching cases should look like and wrongly interpret any violations of these assumptions as exposing coverage failures in the two datasets.  In some cases these misconceptions amplify the errors already identified in part 2 of this series. Below we number these as errors 5 through 8, continuing from the last instalment that discussed errors 1 through 4.  Throughout we refer to the spreadsheet posted here.

  5. Event size (+/-30% rule) – 8 records (4,5,11,12,18,30,33,36) and 100 deaths – 16% of records and 18% of deaths affected.

CFR require matching on “event size”, i.e., on the number of deaths recorded in each entry of the datasets. For their Karbala exercise they set a tolerance range of + or – 30%. We mentioned in Part 2 of this series that CFR do not specify whether the baseline for this 30% rule applies to the smaller or the larger of the two fatality numbers being compared.  The resolution of this ambiguity will affect some matching conclusions, but even leaving this problem aside, the 30% rule creates plenty of matching problems that bias the CFR results toward under-matching.

CFR’s record 18 from SIGACT is a car bombing that matches IBC record k3003.  However, IBC records a range of 5 – 21 deaths while the SIGACT record lists just 2 deaths. 2 and 5 differ by more than 30%, whichever number is used as the base, so CFR wrongly conclude that both IBC and SIGACT missed each other’s car bombing event, although they really record the same event while disagreeing over the death count. In fact, the Summary text within the SIGACT record, which CFR did not examine since they worked with a truncated version of the SIGACT dataset, actually reports a range of “2-5 X KIA”. However, SIGACT coding rules only allow single numbers, and a 2 was entered for this event. The record’s Summary text also refers to what it describes as “EXCESSIVE REPORTS COMING FROM GOVERNMENT AND MEDIA SOURCES” which would seem consistent with IBC extending its range up to 21 using media sources.  There is no need here to establish the true number of people killed by this car bomb.  Our point is simply that the two datasets contain the same event but reach different conclusions about the number of people killed (although the ranges, once properly understood, overlap).

Ironically, the CFR methodology concludes that IBC is missing deaths because it recorded too many deaths, at least in the view of US military coders.  Similarly, CFR methodology converts SIGACT scepticism about higher reported fatality numbers into missing the event entirely.
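The point about overlapping ranges in the car bombing example can be made concrete. The following is a minimal sketch in Python, assuming each record's death count is represented as a (low, high) range. The function `ranges_overlap` and the variable names are our own illustration and are not part of either dataset or of CFR's method.

```python
def ranges_overlap(low_a: int, high_a: int, low_b: int, high_b: int) -> bool:
    """True if the closed ranges [low_a, high_a] and [low_b, high_b] share a value."""
    return max(low_a, low_b) <= min(high_a, high_b)


ibc_range = (5, 21)       # IBC record k3003, as described above
sigact_coded = (2, 2)     # the single number entered in the SIGACT field
sigact_summary = (2, 5)   # the "2-5 X KIA" range in the SIGACT Summary text

coded_overlaps = ranges_overlap(*ibc_range, *sigact_coded)
summary_overlaps = ranges_overlap(*ibc_range, *sigact_summary)
print(coded_overlaps, summary_overlaps)
```

Under these assumptions the coded SIGACT number does not overlap the IBC range, while the range in the Summary text does. A comparison that only sees the coded field cannot detect this.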

In another example, Record 33 illustrates the same problem and has a similar explanation. We showed in Part 2 of this series that Record 33 has a location error, but there is also an event-size problem.  It reports 6 power plant workers killed in an attack near Hwaijah, which is southwest of Kirkuk. IBC record k5980 reports such an attack, but killing 11.  Again, the +/-30% event size requirement of CFR is violated, leading to a false conclusion of no match.  IBC archived numerous accounts of this event giving a range of death counts, for example:

“Gunmen opened fire on a minibus carrying power plant workers in a predominantly Sunni area west of Kirkuk on Wednesday, killing six men, officials said.” [AP 04 Apr]

“Seven employees working at a power plant were killed on Wednesday in an attack near the northern Iraqi city of Kirkuk, a police source said.” [VOI 04 Apr]

“Seven of the workers died instantly while four others were fatally wounded and died later in hospital, police said.” [REU 04 Apr]

IBC practice is to favour updated reports [Methods section 3.2], hence the last of the three accounts above is favoured due to its coverage of subsequent deaths from injuries, leaving a total of 11. This practice explains the event size disparity between SIGACTs and IBC, while CFR practice converts this discrepancy over numbers into coverage failures for both IBC and SIGACTs.  Again, the key point is not that IBC’s number is right and the SIGACT number is wrong.  Rather, it is just that different sources and practices have led to different codings of the same event.  CFR’s failure to understand these differences leads them to false conclusions – that IBC missed 6 deaths in this case, when it actually missed 0, and that SIGACTs missed 11 deaths when, at most, it missed 5.

In Part 2 of this series we also discussed Record 11 as a case where SIGACT doubled their death number, from 1 to 2.  One might hope that changing the death count by just 1 would not affect matching but, of course, 1 differs from 2 by more than 30%, making such a record mismatch under CFR’s event-size rule.

Record 11 also highlights a wider problem with the +/- 30% rule and of any such percentage-based rule: for small events, discrepancies of even a single death prohibit matches.  If one source reports 1 death and another 2, or one source reports 2 deaths and the other 3, there is an automatic mismatch.  When one source reports 3 deaths and the other 4, then there may or may not be a match depending on whether 3 or 4 is used as the base for the percentage calculation. With large events, a substantial discrepancy in reported totals is allowed, but with small events no margin whatsoever is tolerated. The 30% rule therefore becomes increasingly stringent as event size declines, which means the rule is biased against matching small events, while being more lenient toward matching large events.

The issue of the ambiguous borderline applies to records 4, 5, 30 and 36. For example, Record 4 reports 7 deaths in a bombing at the Imam Hussein mosque in Karbala on December 15, 2004. IBC has precisely this event but with 10 deaths. Applying the 30% rule to a base of 7 only allows an IBC number up to 9, so the IBC record would not match the SIGACT.  However, applying the 30% rule to a base of 10 would allow the SIGACT number of 7 to match. CFR does not explain how to apply their rule, so we can’t know with certainty how to interpret this event.  However, since they report only one Karbala event matching all three of their criteria it seems likely that exceeding 30% in either direction was sufficient for them to rule out a match.
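The base ambiguity and the small-event bias described above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, not CFR's actual procedure, which they do not specify. The function name `within_tolerance`, the `base` parameter and the decision to treat a difference of exactly 30% as a match are all our assumptions for illustration.

```python
def within_tolerance(a: int, b: int, base: str, tol_pct: int = 30) -> bool:
    """True if death counts a and b differ by at most tol_pct percent of the base.

    base="smaller" measures the gap against the smaller count,
    base="larger" against the larger one. Integer arithmetic avoids
    floating-point surprises at the exact boundary (assumption: a
    difference of exactly tol_pct percent counts as a match).
    """
    if base not in ("smaller", "larger"):
        raise ValueError("base must be 'smaller' or 'larger'")
    ref = min(a, b) if base == "smaller" else max(a, b)
    return 100 * abs(a - b) <= tol_pct * ref


# Pairs of death counts taken from the discussion above.
pairs = [(1, 2), (2, 3), (3, 4), (7, 10), (2, 5), (6, 11)]
results = {
    pair: (within_tolerance(*pair, base="smaller"),
           within_tolerance(*pair, base="larger"))
    for pair in pairs
}
for pair, (small_base, large_base) in results.items():
    print(pair, "smaller base:", small_base, "| larger base:", large_base)
```

Under these assumptions, 1 versus 2 and 2 versus 3 fail whichever base is used, 3 versus 4 and 7 versus 10 pass only when the larger number is the base, and the 2 versus 5 and 6 versus 11 disagreements of Records 18 and 33 fail either way.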

  6. Event size (Aggregates) – 7 records (1,16,17,29,31,40,44) and 8 deaths – 14% of records and 0.9% of deaths affected

There is a further issue that relates to event size, but it is sufficiently different in character to warrant separate consideration. CFR assume that every record for both datasets is a discrete event.  Moreover, they assume that both IBC and SIGACT coders share a common understanding of what “events” are and code only things that meet these unspecified but shared criteria for constituting an event.

These assumptions are frequently wrong.  In practice, multiple events are often aggregated together into single records with the sizes of these records being a combined total of the multiple smaller events that comprise the single record. Therefore, CFR’s concept of “event size” is better understood as “record size.” This clarification, which may seem pedantic at first glance, seriously undermines their matching analysis. When one dataset reports a discrete event, while the other reports an aggregated total of multiple events, the “event size” of the two records will rarely match under CFR’s criteria, but this is because one record is actually “an event” and the other is not.  CFR assume that a 30% disparity in record sizes proves that the two datasets are recording different events and therefore deaths, but in such cases these disparities prove no such thing. The two records won’t match on “event size” simply because they are structured differently.  In fact, their sizes generally shouldn’t match because the nature of the records is so different.

As an example, SIGACT Record 1 reports the murder of two truck drivers on February 17, 2004, based on information provided by Iraqi police in Karbala. IBC does not have an event that matches these characteristics on that date, but IBC record x350b does report “14 additional violent deaths recorded at Tikrit and Karbala morgues” in the month of February 2004. This is consistent with the time and type of deaths.  Moreover, it is a fairly standard police procedure to forward dead bodies from their homicide cases to a local morgue, so it is likely that the bodies of the 2 truck drivers killed in such an event would be included among the 14 deaths handled by the Tikrit and Karbala morgues. Thus, the available record provides evidence that IBC record x350b would likely include the deaths in SIGACT Record 1, while providing no evidence that those deaths are separate from those in IBC record x350b.  Nevertheless, CFR procedures lead to the conclusion that IBC missed the 2 truck driver deaths and that SIGACT missed an event of size 14, although we know that there was no such “event” of size 14 in the first place.

Of course, we can’t rule out the possibility that the 2 truck-driver deaths somehow eluded the Karbala morgue, so that the IBC entry is actually wholly separate from the SIGACT one.  But there is no evidence in favour of this conclusion while there is evidence against it. It’s worth considering some common standards of evidence, such as “beyond a reasonable doubt” and “preponderance of evidence”. We would argue that a matching conclusion in the above case would not meet the “beyond a reasonable doubt” standard, but would meet the “preponderance of evidence” standard. CFR’s mismatch conclusion, on the other hand, would not meet any reasonable standard of evidence.

Media sources frequently employ event aggregation as a reporting style so this is an important point when analysing the IBC data. The degree of aggregation can also vary greatly from report to report, from aggregating just a handful of events on one day, to aggregating hundreds over the course of a month or more. Therefore, CFR’s conflation of “event size” with “record size” renders a substantial portion of the data automatically mismatching, regardless of what further information could be available in the detailed records. In such cases, it is simply not appropriate to impose an event size requirement on the matching, as that feature is not relevant to whether the cases overlap. Imposing that requirement anyway, as CFR did, only leads to many baseless conclusions of mismatching that aren’t supported by any evidence in the documents.
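To illustrate why an event-size requirement is the wrong test for aggregate entries, here is a minimal sketch in Python of a compatibility check that treats size only as an upper bound. The class names, field names and the function `could_contain` are hypothetical, and the assumption that the morgue record spans the whole of February 2004 is ours. Passing this check does not establish a match. It only shows that nothing in date, place or size rules one out, so that the detailed records deserve a human look.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DiscreteEvent:
    day: date
    place: str
    deaths: int


@dataclass
class AggregateRecord:
    start: date
    end: date
    places: frozenset
    deaths: int


def could_contain(agg: AggregateRecord, ev: DiscreteEvent) -> bool:
    """True if nothing in date, place or size rules out ev being part of agg.

    Size is only an upper bound here: the event cannot have more deaths
    than the aggregate it might belong to.
    """
    return (agg.start <= ev.day <= agg.end
            and ev.place in agg.places
            and ev.deaths <= agg.deaths)


# Values from the truck-driver example above; the exact date span of the
# morgue record (all of February 2004) is our assumption.
morgue_record = AggregateRecord(date(2004, 2, 1), date(2004, 2, 29),
                                frozenset({"Tikrit", "Karbala"}), 14)
truck_drivers = DiscreteEvent(date(2004, 2, 17), "Karbala", 2)

compatible = could_contain(morgue_record, truck_drivers)
print(compatible)
```

A +/-30% size comparison between 2 and 14 would reject this pair outright, whereas the containment check keeps it as a candidate for closer inspection.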

This aggregation issue runs almost exclusively in one direction because IBC has many such aggregate entries while SIGACTs has them only very rarely.  So although it’s theoretically possible that the two datasets could have aggregate records that match each other within CFR’s event size range, it would be very unlikely for this to happen in practice.

We may make a future post that delves into the details of how to conduct proper matching in the presence of aggregate entries where one needs to consider factors such as time, place, type and other relevant details which can, for example, include explicit statements that bodies were transferred to a morgue or other destination.  But for present purposes, it’s sufficient to understand that CFR gets the issue conceptually wrong, with the main consequence that their analysis exaggerates the number of large “events” missed by SIGACT and the number of small events missed by IBC.

  7. Casualty type – 1 record (11) and 2 deaths – 2% of records and 0.4% of deaths affected

(To avoid confusion, we note first that we already discussed this record both above and in part 4 of our previous post in this series as a case for which a single death was incorrectly coded as 2 fatalities in the SIGACT database.)

Another divergence between the IBC and SIGACT datasets is over the inclusion criteria for recording deaths. SIGACT does not require deaths to be violent, and although the vast majority of its deaths are violent, it does include some non-violent ones. In fact, the SIGACTs dataset was never meant to focus on deaths but, rather, was intended as a record of “significant activities” relevant to US-Coalition operations in Iraq.  It covers US-Coalition actions plus other activities relevant to their presence and objectives, including the security situation in general. Deaths are merely one aspect of some, but not most, “significant activities” and they do not have to be violent deaths to be considered pertinent to US-Coalition operations in the country.

Record 11, in which a child suffered a seizure while US troops were handing out toys, is a case in point. The troops and the child’s parents reportedly attempted to help the child and he was rushed to Karbala Hospital where he later died. The record notes that the child had a history of such seizures. As such, this appears to be a death caused by unfortunate, but natural, causes.  It was not a violent or war-related death of the sort that would be recorded in the IBC database.  However, the incident occurred within the context of the activities of a unit of US troops, who spent much of their day on it, so it is a “significant activity” and it makes sense that the US-Coalition would want its details in the official record.  But it is also outside the scope of IBC methodology (see IBC Methods section 3.3).

The issue of divergent inclusion criteria is important for IBC-SIGACT matching and, more generally, for other efforts to match deaths across multiple sources.  The statistical method of “capture/recapture” relies on an assumption, often violated, that the underlying sources are “fishing in the same ponds”. If the two datasets have differing inclusion criteria, they aren’t fishing in the same pond. This problem only applies to a single death in the small Karbala sample here, but it will pose many other problems in a broader matching effort using these two datasets.
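A small worked example shows why the “same pond” assumption matters. The sketch below uses the simplest two-list capture/recapture formula, the Lincoln-Petersen estimator (total ≈ n1 × n2 / matches). This is our choice for illustration only: published MSE studies use more elaborate models, and every number below is hypothetical.

```python
def lincoln_petersen(n1: int, n2: int, matched: int) -> float:
    """Two-list capture/recapture estimate of the total: n1 * n2 / matched."""
    if matched <= 0:
        raise ValueError("the estimator is undefined without any matches")
    return n1 * n2 / matched


# Hypothetical numbers, chosen only for illustration.
same_pond = lincoln_petersen(100, 100, 80)        # both lists share one scope
# List 1 also holds 20 deaths that list 2 would never record by design,
# so they enlarge n1 but can never add to the matches.
different_ponds = lincoln_petersen(120, 100, 80)
# Separately, a matching rule that finds only 10 of the 80 true matches.
under_matched = lincoln_petersen(100, 100, 10)

print(same_pond, different_ponds, under_matched)
```

With these made-up inputs the estimate rises from 125 to 150 once out-of-scope records enter one list, and to 1,000 when most true matches are missed. Both problems push the estimate upward.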

  8. Date/Time (24hr rule – sometimes related to Aggregates issue) – 8 records (1,16,17,23,29,31,43,44) and 56 deaths – 16% of records and 10% of deaths affected

CFR requires a date/time match of +/- 24 hours, but this rule also creates some misleading results about coverage.

Record 23 reports an interpreter killed in Karbala on 17 September 2006. IBC has three records between the 16th and 18th of September in Karbala that could potentially be regarded as a match under CFR rules.  However, none of those is the correct match, which IBC dates to 15 September and which therefore falls outside the 24 hour requirement. SIGACTs Summary text for record 23, ignored by CFR, states that the interpreter “WAS ASSASSINATED NEAR HIS HOUSE ON THE EVENING OF 15SEP06”. The coded date of 17 September therefore appears to be some kind of coding error, perhaps the date on which the report was processed, but one that would rule out the correct matching event under CFR’s date/time rules.

Record 43 has similar problems with date/time. The record is dated 30 August 2007, but the text refers to a “staff estimate” reported on 29 August about casualties “due to terrorist activities that happened on the 28th of AUG”. The numbers, correct date and other details all point to widely reported clashes that occurred in Karbala on 28 August 2007.  These were, apparently, reported and coded into the SIGACTs at a later date. We would interpret record 43 as a clear match for IBC record k7338, but the CFR 24 hour rule would disallow such a match, and produce a false conclusion that all of the casualties in record 43 are missing from the IBC database.

A further issue concerns the use of date ranges in IBC but not in SIGACTs.  CFR never mention the issue and it appears that any IBC entry with a range that extends more than 24 hours away from a SIGACT date is ruled out as a match.  Such a practice would eliminate matches with records 1, 16, 17, 29 and 44, which all have strong IBC match candidates.
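The two date problems discussed in this section can be sketched as follows. This is a minimal Python illustration under stated assumptions: dates are compared at whole-day resolution, `strict_window_match` encodes our reading of CFR's apparent practice (they do not document it), and `range_aware_match` is one hypothetical alternative. The month-long IBC range is made up for illustration.

```python
from datetime import date, timedelta


def strict_window_match(sigact_day: date, ibc_start: date, ibc_end: date,
                        tol_days: int = 1) -> bool:
    """Our reading of CFR's apparent practice: the whole IBC date range
    must lie within tol_days of the SIGACT date."""
    tol = timedelta(days=tol_days)
    return sigact_day - tol <= ibc_start and ibc_end <= sigact_day + tol


def range_aware_match(sigact_day: date, ibc_start: date, ibc_end: date,
                      tol_days: int = 1) -> bool:
    """Alternative: the SIGACT date need only fall within tol_days of
    some day in the IBC range."""
    tol = timedelta(days=tol_days)
    return ibc_start - tol <= sigact_day <= ibc_end + tol


# Record 23: coded date 17 Sep 2006, Summary-text date 15 Sep 2006,
# correct IBC match dated 15 Sep 2006.
ibc_day = date(2006, 9, 15)
coded_date_matches = strict_window_match(date(2006, 9, 17), ibc_day, ibc_day)
summary_date_matches = strict_window_match(date(2006, 9, 15), ibc_day, ibc_day)

# Hypothetical IBC entry with a month-long date range.
feb_start, feb_end = date(2004, 2, 1), date(2004, 2, 29)
strict_on_range = strict_window_match(date(2004, 2, 17), feb_start, feb_end)
aware_on_range = range_aware_match(date(2004, 2, 17), feb_start, feb_end)

print(coded_date_matches, summary_date_matches, strict_on_range, aware_on_range)
```

Note that the range-aware variant helps only with date ranges. It would still reject Record 23 on its coded date, since that problem can only be fixed by reading the Summary text.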

 

This brings us to the close of the in-the-weeds part of our Karbala analysis.  Let’s now briefly step back and consider the implications.

CFR’s Karbala matching is a complete disaster, finding just a single confident match in the face of evidence that 95% of the deaths match.  The main reason for their failure is that they naively unleash a mechanical algorithm on complicated, messy data about which they know very little.  Some of CFR’s errors could be eliminated by tweaking their algorithm, although such adjustments would probably lead to some other errors. Our blog posts here show that many of the problems would be difficult, if not impossible, to fix.  For example, how does one deal with incorrectly coded GPS coordinates or arbitrarily doubled fatality counts?

We suggest, therefore, that it’s important to think about the many specific characteristics of the datasets we actually have, and the many flaws and idiosyncrasies they might contain, rather than contenting ourselves with assuming that any datasets we come across have the ideal characteristics we want them to have.

 
