Silvio Rendon just published an important new paper that challenges statistical work done for the Truth and Reconciliation Commission (TRC) of Peru on violent deaths in the Peruvian conflict, 1980 to 2000.
I can only scratch the surface in one post, so I will focus on a few central points and elaborate later. Two authors of the original TRC estimates have already posted a rebuttal but, for now, I’ll just consider the original TRC work and Rendon’s paper.
The TRC based its estimates on three (after some consolidation) lists of people documented as killed in the war. According to the TRC, these lists officially contain roughly 25,000 unique individuals, many of whom appear on two or even all three lists.
The TRC’s estimation method, known as capture-recapture or multiple systems estimation, is based on analyzing list overlaps in an attempt to discover the number of deaths that did not make it onto any list. Intuitively, heavy overlap across lists suggests that a high percentage of all deaths already made it onto at least one list. On the other hand, light overlap across lists suggests that only a low percentage of all deaths got listed. For more background on this method please take my [as of now] free online course or read this introductory article that I wrote with Nicholas Jewell and Britta Jewell.
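To make the overlap intuition concrete, here is a minimal sketch using the simplest two-list capture-recapture formula (the Lincoln-Petersen estimator). This is not the TRC's model, which worked with three lists, and the list sizes below are made up purely for illustration.

```python
def lincoln_petersen(n_a, n_b, n_both):
    """Two-list capture-recapture estimate of the total number of deaths.

    n_a, n_b: number of deaths on list A and on list B (hypothetical inputs).
    n_both:   number of deaths appearing on both lists.
    """
    if n_both <= 0:
        raise ValueError("the two-list estimator needs at least one overlapping record")
    return n_a * n_b / n_both


def documented(n_a, n_b, n_both):
    """Unique deaths appearing on at least one of the two lists."""
    return n_a + n_b - n_both


# Hypothetical numbers: two lists of 1,000 deaths each.
heavy_total = lincoln_petersen(1000, 1000, 800)   # heavy overlap
light_total = lincoln_petersen(1000, 1000, 200)   # light overlap

# Share of the estimated total that is already documented.
heavy_coverage = documented(1000, 1000, 800) / heavy_total
light_coverage = documented(1000, 1000, 200) / light_total

print(heavy_total, heavy_coverage)
print(light_total, light_coverage)
```

With heavy overlap, the estimated total (1,250) is only slightly above the 1,200 documented deaths. With light overlap, the estimate jumps to 5,000 against 1,800 documented, so most deaths are presumed unlisted.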
The TRC made two especially striking statistical claims:
- The true number of people killed in the war was far higher than the roughly 25,000 deaths that were documented across the three TRC lists. The TRC estimated 69,000 deaths with a 95% uncertainty range of 61,000 to 78,000.
- The left-wing Shining Path (SP) guerrillas killed more people (46%) than the Peruvian State did (30%), contrary to the perpetrator pattern for the roughly 25,000 documented deaths according to which the State killed more people (47%) than the Shining Path did (37%).
… in our work for the truth commission in Perú in 2003, we found that killings attributed to the government had a much higher probability of documentation than killings attributed to the Shining Path insurgency, yet questions of accountability hinged precisely on determining which group perpetrated the majority of the violence. A naïve analysis of the observed data, without accounting for selection bias, would have incorrectly held the state responsible for a larger proportion of the violence.
The Rendon paper challenges this claim, rather convincingly in my view. In fact, Rendon estimates that the percentages attributed to the Shining Path (SP) and the State are roughly 31% and 43% respectively. These numbers are approximately in proportion to their respective percentages among documented deaths. So it seems that the TRC actually introduced a substantial bias rather than correcting one.
How did this happen?
The TRC divided Peru into 58 geographical strata and made death estimates for the State and the SP in each one. But all this subdivision rendered the data too sparse in many places to allow for standard capture-recapture estimates of SP-caused deaths to be performed.
The ideal method to estimate the total number of victims for each perpetrator would be to stratify the data simultaneously by geography and perpetrator and then choose the model with the best fit for each perpetrator in each geographic stratum. This method is not possible because of the sparseness of the data for reported deaths attributed to the PCP-Shining Path and other perpetrators, as mentioned above. (From the TRC Report)
So, the TRC followed an indirect procedure rather than their ideal direct one:
1. Estimate the number of deaths caused by either the State or the SP.
2. Estimate the number of deaths caused by just the State.
3. Estimate SP-caused deaths as (1) minus (2), i.e., deaths caused by either the State or the Shining Path minus deaths caused by just the State.
Step three may seem correct simply as a matter of arithmetic (State + SP – State = SP). But this is not true when the components in this subtraction problem are uncertain estimates. Theoretically, this indirect method can even lead to negative estimates for unlisted deaths, i.e., estimates for SP deaths that are less than the numbers of documented deaths.
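A small arithmetic sketch shows how errors in the two inputs carry straight through the subtraction. All numbers here are hypothetical and chosen only to illustrate the mechanism; they are not taken from the TRC data or from Rendon's paper.

```python
def indirect_sp(combined_estimate, state_estimate):
    """Indirect SP estimate: (State or SP) estimate minus State-only estimate."""
    return combined_estimate - state_estimate


# Hypothetical truth for one stratum: 1,000 deaths in total,
# 600 caused by the State and 400 by the SP, of which 300 SP deaths are documented.
TRUE_SP = 400
DOCUMENTED_SP = 300

# Case A: combined estimate 10% too high (1,100), State estimate 10% too low (540).
sp_high = indirect_sp(1100, 540)
relative_error_high = (sp_high - TRUE_SP) / TRUE_SP

# Case B: combined estimate 10% too low (900), State estimate 10% too high (660).
sp_low = indirect_sp(900, 660)
implied_unlisted_low = sp_low - DOCUMENTED_SP

print(sp_high, relative_error_high)
print(sp_low, implied_unlisted_low)
```

In case A, two 10% errors in the inputs become a 40% overestimate of SP deaths (560 versus 400). In case B, the indirect SP estimate (240) falls below the 300 documented SP deaths, which implies a negative number of unlisted deaths.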
Rendon points out that there are actually nine strata for which you can do standard direct estimates for the SP. He performs this estimation and finds considerably fewer SP-perpetrated deaths than the TRC’s indirect estimates do. Moreover, Rendon performs simulations based on these nine strata which show that overestimation by the indirect method, compared to the direct one, is a general phenomenon for data that look similar to the data in these nine strata.
Next, Rendon provides good evidence that the overestimation phenomenon he has identified applies to the entire country, not just to the nine strata out of the TRC’s 58 that allow for direct estimation of SP-caused deaths. He accomplishes this by merging strata until there are just ten of them. Each of these strata is appropriate for direct estimation while still allowing for considerable geographical heterogeneity in violence patterns. Again, Rendon finds big overestimation by the indirect method compared to the direct one and that the State is the main perpetrator.
Rendon also addresses a further problem with the TRC work that I haven’t yet mentioned: incomplete data fields. One manifestation of incompleteness is that nearly 3,000 documented deaths cannot be placed in one of the TRC’s 58 strata. So these incompletely georeferenced deaths are dropped from the TRC’s estimation. It turns out, however, that more than 2/3 of these are attributed to the State. It is likely that this disproportionate exclusion of State-caused deaths artificially tilted the TRC’s estimates away from State and towards SP responsibility.
There are also approximately 3,000 further deaths listed as caused by unknown perpetrators. Rendon performs a multiple imputation analysis that randomly assigns these deaths, based on known characteristics, to the State, SP and to other, smaller, groups. One fruit of this work is that there are now 11 strata that allow for direct estimation of SP-caused deaths. This may not sound like a lot of strata but these 11 actually account for roughly half of all the documented SP deaths. Again, Rendon’s results hold up: the State is the main perpetrator and the indirect method substantially overestimates SP-caused deaths.
Here’s a summary.
- The direct method gives much lower estimates for the SP than the indirect one does for the nine strata that allow direct estimates and the direct method assigns primary responsibility to the State.
- When known characteristics are used to allocate deaths by unknown perpetrators to known perpetrators, the direct method again gives much lower SP estimates than the indirect one does and assigns primary responsibility to the State. Now the direct SP estimation covers strata that account for about half of the documented SP deaths.
- When strata are amalgamated sufficiently so that it is possible to do direct estimates for the whole country then, again, the direct method gives much lower SP estimates than the indirect method does and assigns primary responsibility to the State.
These results make me think that the much-hyped reversal of responsibility claimed by the TRC, i.e., that the Shining Path rather than the State was the main perpetrator in the war in Peru, is wrong.
OK, that’s it for this post. More to come.