Important New Violent Death Estimates for the War in Peru with Implications Beyond just Peru: Part 1

Silvio Rendon just published an important new paper that challenges statistical work done for the Truth and Reconciliation Commission (TRC) of Peru on violent deaths in the Peruvian conflict, 1980 to 2000.

I can only scratch the surface in one post.  So I plan to focus on a few central points and elaborate later.  Two authors of the original TRC estimates have already posted a rebuttal but, for now, I’ll just consider the original TRC work and Rendon’s paper.

The TRC based its estimates on three (after some consolidation) lists of people documented as killed in the war.  According to the TRC, these lists officially contain roughly 25,000 unique individuals, many of whom appear on two or even all three lists.

The TRC’s estimation method, known as capture-recapture or multiple systems estimation, analyzes list overlaps in an attempt to discover the number of deaths that did not make it onto any list. Intuitively, heavy overlap across lists suggests that a high percentage of all deaths already made it onto at least one list. On the other hand, light overlap across lists suggests that only a low percentage of all deaths got listed. For more background on this method, please take my online course (free as of now) or read the introductory article that I wrote with Nicholas Jewell and Britta Jewell.
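To make the overlap intuition concrete, here is a minimal two-list sketch using Chapman's bias-corrected form of the Lincoln-Petersen estimator. It is much simpler than what the TRC did (three lists, log-linear models): it assumes the two lists are independent and that every death has the same chance of being listed. The function names and counts are hypothetical, chosen only for illustration.

```python
def chapman_estimate(n1: int, n2: int, m: int) -> float:
    """Chapman's bias-corrected Lincoln-Petersen estimate of total deaths.

    n1: deaths on list A, n2: deaths on list B, m: deaths on both lists.
    Assumes independent lists and an equal listing probability for every death.
    """
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def unlisted(n1: int, n2: int, m: int) -> float:
    """Estimated number of deaths that appear on neither list."""
    observed = n1 + n2 - m  # unique documented deaths
    return chapman_estimate(n1, n2, m) - observed


# Hypothetical counts: two lists of 1,000 names each.
heavy = unlisted(1000, 1000, 800)  # heavy overlap
light = unlisted(1000, 1000, 200)  # light overlap
print(round(heavy), round(light))  # prints: 50 3184
```

With identical list sizes, heavy overlap implies only about 50 unlisted deaths, while light overlap implies more than 3,000.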

The TRC made two especially striking statistical claims:

  1. The true number of people killed in the war was far higher than the roughly 25,000 deaths that were documented across the three TRC lists.  The TRC estimated 69,000 deaths with a 95% uncertainty range of 61,000 to 78,000.
  2. The left-wing Shining Path (SP) guerrillas killed more people (46%)  than the Peruvian State did (30%), contrary to the perpetrator pattern for the roughly 25,000 documented deaths according to which the State killed more people (47%) than the Shining Path did (37%).
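The two claims can be combined with some rough arithmetic. The sketch below applies the rounded percentages to the rounded totals, so its outputs are only approximate, and `implied_counts` is a hypothetical helper rather than anything from the TRC. It shows what the reversal requires: the TRC estimate for the Shining Path is about 3.4 times its documented count, versus about 1.8 times for the State.

```python
def implied_counts(doc_total, est_total, doc_share, est_share):
    """Return (documented deaths, estimated deaths, estimated/documented ratio)."""
    documented = doc_share * doc_total
    estimated = est_share * est_total
    return documented, estimated, estimated / documented


DOCUMENTED_TOTAL = 25_000  # roughly 25,000 documented deaths
ESTIMATED_TOTAL = 69_000   # TRC point estimate

state = implied_counts(DOCUMENTED_TOTAL, ESTIMATED_TOTAL, 0.47, 0.30)
shining_path = implied_counts(DOCUMENTED_TOTAL, ESTIMATED_TOTAL, 0.37, 0.46)

for name, (doc, est, ratio) in [("State", state), ("Shining Path", shining_path)]:
    # A ratio above 1 means the TRC estimate exceeds the documented count.
    print(f"{name}: documented ~{doc:,.0f}, estimated ~{est:,.0f}, ratio ~{ratio:.1f}")
```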

Advocates for applying capture-recapture methods lean heavily on the State-SP reversal described in point 2 above.  For example, Megan Price and Patrick Ball write:

… in our work for the truth commission in Perú in 2003, we found that killings attributed to the government had a much higher probability of documentation than  killings attributed to the Shining Path insurgency, yet questions of accountability hinged precisely on determining which group perpetrated the majority of the violence [3]. A naïve analysis of the observed data, without accounting for selection bias, would have incorrectly held the state responsible for a larger proportion of the violence.

The Rendon paper challenges this claim, rather convincingly in my view.  In fact, Rendon estimates that the percentages attributed to the Shining Path (SP) and the State are roughly 31% and 43% respectively.  These numbers are approximately in proportion to their respective percentages among documented deaths.  So it seems that the TRC actually introduced a substantial bias rather than correcting one.

How did this happen?

The TRC divided Peru into 58 geographical strata and made death estimates for the State and the SP in each one. But all this subdivision left the data in many strata too sparse for standard capture-recapture estimation of SP-caused deaths.

The ideal method to estimate the total number of victims for each perpetrator would be to stratify the data simultaneously by geography and perpetrator and then choose the model with the best fit for each perpetrator in each geographic stratum. This method is not possible because of the sparseness of the data for reported deaths attributed to the PCP-Shining Path and other perpetrators, as mentioned above. (From the TRC Report)

So, the TRC followed an indirect procedure rather than their ideal direct one:

  1. Estimate the number of deaths caused by either the State or the SP.
  2. Estimate the number of deaths caused by just the State.
  3. Estimate SP-caused deaths as step 1 minus step 2, i.e., deaths caused by either the State or the Shining Path minus deaths caused by just the State.
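Here is a minimal sketch of the three steps for a single stratum, set beside the direct estimate. As a simplifying assumption it swaps the TRC's three-list log-linear models for the two-list Chapman estimator, and the counts are hypothetical; it shows the mechanics of the subtraction, not the size or direction of any gap between the two methods.

```python
def chapman(n_a: int, n_b: int, n_ab: int) -> float:
    """Two-list Chapman estimate of the total, a stand-in for the TRC's models.

    n_a: deaths on list A, n_b: deaths on list B, n_ab: deaths on both lists.
    """
    return (n_a + 1) * (n_b + 1) / (n_ab + 1) - 1


# Hypothetical counts for one stratum: (on list A, on list B, on both lists).
state = (400, 300, 150)
sp = (120, 90, 30)
# Each death has one perpetrator, so "State or SP" counts are simple sums.
either = tuple(s + p for s, p in zip(state, sp))

step1 = chapman(*either)      # 1. deaths caused by either the State or the SP
step2 = chapman(*state)       # 2. deaths caused by just the State
indirect_sp = step1 - step2   # 3. indirect SP estimate
direct_sp = chapman(*sp)      # direct SP estimate, where the data allow it

print(f"direct: {direct_sp:.0f}, indirect: {indirect_sp:.0f}")
```

With this toy estimator and these made-up counts, any difference between `direct_sp` and `indirect_sp` is an artifact of the inputs; it does not reproduce either the TRC's or Rendon's figures.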

Step three may seem correct simply as a matter of arithmetic (State + SP – State = SP). But this is not true when the components in this subtraction problem are uncertain estimates. Theoretically, this indirect method can even lead to negative estimates for unlisted deaths, i.e., estimates for SP deaths that are less than the number of documented SP deaths.
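The point about uncertain components can be illustrated with a small Monte Carlo sketch. It assumes, purely for illustration, that the "State or SP" and "State only" estimates are independent normal draws around their true values (real estimates share data, so they would be correlated), and all numbers are hypothetical. It counts how often the indirect SP estimate falls below the observed SP count, i.e., how often it implies a negative number of unlisted deaths.

```python
import random


def share_below_observed(true_either: float, sd_either: float,
                         true_state: float, sd_state: float,
                         observed_sp: int, draws: int = 20_000,
                         seed: int = 1) -> float:
    """Fraction of simulated indirect SP estimates below the observed SP count.

    Hypothetical noise model: the 'State or SP' and 'State only' estimates are
    independent normal draws around their true values.
    """
    rng = random.Random(seed)
    below = 0
    for _ in range(draws):
        either = rng.gauss(true_either, sd_either)
        state = rng.gauss(true_state, sd_state)
        # An indirect SP estimate below the documented count means the
        # implied number of unlisted SP deaths is negative.
        if either - state < observed_sp:
            below += 1
    return below / draws


# True SP deaths = 10,000 - 8,000 = 2,000, of which 1,500 are documented.
frac = share_below_observed(10_000, 800, 8_000, 600, observed_sp=1_500)
print(f"{frac:.2f}")  # close to 0.31, since P(Z < -0.5) is about 0.31
```

Here roughly three draws in ten imply fewer SP deaths than are already on the lists, even though neither component estimate is biased.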

Rendon points out that there are actually nine strata for which you can do standard direct estimates for the SP.  He performs this estimation and finds considerably fewer SP-perpetrated deaths than the TRC’s indirect estimates do.  Moreover, Rendon performs simulations based on these nine strata which show that overestimation by the indirect method, compared to the direct one, is a general phenomenon for data that look similar to the data in these nine strata.

Next, Rendon provides good evidence that the overestimation phenomenon he has identified applies to the entire country, not just to the nine of the TRC’s 58 strata that allow for direct estimation of SP-caused deaths. He accomplishes this by merging strata until there are just ten of them. Each of these ten strata is appropriate for direct estimation while still allowing for considerable geographical heterogeneity in violence patterns. Again, Rendon finds large overestimation by the indirect method compared to the direct one, and again the State emerges as the main perpetrator.
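The post does not spell out Rendon's merging rule, so the following is only a hypothetical sketch of the general idea: greedily merge the sparsest stratum into a neighbour until every stratum has enough SP deaths appearing on two or more lists to support a direct estimate. The threshold `min_overlap`, the use of list order as a stand-in for geographic adjacency, and the example counts are all assumptions.

```python
from typing import List, Tuple

# (stratum name, SP deaths documented on two or more lists)
Stratum = Tuple[str, int]


def merge_sparse_strata(strata: List[Stratum], min_overlap: int) -> List[Stratum]:
    """Greedily merge the sparsest stratum into an adjacent one.

    Hypothetical rule: a stratum supports direct SP estimation only if at least
    `min_overlap` SP deaths appear on two or more lists. 'Adjacent' means next
    in list order, a stand-in for geographic neighbours.
    """
    strata = list(strata)
    while len(strata) > 1:
        i = min(range(len(strata)), key=lambda k: strata[k][1])
        if strata[i][1] >= min_overlap:
            break  # every stratum is now dense enough
        j = i + 1 if i + 1 < len(strata) else i - 1  # neighbour to merge with
        a, b = sorted((i, j))
        # Overlap counts are additive because strata contain disjoint deaths.
        merged = (strata[a][0] + "+" + strata[b][0], strata[a][1] + strata[b][1])
        strata[a:b + 1] = [merged]
    return strata


example = [("A", 40), ("B", 3), ("C", 5), ("D", 60), ("E", 2)]
print(merge_sparse_strata(example, min_overlap=20))
# prints: [('A', 40), ('B+C+D+E', 70)]
```

A rule based only on counts can lump together very different regions, so any real scheme also has to weigh how much geographical heterogeneity it preserves.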

Rendon also addresses a further problem with the TRC work that I haven’t yet mentioned: incomplete data fields. One manifestation of incompleteness is that nearly 3,000 documented deaths cannot be placed in one of the TRC’s 58 strata.  So these incompletely georeferenced deaths are dropped from the TRC’s estimation.  It turns out, however, that more than 2/3 of these are attributed to the State.  It is likely that this disproportionate exclusion of State-caused deaths artificially tilted the TRC’s estimates away from State and towards SP responsibility.
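A rough back-of-the-envelope check shows the size of the tilt. It assumes, as a simplification, that the 47% State share quoted above was computed over all roughly 25,000 documented deaths, including those that lack a stratum, and it uses the rounded figures from the text (3,000 dropped deaths, exactly 2/3 of them State-attributed).

```python
DOCUMENTED = 25_000    # roughly 25,000 documented deaths
STATE_SHARE = 0.47     # State share of documented deaths
DROPPED = 3_000        # "nearly 3,000" deaths that cannot be placed in a stratum
DROPPED_STATE = 2 / 3  # "more than 2/3" of those are State-attributed (lower bound)

state_all = STATE_SHARE * DOCUMENTED
state_kept = state_all - DROPPED_STATE * DROPPED
share_kept = state_kept / (DOCUMENTED - DROPPED)
print(f"State share of documented deaths: {STATE_SHARE:.1%} -> {share_kept:.1%}")
# prints: State share of documented deaths: 47.0% -> 44.3%
```

Under these assumptions the State's share among the deaths actually used in estimation falls by almost three percentage points before any capture-recapture modelling begins.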

There are also approximately 3,000 further deaths listed as caused by unknown perpetrators. Rendon performs a multiple imputation analysis that randomly assigns these deaths, based on known characteristics, to the State, the SP and other, smaller groups. One fruit of this work is that there are now 11 strata that allow for direct estimation of SP-caused deaths. This may not sound like a lot of strata, but these 11 actually account for roughly half of all the documented SP deaths. Again, Rendon’s results hold up: the State is the main perpetrator and the indirect method substantially overestimates SP-caused deaths.
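The post does not describe Rendon's imputation model, so here is a simple hot-deck-style stand-in, not his method: each death with an unknown perpetrator is assigned a perpetrator drawn at random from the known-perpetrator deaths in the same stratum, and this is repeated over several imputed datasets. The record format, the use of the stratum as the only "known characteristic", and the example data are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Tuple

# (stratum, perpetrator), with None marking an unknown perpetrator.
Record = Tuple[str, Optional[str]]


def impute_once(records: List[Record], rng: random.Random) -> Counter:
    """Build one completed dataset and return its perpetrator counts."""
    known: Dict[str, List[str]] = {}
    for stratum, perp in records:
        if perp is not None:
            known.setdefault(stratum, []).append(perp)
    counts: Counter = Counter()
    for stratum, perp in records:
        if perp is None:
            # Draw in proportion to known perpetrators in the same stratum;
            # assumes every stratum has at least one known case.
            perp = rng.choice(known[stratum])
        counts[perp] += 1
    return counts


def multiple_imputation(records: List[Record], m: int = 20,
                        seed: int = 0) -> Dict[str, float]:
    """Average perpetrator counts over m independently imputed datasets."""
    rng = random.Random(seed)
    totals: Counter = Counter()
    for _ in range(m):
        totals.update(impute_once(records, rng))
    return {perp: n / m for perp, n in totals.items()}


records = (
    [("north", "SP")] * 6 + [("north", "State")] * 2 + [("north", None)] * 4
    + [("south", "State")] * 5 + [("south", None)] * 3
)
print(multiple_imputation(records))
```

In a full multiple imputation analysis the capture-recapture estimation would be run on each completed dataset and the results combined (for example with Rubin's rules); this sketch only averages the imputed perpetrator counts.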

Here’s a summary.

  1. The direct method gives much lower estimates for the SP than the indirect one does for the nine strata that allow direct estimates and the direct method assigns primary responsibility to the State.
  2. When known characteristics are used to allocate deaths by unknown perpetrators to known perpetrators then the direct method again gives much lower SP estimates than the indirect one does and assigns primary responsibility to the State.  Now the SP-estimation covers about half of the documented deaths.
  3. When strata are amalgamated sufficiently so that it is possible to do direct estimates for the whole country then, again, the direct method gives much lower SP estimates than the indirect method does and assigns primary responsibility to the State.

These results make me think that the much-hyped reversal of responsibility claimed by the TRC, i.e., that the Shining Path rather than the State was the main perpetrator in the war in Peru, is wrong.

OK, that’s it for this post.  More to come.

 

11 thoughts on “Important New Violent Death Estimates for the War in Peru with Implications Beyond just Peru: Part 1”

  1. Hi. Thanks for responding. I will react to your rebuttal in an upcoming post, hopefully over the next few days.

    Briefly, though, the numbers you refer to as “known” are not known in any conventional sense. So far as I’m aware, these “known” numbers only appeared in the public domain when you uploaded your paper a few days ago.

    Furthermore, I understand that you were only able to produce those numbers after obtaining special access to confidential data from the Peruvian TRC. This closed data was then matched against deaths in a new open-source post-TRC project.

    It seems unfair to criticize Rendon for not being privy to this unobservable activity.

    There are further issues of open science and replicability here which I also plan to address in due course.

  2. The MIMDES reports, including the names of all the victims, were published on paper and in PDFs circulated online starting in 2006. For our matching, we needed the TRC’s original data. But the MIMDES data has been public for well over a decade, and it has been used in academic work (we cite one in our rebuttal). The proportions of responsibility presented in the MIMDES data are consistent with the TRC’s observed data and with the estimates. There’s nothing secret about this dataset, or the Ministry project that produced it. That Rendon was not aware of it does not make it somehow secret.

    1. Yes, but the numbers you say Rendon should have known can only be constructed by people with access to the detailed TRC data which is not publicly available. The MIMDES data is publicly available but it’s not enough on its own.

      1. “People with access to the detailed TRC data” is not an intrinsic category of people: it’s just people who have asked nicely and persisted (sometimes, like us, over several years) until they got access to the data. It seems to me that with sensitive data, obtaining the relevant information is incumbent upon the researcher: Rendon could have inquired at the Peruvian Ombudsman’s Office to get the TRC data. It’s not secret; it just requires a bit of work to obtain, and he chose not to do so.

        However, even if you couldn’t get these difficult-to-obtain data (or didn’t want to bother), there is enough in the publicly available data to raise real concerns about Rendon’s approach. The fact that MIMDES independently found and published observed distributions pointing even more strongly to Sendero than the TRC’s data do should be a clear clue that the TRC’s work was sound. Given the sheer magnitude of the MIMDES counts, anyone defending Rendon’s estimates would have to speculate that the MIMDES data are essentially a subset of the TRC’s data, and that’s implausible. Furthermore, even within the TRC data that Rendon has, he could have used stratum 25 to check his estimation, and he would have found that his estimate for Sendero deaths is lower than the observed count.

        Rendon did none of these things.

  3. I’d like to add that the fundamental problems in Rendón’s paper are technical. The fact that his results ended up contradicting reality makes it easier to see (and for us, to show) that his approach must be incorrect. However, the reason why he got those results is not because of MIMDES data; it’s because his technical execution is flawed. In our reply we point out the most egregious examples we found, and show why they are problematic. Those problems would remain (though perhaps they would not be so easy to expose) even if we hadn’t pointed out the data discrepancies.
