What can you do with the Peru Data?

Somebody asked a fair question in the comments surrounding the release of the Peru dataset: what can you do with it?

That is a very big question that I can’t fully address in a blog post.  Still, I’ll try to offer a few useful thoughts.  Perhaps some readers will jump in with better ideas.  Also, I’d be delighted to hear from anyone who downloads the data and does something interesting with it.

Here’s some background.

First of all, it is event data.  This means that each line in the spreadsheet is a discrete occurrence, such as a battle or a massacre.  Each event comes with several pieces of information, such as the date, location, number of people killed, violent actors involved, type of event, etc.

The methodology documents posted on the conflict data page give a fair amount of detail on what is in the data and what the criteria are.  It could also be useful to read this data description for the Colombia conflict database (which is also posted on the conflict data page).  Of course, they are different conflicts and different databases, but the methodologies are very similar.

This paper by David Fielding and Anja Shortland used the Peru data to demonstrate escalation cycles (my phrase, not the authors’) in the conflict:

We show that an increase in civilian abuse by one side was strongly associated with subsequent increases in abuse by the other. In this type of war, foreign intervention could substantially reduce the impact on civilians of a sudden rise in conflict intensity, by moderating the resulting ‘cycle of violence’.

I’m afraid that the published version of their paper is behind a paywall but it should be possible to get hold of it if you really want to.

I believe that Fielding and Shortland didn’t use the event character of the data specifically, instead aggregating the events into monthly time series.  However, in this paper we worked entirely with the individual events, focusing on their sizes and timings:

Many collective human activities, including violence, have been shown to exhibit universal patterns1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,14, 15, 16, 17, 18, 19. The size distributions of casualties both in whole wars from 1816 to 1980 and terrorist attacks have separately been shown to follow approximate power-law distributions6, 7, 9, 10. However, the possibility of universal patterns ranging across wars in the size distribution or timing of within-conflict events has barely been explored. Here we show that the sizes and timing of violent events within different insurgent conflicts exhibit remarkable similarities. We propose a unified model of human insurgency that reproduces these commonalities, and explains conflict-specific variations quantitatively in terms of underlying rules of engagement. Our model treats each insurgent population as an ecology of dynamically evolving, self-organized groups following common decision-making processes. Our model is consistent with several recent hypotheses about modern insurgency18, 19, 20, is robust to many generalizations21, and establishes a quantitative connection between human insurgency, global terrorism10 and ecology13, 14, 15, 16, 17, 22, 23. Its similarity to financial market models24, 25, 26 provides a surprising link between violent and non-violent forms of human behaviour.

The Peru dataset was one of many we used in that article, which was about patterns in the size distributions and timings of events that appear in war after war, not just the war in Peru.
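For readers who want to see what aggregating events into a monthly time series looks like in practice, here is a minimal sketch in Python.  The events below are made up for illustration and are not taken from the Peru dataset, and the function name `monthly_series` is my own invention; I am not claiming this is how Fielding and Shortland built their series.

```python
from collections import defaultdict
from datetime import date

# Made-up events as (date, number killed) pairs; NOT real figures from the dataset.
events = [
    (date(1984, 1, 5), 3),
    (date(1984, 1, 20), 12),
    (date(1984, 3, 2), 1),
]

def monthly_series(events):
    """Collapse individual events into per-month counts and death totals."""
    totals = defaultdict(lambda: {"events": 0, "killed": 0})
    for day, killed in events:
        key = (day.year, day.month)
        totals[key]["events"] += 1
        totals[key]["killed"] += killed
    # Sort by (year, month) so the series reads chronologically.
    return dict(sorted(totals.items()))

series = monthly_series(events)
for (year, month), row in series.items():
    print(f"{year}-{month:02d}: {row['events']} events, {row['killed']} killed")
```

Note that months with no events simply do not appear in the output, and that the sizes and timings of the individual events are lost once they are summed, which is the information the event-level approach keeps.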

The reader’s comment also asked about possible projects for undergraduates.  I’m not sure how to answer this question without knowing more about what kinds of undergraduates we’re talking about and what kinds of skills they have.  But students could certainly do various data manipulation exercises such as breaking down the data by region, perpetrator or type of event.
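As a rough idea of what such a breakdown exercise might look like, here is a small Python sketch that uses only the standard library.  The column names (`region`, `perpetrator`, `event_type`, `killed`) and the sample rows are assumptions made up for illustration; the real spreadsheet will have its own headers and values, so check the methodology documents before adapting this.

```python
import csv
import io
from collections import Counter

# Made-up rows standing in for the spreadsheet; column names are assumptions.
SAMPLE = """region,perpetrator,event_type,killed
Region X,Actor A,massacre,12
Region X,Actor B,battle,3
Region Y,Actor A,battle,5
"""

def breakdown(rows, column):
    """Return (event counts, total killed) for each distinct value of `column`."""
    events = Counter()
    killed = Counter()
    for row in rows:
        events[row[column]] += 1
        killed[row[column]] += int(row["killed"])
    return events, killed

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for column in ("region", "perpetrator", "event_type"):
    events, killed = breakdown(rows, column)
    print(column)
    for value in events:
        print(f"  {value}: {events[value]} events, {killed[value]} killed")
```

To run this on a downloaded file, the `io.StringIO(SAMPLE)` part could be swapped for an open file handle, assuming the data has first been saved as CSV.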

I hope that this post was useful.  I would be happy to respond to further questions.




3 thoughts on “What can you do with the Peru Data?”

  1. Thanks for the follow-up!

    The undergrads I was referring to are political science students with little to no knowledge of statistics. As a consequence, event data analysis is probably beyond their level.

    However, I was thinking that a good exercise for them might be to track down all the different data sources that end up in a published dataset like the Peru dataset. The teaching goal would be to force them to identify all the different operations that are required to assemble a dataset. Students tend, in my view, to take data ‘for granted,’ and are very often uncritical of data that they find online (or worse, cited in the media). Perhaps they would benefit from looking at ‘raw’ data, not a table or an infographic, and then at all the procedures that produced it.


  2. Hi again.

    In principle, the sort of exercise you suggest would be great.

    However, for the Peru dataset it wouldn’t lead very far because everything comes from a single source – the report of Peru’s Truth and Reconciliation Commission. We just had people comb through this (monumental) report and code all events that fit our criteria. Of course, it would be better to integrate more sources but doing this is just a lot of work.

    Your broader point is about knowing where numbers come from and not just taking them for granted. It would be impossible to overstate the importance of this for all of us, but especially for students who are developing their lifetime habits.

    Actually, inquiring into where numbers come from is one of the primary themes of this blog.

    It is also one of the main themes of my economics of warfare course for which the course outline and slides for last year are all posted on the blog. The course is for third year students here who have had some statistics training but, still, most of it is accessible to people with fairly minimal background.

    By chance, I also developed and taught a new course last year called Survival Statistics that was entirely about becoming a literate consumer of numbers and that should be accessible to anyone with a high school education. I haven’t posted these slides on the blog because this course has nothing to do with war. But if you or anyone else wants to email me I’d be happy to share this content with you. I’m at m.spagat@rhul.ac.uk


