What can you do with the Peru Data?

Somebody asked a fair question in the comments surrounding the release of the Peru dataset: what can you do with it?

That is a very big question that I can’t fully address in a blog post.  Still, I’ll try to offer a few useful thoughts.  Perhaps some readers will jump in with better ideas.  Also, I’d be delighted to hear from anyone who downloads the data and does something interesting with it.

Here’s some background.

First of all, it is event data.  This means that each line in the spreadsheet is a discrete occurrence, such as a battle or a massacre.  There are a bunch of pieces of information about each event, such as the date, location, number of people killed, violent actors involved, type of event, etc.
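To make the structure concrete, here is a minimal Python sketch that reads event rows from a CSV export of the spreadsheet into simple records, one record per event. The column names (date, location, killed, actor, type), the ISO date format and the two sample rows are all made up for illustration. The real headers and coding rules are described in the methodology documents.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date


@dataclass
class Event:
    """One row of the spreadsheet: a single discrete occurrence."""
    event_date: date
    location: str
    killed: int
    actor: str
    event_type: str


def load_events(f) -> list[Event]:
    """Read events from a CSV file object.

    The column names used here are hypothetical placeholders; check the
    actual headers in the downloaded data before using this.
    """
    events = []
    for row in csv.DictReader(f):
        events.append(
            Event(
                event_date=date.fromisoformat(row["date"]),
                location=row["location"],
                killed=int(row["killed"]),
                actor=row["actor"],
                event_type=row["type"],
            )
        )
    return events


# Made-up rows, purely to show the shape of event data.
sample = io.StringIO(
    "date,location,killed,actor,type\n"
    "1984-03-02,Region A,5,Actor X,attack\n"
    "1984-03-19,Region B,12,Actor Y,massacre\n"
)
events = load_events(sample)
print(len(events), events[1].killed)  # prints "2 12"
```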

The methodology documents posted on the conflict data page give a fair amount of detail on what is in the data and what the criteria are.  It could also be useful to read this data description for the Colombia conflict database (which is also posted on the conflict data page).  Of course, they are different conflicts and different databases, but the methodologies are very similar.

This paper by David Fielding and Anja Shortland used the Peru data to demonstrate escalation cycles (my phrase, not the authors’) in the conflict:

We show that an increase in civilian abuse by one side was strongly associated with subsequent increases in abuse by the other. In this type of war, foreign intervention could substantially reduce the impact on civilians of a sudden rise in conflict intensity, by moderating the resulting ‘cycle of violence’.

I’m afraid that the published version of their paper is behind a paywall, but it should be possible to get hold of it if you really want to.

I believe that Fielding and Shortland didn’t use the event character of the data directly, instead aggregating the events into monthly time series.  In this paper, by contrast, we focused entirely on the events themselves, in particular on their sizes and timings:

Many collective human activities, including violence, have been shown to exhibit universal patterns [1–19]. The size distributions of casualties both in whole wars from 1816 to 1980 and terrorist attacks have separately been shown to follow approximate power-law distributions [6, 7, 9, 10]. However, the possibility of universal patterns ranging across wars in the size distribution or timing of within-conflict events has barely been explored. Here we show that the sizes and timing of violent events within different insurgent conflicts exhibit remarkable similarities. We propose a unified model of human insurgency that reproduces these commonalities, and explains conflict-specific variations quantitatively in terms of underlying rules of engagement. Our model treats each insurgent population as an ecology of dynamically evolving, self-organized groups following common decision-making processes. Our model is consistent with several recent hypotheses about modern insurgency [18–20], is robust to many generalizations [21], and establishes a quantitative connection between human insurgency, global terrorism [10] and ecology [13–17, 22, 23]. Its similarity to financial market models [24–26] provides a surprising link between violent and non-violent forms of human behaviour.

The Peru dataset was one of many we used in that article, which was about patterns in the size distributions and timings of events that appear in war after war, not just in the war in Peru.
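As an example of the size-distribution side of this kind of work, the sketch below estimates a power-law exponent α for event sizes (e.g. people killed per event) at or above a threshold x_min, using the standard maximum-likelihood estimator for a continuous power law, α = 1 + n / Σ ln(x_i / x_min). This is not necessarily the procedure used in the article. Casualty counts are integers, so the continuous formula is only an approximation, and choosing x_min and testing whether a power law fits at all are separate steps not shown here. The check runs on simulated data, not on the Peru events.

```python
import math
import random


def powerlaw_alpha_mle(sizes, x_min):
    """Maximum-likelihood estimate of alpha for p(x) ~ x**(-alpha), x >= x_min.

    Uses the continuous-data estimator alpha = 1 + n / sum(ln(x / x_min)).
    Observations below x_min are discarded.
    """
    tail = [x for x in sizes if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail observations equal x_min")
    return 1 + len(tail) / log_sum


def sample_powerlaw(alpha, x_min, n, rng):
    """Draw n values from a continuous power law by inverse-transform sampling."""
    # CDF: F(x) = 1 - (x / x_min) ** (1 - alpha); invert it at a uniform draw.
    return [x_min * (1 - rng.random()) ** (-1 / (alpha - 1)) for _ in range(n)]


# Synthetic check: recover a known exponent from simulated "event sizes".
rng = random.Random(0)
sizes = sample_powerlaw(alpha=2.5, x_min=1.0, n=10_000, rng=rng)
print(round(powerlaw_alpha_mle(sizes, x_min=1.0), 2))  # should be close to 2.5
```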

The reader’s comment also asked about possible projects for undergraduates.  I’m not sure how to answer this question without knowing more about what kinds of undergraduates we’re talking about and what skills they have.  But students could certainly do various data manipulation exercises, such as breaking down the data by region, perpetrator or type of event.
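For the breakdown exercises, a minimal sketch might count events and deaths by any one field. The dictionary keys (region, actor, type, killed) and the three events below are hypothetical placeholders; an actual exercise would use the columns in the downloaded spreadsheet.

```python
from collections import Counter, defaultdict


def count_by(events, key):
    """Number of events for each value of one field (e.g. region)."""
    return Counter(e[key] for e in events)


def deaths_by(events, key):
    """Total people killed for each value of one field."""
    totals = defaultdict(int)
    for e in events:
        totals[e[key]] += e["killed"]
    return dict(totals)


# Made-up events; field names are hypothetical stand-ins for the real columns.
events = [
    {"region": "Region A", "actor": "Actor X", "type": "attack", "killed": 3},
    {"region": "Region A", "actor": "Actor Y", "type": "massacre", "killed": 10},
    {"region": "Region B", "actor": "Actor X", "type": "attack", "killed": 1},
]
print(count_by(events, "region"))  # Counter({'Region A': 2, 'Region B': 1})
print(deaths_by(events, "actor"))  # {'Actor X': 4, 'Actor Y': 10}
```

Grouping on a tuple of fields, such as `(e["region"], e["type"])`, gives a simple cross-tabulation in the same way.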

I hope that this post was useful.  I would be happy to respond to further questions.


Data Dump Friday

I suppose it will come as no surprise that I’m putting up a bit more Iraq public opinion survey data, sponsored by the US State Department and obtained through a FOIA request.  This time it’s some polls from April 2006.

I’m unlikely to dump more data over the next three weeks because I’ll be traveling, and I’m still not set up with all the ingredients to do these data dumps while traveling.  However, I should be doing a fair amount of regular blogging.