A New Graphical Manoeuvre (not Recommended)

Correction.  The first part of this post argues that an anomaly in a published graph is an error with substantive implications.  However, an alert reader, Ben Prytherch, proposed a benign explanation for the anomaly.  I checked with the authors of the graph and it turned out that Ben is right.  So this is a formal correction.  I annotate this part of the post below and will write a follow-up post about this as well.  [October 2, 2016]

Many of you know that for some months I’ve been involved in a discussion with Pasquale Cirillo and Nassim Nicholas Taleb.  Steven Pinker joined me in a recent exchange of letters with Cirillo and Taleb.

I won’t summarise the debate in this post but you can bone up by looking here, here and here. You will also rejoice to learn that there will be another exchange of letters soon.

For preparation I had another look at the Cirillo-Taleb paper and was taken aback by their figure 14a:


The accompanying text says:

If …events…follow a homogeneous Poisson process…their inter-arrival times need to be exponentially distributed….Figure 14 shows that ….these characteristics are satisfactorily observable in our data set.  This is clearly visible in the QQ-plot of Subfigure 14a, where most inter-arrival times tend to cluster on the diagonal.

Please clear your head for the moment of the details (Poisson, QQ, etc.).  The key is that the points should line up along the diagonal, which they seem to do.  Great!

But wait.

The diagonal for this picture should be the 45 degree line whereas the line in the above picture is more like a 35 degree line.  Notice how the X axis goes out to 11 whereas the Y axis only goes up to 7.

[This is where I start to go wrong.  It turns out that the Y axis is scaled differently from the X axis.  If the scaling were the same then the points would line up on the 45 degree line. Personally, I think the exposition would be better if the scaling were the same on both axes but the way that Cirillo and Taleb have done this is not an error as I originally asserted.  October 2, 2016]

Here is the kind of plot we should see if the data really do follow an exponential distribution as Cirillo and Taleb claim their data do [and the pictures were done with the same scaling on both axes as I would have preferred, October 2, 2016]:


For this proper [replace “proper” with “clearer”, October 2, 2016] QQ graph both axes go to 5 and the diagonal is the 45 degree line.  (Ignore the fact that Cirillo and Taleb’s points are stacked above and below each other.  This is only because their data points are rounded to the nearest year.)

Thus, Cirillo and Taleb’s figure 14a shows the opposite of what they claim; their data do not fit an exponential distribution.  [When properly interpreted the Cirillo-Taleb graph suggests that the data do follow an exponential distribution.  October 2, 2016]

I have to say that I looked at figure 14a many times without noticing this problem.  Presumably they just made a mistake.  [My mistake, actually, October 2, 2016]

But what a slick manoeuvre this would be in the tradition exposed so well by Darrell Huff if done on purpose.  Your data need to be on a particular line.  You draw a line that goes through your data.  You declare success.  Busy people don’t notice you haven’t drawn the right line.  [Of course, Cirillo and Taleb did not engage in such trickery.  October 2, 2016]

By the way, Cirillo and Taleb’s figure 14b also [October 2, 2016] strikes me as out of tune with their accompanying text:



Moreover, no time dependence should be identified among inter-arrival times, for example when plotting an autocorrelogram (ACF).  Figure 14 shows that both of these characteristics [exponential distribution and no time dependence] are satisfactorily observable in our data set.

Again, without getting into the details, they are saying that the little bars are all near 0 (ignore the huge first bar).  I agree that the bars are, indeed, lowish.  But what about the ones near 0.2?  (These are correlations so they have to be between -1.0 and +1.0.)  These larger bars do seem to be pretty much below the (unexplained) dotted blue line.  Maybe this is a statistical significance line?  If so then I’d agree to a formulation along the following lines:

We were unable to reject a hypothesis of 0 time dependence at the ??? level.  However, we only had a few hundred observations and with more data we might well reject such a hypothesis, at least for some time lags.  Still, it seems that any time dependence in the data is fairly weak.
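For readers who want the arithmetic behind that dotted line: if it is the standard white-noise confidence band that most ACF routines draw, it sits at roughly ±1.96/√n.  Here is a minimal sketch in Python; the sample sizes are my assumptions, since the exact n is not quoted here:

```python
import math

# Approximate 95% white-noise band for an autocorrelogram:
# under the null of no time dependence, each sample autocorrelation
# is roughly Normal(0, 1/n), so the usual dotted lines sit at +/- 1.96/sqrt(n).
def acf_band(n, z=1.96):
    """Half-width of the approximate 95% confidence band for an ACF."""
    return z / math.sqrt(n)

# "A few hundred observations", as the post says (the exact n is an assumption):
for n in (200, 400, 600):
    print(n, round(acf_band(n), 3))
```

The band shrinks as the sample grows, which is exactly why more data might eventually push some of those 0.2-ish bars over the significance line.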

I don’t see this as a massive smoking gun.  I believe that Cirillo and Taleb are in the right ball park with their interpretation of these correlations although they have overstated their case.  I do suspect, however, that if Nassim Taleb were standing in my shoes right now he would be shouting that I adamantly deny the overwhelming evidence of massive correlations.  [Well, maybe he wouldn’t.  He was pretty reasonable in our exchange about me correcting my error.  October 2, 2016]

In any case, despite what Cirillo and Taleb seem to think, neither of these pictures directly addresses the main issue that interests them: whether or not there is a trend toward fewer wars per unit of time.

PS: I should mention that one of my colleagues, Alessio Sancetta, helped me think through this post.  Of course, all errors are mine and, as always, I’d love to hear from readers and will gladly fix any mistake I may have made.

Technical Appendix

I assumed a fair amount of knowledge above so here are a few more details for anyone out there who craves them.

The data underpinning the pictures are for large wars since 1500.  I don’t have them.  I believe that Cirillo and Taleb have not yet released their data but are planning to do so.

Figure 14a is about the distribution of time gaps between wars. Specifically, how often does the next war happen right away (0 time gap), how often do we wait 1 year, 2 years, etc.?

To do an exponential QQ plot you first fit an exponential distribution to the data.  This fitted distribution then makes predictions about the gaps between wars.  The predictions will be, for example, that 75% of the gaps will exceed 2 years or that 50% of the gaps will exceed 4 years, etc.  You then graphically compare the predictions with the actual gap distribution.  If all the predictions turn out to be exactly correct then the points will line up smack on the 45 degree line.  [In my opinion the above is how one should do a QQ plot that is easy to understand.  October 2, 2016]
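The recipe just described can be sketched in a few lines of Python.  This is my own generic illustration with simulated gaps, not Cirillo and Taleb’s code or data:

```python
import math
import random

def exponential_qq_points(gaps):
    """Pair each sorted observed gap with the gap the fitted
    exponential predicts for the same quantile."""
    gaps = sorted(gaps)
    n = len(gaps)
    mean_gap = sum(gaps) / n                      # fits the exponential (rate = 1/mean)
    points = []
    for i, observed in enumerate(gaps, start=1):
        p = (i - 0.5) / n                         # plotting position for the i-th point
        predicted = -mean_gap * math.log(1.0 - p) # exponential quantile at probability p
        points.append((predicted, observed))
    return points

# If the data really are exponential, predicted and observed gaps should
# track each other, i.e. the points hug the 45 degree line.
random.seed(0)
sim = [random.expovariate(0.5) for _ in range(500)]  # truly exponential gaps
pts = exponential_qq_points(sim)
```

Plotting `pts` with a 45 degree reference line gives the kind of picture described above.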

How do we interpret the fact that, when done correctly, the points on the right in figure 14a lie well below the 45 degree line?  [What follows in the next paragraph would be true if the QQ plot had been drawn the way I describe but isn’t true of the actual Cirillo-Taleb graph.]

This means that the actual gaps at the high end tend to be longer than would be predicted by the fitted exponential curve.  Loosely speaking, when the exponential is predicting a gap of 7 the actual gap turns out to be more like 10.  In other words, the right-hand tail of the distribution of gaps between wars is stretched to the right compared to the exponential fit.
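To see how a stretched right tail shows up in this kind of comparison, one can simulate gaps that are heavier-tailed than any exponential.  The lognormal below is purely an illustrative choice on my part, not a claim about the war data:

```python
import math
import random

random.seed(1)

# Gaps with a heavier right tail than an exponential can produce
# (lognormal is just an illustrative choice, not a model of the war data).
gaps = sorted(random.lognormvariate(0.0, 1.2) for _ in range(2000))

mean_gap = sum(gaps) / len(gaps)          # exponential fit: rate = 1/mean

# Compare the fitted exponential's 99th-percentile prediction with
# the actual 99th percentile of the simulated gaps.
p = 0.99
predicted = -mean_gap * math.log(1.0 - p)
observed = gaps[int(p * len(gaps))]
# The actual extreme gap comfortably exceeds the exponential prediction,
# which is exactly the "stretched tail" pattern described above.
print(round(predicted, 2), round(observed, 2))
```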

Figure 14b is checking for correlations between gaps at different time lags.  For example, the bar that reaches a height near 0.2 at a lag of 5 says that a longer gap 5 wars ago tends to be associated with a longer gap until the next war.  More generally, this shows that knowledge of past gaps appears to be (weakly) useful in predicting future gaps.
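The correlations in a figure like 14b are easy to compute directly.  Here is a sketch on simulated independent gaps (since the real data aren’t available), where every lag should come out near 0:

```python
import random

def autocorrelation(xs, lag):
    """Sample autocorrelation of the series xs at a given lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[t] - mean) * (xs[t + lag] - mean) for t in range(n - lag))
    return cov / var

# Independent exponential gaps: correlations should be near zero at every lag,
# which is the pattern a homogeneous Poisson process implies.
random.seed(2)
gaps = [random.expovariate(1.0) for _ in range(500)]
acf = [autocorrelation(gaps, k) for k in range(1, 11)]
```

The “huge first bar” in a standard autocorrelogram is just the lag-0 correlation, which is always exactly 1.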

Economics of Warfare – Lecture 1

This morning I gave my first lecture in my Economics of Warfare class.


I plan to continue to post a lecture each week.  I don’t plan to write an abstract for each lecture but you can get a sense of the material covered by looking at “categories” and “tags”.

Interesting Articles on the 9/11 Anniversary

(Note for the confused.  I first made the mistake of posting this without a title and the only way I could figure out to correct the mistake was by trashing the first post and reposting with a title.   MS)

I happen to be in the US right now where you’d have to be unconscious to fail to notice the anniversary of 9/11 yesterday.

The good news is that there have been several interesting articles in the media over the last few days that are pertinent to the subject of casualty recording.

This one is about a first responder who was evacuated early in the rescue operation due to a serious injury.  Thus, he avoided some slow-burning health effects, many of which lead to death, that many of his colleagues suffered.  He now dedicates himself to helping 9/11 first responders and their families.

The article leads to a list of names of fallen first responders engraved on this wall.

The Wall at the 9/11 Responders Remembered Park

Next, Jay Aronson gives us a teaser for his new book on the attention paid to, and myriad controversies surrounding, the 2,753 people killed in the Twin Towers.  The official commitment to scientifically identify all of the human remains found near the site is beyond anything in previous forensic history.  Yet, as Aronson explains, these policies for the treatment of dead bodies have evolved out of a long historical process.

This is the first time the blog has touched on forensic identification of human remains but it is a natural extension of the concept of casualty recording which is about listing names and other pieces of vital information about victims of armed conflict.  Forensic identification can contribute to making sound lists of victims but it is clear that this purpose was, at best, a small part of the motivation for all the forensic work on 9/11.  Rather, the forensics were about showing respect for the victims and their families as well as for signalling something to the perpetrators of the atrocity and their supporters.

Finally, here’s another good Vox article that dwells on the theme of overestimating and overreacting to the threat of terrorism.  This quote from Brian Michael Jenkins is interesting:

It becomes a forever war. It may be that we redefine war and get it out of the notion of a finite undertaking and have to view military operations in much the same way that we look at law enforcement. That is, while we expect police to bring perpetrators to justice, we don’t operate under any illusion that at some point the police will defeat crime.

In other words, we don’t expect the police to completely eliminate all risks of violent crime so why do we expect our governments to completely eliminate the (far lower) risks of terrorism?

Fair question.

Special Journal Issue on Fabrication in Survey Research

The Statistical Journal of the IAOS has just released a new issue with a bunch of articles on fabrication in survey research, a subject of great interest for the blog.

Unfortunately, most of the articles are behind a paywall but, thankfully, the overview by Steve Koczela and Fritz Scheuren is open access.  It’s a beautiful piece – short, sweet, wise and accurate.  Please read it.

Here are my comments.

Way back in 1945 the legendary Leo Crespi stressed the importance of what he called “the cheater problem.”  Although he did this in the flagship survey research journal, Public Opinion Quarterly, the topic has never become mainstream in the profession.  Many survey researchers seem to view the topic of fabrication as not really appropriate for polite company, akin to discussing the sexual history of a bride at her wedding.  Of course, this semi-taboo is convenient for cheaters.  Maria Konnikova has a great new book about confidence artists.  Much in the book is relevant to the subject of fabrication in survey research but one point really stands out for me; a key reason why the same cons and the same con artists move seamlessly from mark to mark is that each victim is too embarrassed to publicize his/her victimization.

Discussions of fabrication that have occurred over the years have almost always focused on what is known as curbstoning, i.e., a single interviewer making up data.  (The term comes from an image of a guy sitting on a street curb filling out his forms.)  But this is just one type of cheating, and one of the great contributions of Koczela and Scheuren’s journal edition and the impressive series of prior conferences is that they have substantially expanded the scope of the survey fabrication field.  Now we discuss fabrication by supervisors, principal investigators and the leaders of survey companies.  We now know that hundreds of public opinion surveys, especially surveys conducted in poor countries, are contaminated by widespread duplication and near duplication of single observations.  (This journal issue publishes the key paper on duplication.)
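For the curious, the simplest version of a duplication check is easy to sketch.  This toy example flags only exact duplicates; the near-duplication analyses mentioned above compare the share of matching answers between every pair of respondents, which is more involved.  All data values here are invented for illustration:

```python
from collections import Counter

def duplication_report(responses):
    """Count exact duplicates among survey response patterns.

    Each response is a tuple of answers; identical tuples appearing for
    different 'respondents' are a red flag for fabrication.
    """
    counts = Counter(tuple(r) for r in responses)
    duplicated = {pattern: k for pattern, k in counts.items() if k > 1}
    n_extra = sum(k - 1 for k in duplicated.values())  # surplus copies
    return n_extra, duplicated

# Invented toy data: five respondents, one answer pattern appearing twice.
toy = [
    (1, 3, 2, 5),
    (2, 2, 4, 1),
    (1, 3, 2, 5),
    (5, 1, 1, 2),
    (3, 3, 3, 3),
]
extra, dups = duplication_report(toy)
print(extra)   # 1 surplus copy of the pattern (1, 3, 2, 5)
```

In a real survey with dozens of questions, two respondents giving identical answers to every question is vanishingly unlikely, which is what makes counts like this informative.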

Let me quote a bit from the to-do list of Koczela and Scheuren.

It does not only happen to small research organizations with fewer resources, as was previously believed [12].  Recent instances involve the biggest and most [prominent] names in the survey research business, academia and the US Government.

This is certainly true but I would add that reticence about naming names is crippling.  Yes, it’s helpful to know that there are many dubious surveys out there, but guidance on which ones they are would be even more valuable.

An acknowledgement by the research community that data fabrication is a common threat, particularly in remote and dangerous survey environments would allow the community to be cooperative and proactive in preventing, identifying and mitigating the effects of fabrication.

This comment about remote and dangerous survey environments fits perfectly with my critiques of Iraq surveys including this one.

Given the perceived stakes, these discussions often result in legal threats or even legal action of various types.


…the problem of fabrication is fundamentally one of co-evolution.  The more detection and prevention methods evolve, the more fabricators may evolve to stay ahead.  And to the extent we discover and confirm fabrication, we will never know whether we found it all, or caught only the weakest of the pack.  With these truths in mind, more work is needed in developing and testing statistical methods of fabrication detection.  This is made more difficult by the lack of training datasets, a problem prolonged by a general unwillingness to openly discuss data fabrication.

Again, I couldn’t agree more.

Technical countermeasures during fielding are less useful in harder to survey areas, which also happen to be the areas where the incentive to fabricate data is the highest. Many of the recent advances in field quality control processes focus on areas where technical measures such as computer audio recording, GPS, and other mechanisms can be used [6,13].

In remote and dangerous areas, where temptation to fabricate is the highest, technical countermeasures are often sparse [9]. And perversely, these are often the most closely watched international polls, since they often represent the hotspots of American interest and activity. Robbins and Kuriakose show a heavy skew in the presence of duplicate cases in non-OECD countries, potentially a troubling indicator. These polls conducted in remote areas often have direct bearing on policy for the US and other countries. To get a sense of the impact of the polls, a brief review of the recently released Iraq Inquiry, the so-called Chilcot report, contains dozens of documents that refer, in most cases uncritically, to the impact and importance of polls.

To be honest, Koczela and Scheuren do such a great job with their short essay that I’m struggling to add value here.  What they write above is hugely pertinent to all the work I’ve done on surveys in Iraq.

By the way, a response I sometimes get to my critiques of the notorious Burnham et al. survey of deaths in the Iraq war (see, for example, here, here and here) is that it is unreasonable to expect perfection from a survey operating in such a difficult environment.  Fair enough.  But then you have to concede that we cannot expect high-quality results from such a survey either.  If I were to walk in off the street and take Harvard’s PhD qualifying exam in physics (I’m assuming they have such a thing…) it would be unreasonable to expect me to do well.  I just haven’t prepared for such an exam.  Fine, but that doesn’t somehow make me an authority on physics.  It just gives me a perfect excuse for not being such an authority.

Finally, Koczela and Scheuren provide a mass of resources that researchers can use to bring themselves to the frontier of the survey fabrication field.  Anyone interested in this subject needs to take a look at these resources.