As promised, I’ve just posted the slides of the talk I gave yesterday at York University (with some overnight modifications).
You can find background, with links to further background, here.
Somewhat bizarrely, Steven Pinker’s 2011 book was rocketing to the top of the Amazon best seller list due to a Bill Gates Tweet right when I was talking about it at York. So I guess my timing is good.
When data are central to scientific discussions, as is typically the case, then the relevant data should be open to all.
OK, we don’t have to be totally rigid about this. People may sink a lot of effort into building a data set so it’s reasonable for data builders to milk their data monopoly for some grace period. In my opinion, you get one publication. Then you put your data into the public domain.
And public domain means public domain. It’s not OK to hide your data from people you don’t like, from people you think are incompetent, from people you suspect of having engaged in acts of moral turpitude, etc. You post your data so everyone can have them.
If you put your data into the public domain and someone does something stupid with it then it’s fine to say that. It’s a virtue to be nice but being nice isn’t a requirement. But as far as I’m concerned you share your data or you’re not doing science.
Readers of the blog should be well aware that there has been a dispute about the decline of war (or not), primarily between Steven Pinker and Nassim Nicholas Taleb. You can track my participation in this debate from a bunch of my blog entries and the links they contain. I’m in the middle of preparing a conference talk on this subject, and I’ll post the slides later this week, so more is coming.
I planned a little data work to support the talk so I emailed Taleb asking him for the data he used to launch his attack on Pinker’s work. Here is his reply.
1) It is not professional to publish a “flaw” without first contacting the authors. You did it twice.
2) Your 2 nitpicking “flaws” betrayed total ignorance of the subject.
So I will ask you to fuck off.
He is referring to this post (which did contain an error that I corrected after a reader pointed it out).
What can I say?
The main thing is that if he wants to do science then it’s not OK to just declare someone to be ignorant and withhold data.
Beyond that I’d say that if he still objects to something in my post he should be specific, either in the comments or to me directly. As always, I’ll issue a correction or clarification if I get something wrong.
Finally, it isn’t really standard to clear criticisms of someone’s work in advance with the person being criticized. Doing this could be a reasonable strategy in some cases. And it’s reasonable to send criticism to the person being criticized. Correcting errors, as I do, is essential.
Anyway, I take away from this episode that Taleb isn’t doing science and also that he probably doesn’t have great confidence in his work on this subject or else he wouldn’t hide his data.
Significance Magazine is now hosting its final-exchange-of-letters on the future of war. Once again, it is Steven Pinker and I dueling with Nassim Nicholas Taleb and Pasquale Cirillo. You can judge for yourselves whether the four of us have hit a common groove.
If you feel sad because you have not followed all the twists and turns of this discussion then you should click through these links.
Correction. The first part of this post argues that an anomaly in a published graph is an error that has some substantive implications. However, an alert reader, Ben Prytherch, proposed a benign explanation for the anomaly. I checked with the authors of the graph and it turned out that Ben is right. So this is a formal correction. I annotate this part of the post below and will write a follow-up post about this as well. [October 2, 2016]
Many of you know that for some months I’ve been involved in a discussion with Pasquale Cirillo and Nassim Nicholas Taleb. Steven Pinker joined me in a recent exchange of letters with Cirillo and Taleb.
For preparation I had another look at the Cirillo-Taleb paper and was taken aback by their figure 14a:
The accompanying text says:
If …events…follow a homogeneous Poisson process…their inter-arrival times need to be exponentially distributed….Figure 14 shows that ….these characteristics are satisfactorily observable in our data set. This is clearly visible in the QQ-plot of Subfigure 14a, where most inter-arrival times tend to cluster on the diagonal.
Please clear your head for the moment of the details (Poisson, QQ, etc.). The key is that the points should line up along the diagonal, which they seem to do. Great!
The diagonal for this picture should be the 45 degree line whereas the line in the above picture is more like a 35 degree line. Notice how the X axis goes out to 11 whereas the Y axis only goes up to 7.
[This is where I start to go wrong. It turns out that the Y axis is scaled differently from the X axis. If the scaling were the same then the points would line up on the 45 degree line. Personally, I think the exposition would be better if the scaling were the same on both axes but the way that Cirillo and Taleb have done this is not an error as I originally asserted. October 2, 2016]
Here is the kind of plot we should see if the data really do follow an exponential distribution as Cirillo and Taleb claim their data do [and the pictures were done with the same scaling on both axes as I would have preferred, October 2, 2016]:
For this proper [replace “proper” with “clearer”, October 2, 2016] QQ graph both axes go to 5 and the diagonal is the 45 degree line. (Ignore the fact that Cirillo and Taleb’s points are stacked above and below each other. This is only because their data points are rounded to the nearest year)
Thus, Cirillo and Taleb’s figure 14a shows the opposite of what they claim; their data do not fit an exponential distribution. [When properly interpreted the Cirillo-Taleb graph suggests that the data do follow an exponential distribution. October 2, 2016]
I have to say that I looked at figure 14a many times without noticing this problem. Presumably they just made a mistake. [My mistake, actually, October 2, 2016]
But what a slick manoeuvre this would be in the tradition exposed so well by Darrell Huff if done on purpose. Your data need to be on a particular line. You draw a line that goes through your data. You declare success. Busy people don’t notice you haven’t drawn the right line. [Of course, Cirillo and Taleb did not engage in such trickery. October 2, 2016]
By the way, Cirillo and Taleb’s figure 14b
also [October 2, 2016] strikes me as out of tune with their accompanying text:
Moreover, no time dependence should be identified among inter-arrival times, for example when plotting an autocorrelogram (ACF). Figure 14 shows that both of these characteristics [exponential distribution and no time dependence] are satisfactorily observable in our data set.
Again, without getting into the details, they are saying that the little bars are all near 0 (ignore the huge first bar). I agree that the bars are, indeed, lowish. But what about the ones near 0.2? (These are correlations, so they have to be between -1.0 and +1.0.) These larger bars do seem to be pretty much below the (unexplained) dotted blue line. Maybe this is a statistical significance line? If so then I’d agree to a formulation along the following lines:
We were unable to reject a hypothesis of 0 time dependence at the ??? level. However, we only had a few hundred observations and with more data we might well reject such a hypothesis, at least for some time lags. Still, it seems that any time dependence in the data is fairly weak.
I don’t see this as a massive smoking gun. I believe that Cirillo and Taleb are in the right ball park with their interpretation of these correlations although they have overstated their case. I do suspect, however, that if Nassim Taleb were standing in my shoes right now he would be shouting that I adamantly deny the overwhelming evidence of massive correlations. [Well, maybe he wouldn’t. He was pretty reasonable in our exchange about me correcting my error. October 2, 2016]
In any case, despite what Cirillo and Taleb seem to think, neither of these pictures directly addresses the main issue that interests them: whether or not there is a trend toward fewer wars per unit of time.
PS - I should mention that one of my colleagues, Alessio Sancetta, helped me think through this post. Of course, all errors are mine and, as always, I’d love to hear from readers and will gladly fix any mistake I may have made.
I assumed a fair amount of knowledge above so here are a few more details for anyone out there who craves them.
The data underpinning the pictures are for large wars since 1500. I don’t have them. I believe that Cirillo and Taleb have not yet released their data but are planning to do so.
Figure 14a is about the distribution of time gaps between wars. Specifically, how often does the next war happen right away (0 time gap), how often do we wait 1 year, 2 years, etc.?
To do an exponential QQ plot you first fit an exponential distribution to the data. This fitted distribution then makes predictions on gaps between wars. The predictions will be, for example, that 75% of the gaps will exceed 2 years or that 50% of the gaps will exceed 4 years, etc. You then graphically compare the predictions with the actual gap distribution. If all the predictions turn out to be exactly correct then the points will line up smack on the 45 degree line. [In my opinion the above is how one should do a QQ plot that is easy to understand. October 2, 2016]
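The steps just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gaps are simulated (they are not the Cirillo-Taleb data), the exponential is fitted by setting its mean equal to the sample mean, and the plotting positions (i - 0.5)/n are one common convention among several. The function name exponential_qq_points is made up for this sketch.

```python
import math
import random


def exponential_qq_points(gaps):
    """Return (observed, predicted) pairs for an exponential QQ plot.

    The exponential is fitted by matching its mean to the sample mean,
    so predictions are on the same scale (years) as the observed gaps.
    """
    n = len(gaps)
    mean_gap = sum(gaps) / n
    observed = sorted(gaps)
    points = []
    for i, actual in enumerate(observed, start=1):
        p = (i - 0.5) / n  # plotting position (an assumed convention)
        predicted = -mean_gap * math.log(1.0 - p)  # fitted exponential quantile
        points.append((actual, predicted))
    return points


# Simulated gaps with a mean of 3 years; NOT real war data.
random.seed(0)
simulated_gaps = [random.expovariate(1 / 3) for _ in range(500)]

qq_points = exponential_qq_points(simulated_gaps)

# With both axes on the same scale, exponential data should sit
# close to the 45 degree line, i.e. observed is roughly equal to predicted.
for actual, predicted in qq_points[::100]:
    print(f"observed {actual:6.2f}   predicted {predicted:6.2f}")
```

If the multiplication by mean_gap were dropped, so that predictions came from an exponential with mean 1, exponential data would still fall on a straight line, just not the 45 degree one. That is one way the two axes of a QQ plot can end up on different scales; whether figure 14a was built that way is not established here.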
How do we interpret the fact that, when done correctly, the points on the right in figure 14a lie well below the 45 degree line? [What follows in the next paragraph would be true if the QQ plot had been drawn the way I describe but isn’t true of the actual Cirillo Taleb graph.]
This means that the actual gaps at the high end tend to be longer than would be predicted by the fitted exponential curve. Loosely speaking, when the exponential is predicting a gap of 7 the actual gap turns out to be more like 10, etc. In other words, the right-hand tail of the distribution of gaps between wars is stretched to the right compared to the exponential fit.
Figure 14b is checking for correlations between gaps at different time lags. For example, the bar that reaches a height near 0.2 at a lag of 5 says that a longer gap 5 wars ago tends to be associated with a longer gap until the next war. More generally, this shows that knowledge of past gaps appears to be (weakly) useful in predicting future gaps.
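For readers who want to see what sits behind a picture like figure 14b, here is a rough Python sketch of the sample autocorrelations of a sequence of gaps. It rests on assumptions: the gaps are simulated and independent (they are not the Cirillo-Taleb data), and the band of plus or minus 1.96 divided by the square root of the number of observations is the usual approximate 95% band under no time dependence. Treating the dotted line in figure 14b as that band is a guess, not something the paper confirms. The function name autocorrelations is made up for this sketch.

```python
import math
import random


def autocorrelations(series, max_lag):
    """Sample autocorrelations of a series at lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    total = sum(c * c for c in centered)  # lag-0 sum of squares
    acf = []
    for lag in range(max_lag + 1):
        cross = sum(centered[t] * centered[t + lag] for t in range(n - lag))
        acf.append(cross / total)
    return acf


# Simulated independent gaps with a mean of 3 years; NOT real war data.
random.seed(1)
simulated_gaps = [random.expovariate(1 / 3) for _ in range(300)]

acf = autocorrelations(simulated_gaps, max_lag=20)

# Approximate 95% band under the hypothesis of no time dependence.
band = 1.96 / math.sqrt(len(simulated_gaps))
outside = [lag for lag in range(1, 21) if abs(acf[lag]) > band]

print(f"lag 0 (always 1): {acf[0]:.2f}")
print(f"band: +/- {band:.3f}")
print(f"lags outside the band: {outside}")
```

The bar at lag 0 is always 1, which is why the huge first bar can be ignored. Note also that the band narrows as the number of observations grows, so a correlation that cannot be distinguished from 0 with a few hundred gaps might be distinguishable with more data.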
I’m sure that this reminder of my article in Significance will awaken warm memories in many of you.
In it, I used an article of Pasquale Cirillo and Nassim Nicholas Taleb to help me make my case. I think that the Cirillo-Taleb paper is quite interesting. However, it is not in any way a shoot-down of Steven Pinker’s masterwork, The Better Angels of Our Nature, as Cirillo and Taleb like to claim. In fact, I argued in an earlier STATS.org article that there isn’t even any great contradiction between Cirillo-Taleb and Better Angels.
I seem to have provoked Cirillo and Taleb, who wrote a protest letter to Significance about my piece. To me, it feels like my main crime is that I didn’t dismiss Pinker as an incompetent writer of “popular science.” Or perhaps the issue is that my short piece leaned more heavily on Better Angels than it did on Cirillo-Taleb. In any case, I don’t think that Cirillo and Taleb help themselves very much with their letter.
Steven Pinker and I now have a joint reply to Cirillo and Taleb in the current issue of Significance.
Please have a look.
I’m happy to report that I don’t feel a need to rush out any corrections/changes to my piece. But definitely read the Pinker piece, which brings into play a wealth of relevant knowledge I didn’t have when I wrote my piece.