Open the Door to all the Hidden Election Polling Data

The UK House of Lords has issued a call for evidence on the effects of political polling and digital media on politics.  Submissions are due next week so maybe someone out there wants to dash something off….or maybe someone would be so kind as to give me feedback on my proposal.  Below I give a draft.

Comments welcome!

Note that everything I say applies equally to political polling in the US and around the globe but, quite reasonably, the Lords ask about British polling so the proposal is written about British polling.

(OK, the proposal is about election polling, not war.  But this post is very much in keeping with the open data theme of the blog so I believe it will be of general interest to my readers.)


 

The Proposal[1]

I have one specific suggestion that could, if implemented, substantially improve political life in the UK: require collectors of political polling data to release their detailed micro datasets into the public domain.

A Preliminary Clarification

Some readers may think, wrongly, that pollsters generally do provide detailed micro datasets already.  Occasionally they do.  But normally they just publish summary tables, while withholding the interview-by-interview results (anonymized, of course).  Researchers need such detailed data to make valid estimates.

The Argument

Let me develop this idea in steps.

Political pollsters face two main challenges.  First, they cannot draw well-behaved random samples of voters for their polls.  This is mainly because most people selected for interviews refuse to participate.  Moreover, the political views of the refusers differ systematically from those of the participants.  Second, it is difficult to predict which poll participants will turn out to vote.  Yet good election prediction relies on good turnout prediction.

These two challenges dictate that political polling datasets cannot simply interpret themselves.  Rather, pollsters must use their knowledge, experience, intuition, wisdom and other wiles to model their way out of the shortcomings of their data.  There now exists a growing array of techniques that can be deployed to address political polling challenges.  But good applications of these techniques embody substantial elements of professional judgment, about which experts disagree.

This New York Times article leaves little doubt about the point of the last paragraph.  The NYT gave the detailed micro data from one of their proprietary Trump-Clinton polls to four analytical teams and asked for projections.  The results ranged from Clinton +4 to Trump +1.[2]  These are all valid estimates made by serious professionals.  Yet they differ quite substantively because the teams differ in some of their key judgments.

The key point is that for the foreseeable future there will not be one correct analytical technique that, if applied properly, will always lead to a correct treatment of new polling data.  Rather, there will be a useful range of valid analyses that can be made from any political polling dataset.
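To make the point concrete, here is a minimal sketch in Python of how two defensible sets of judgments can pull different answers out of the same micro data.  Everything in it is an assumption made for illustration: the ten respondents, the age groups, the population shares and the two turnout rules are all invented, and real pollsters use far richer models.  The only thing the sketch is meant to show is the mechanism.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    age_group: str        # "young" or "old" (hypothetical weighting variable)
    choice: str           # "A" or "B" (hypothetical candidates)
    turnout_score: float  # self-reported chance of voting, between 0 and 1


# Invented micro data: one row per interview. A real poll has 1,000+ rows.
SAMPLE = [
    Respondent("young", "A", 0.9), Respondent("young", "A", 0.4),
    Respondent("young", "A", 0.6), Respondent("young", "B", 0.8),
    Respondent("old", "B", 0.9), Respondent("old", "B", 0.9),
    Respondent("old", "B", 0.7), Respondent("old", "A", 0.9),
    Respondent("old", "A", 0.8), Respondent("old", "B", 0.3),
]


def margin(sample, population_shares, turnout_weight):
    """Return candidate A's lead over B in percentage points.

    Each respondent gets a demographic weight (assumed population share
    divided by sample share for their age group) multiplied by a turnout
    weight chosen by the analyst.
    """
    n = len(sample)
    sample_shares = {
        g: sum(r.age_group == g for r in sample) / n for g in population_shares
    }
    totals = {"A": 0.0, "B": 0.0}
    for r in sample:
        w = population_shares[r.age_group] / sample_shares[r.age_group]
        totals[r.choice] += w * turnout_weight(r)
    return 100 * (totals["A"] - totals["B"]) / (totals["A"] + totals["B"])


# Analyst 1 (assumption): the electorate is 30% young, and only people who
# rate their chance of voting at 0.5 or more are counted at all.
m_cutoff = margin(
    SAMPLE,
    {"young": 0.3, "old": 0.7},
    lambda r: 1.0 if r.turnout_score >= 0.5 else 0.0,
)

# Analyst 2 (assumption): the electorate is 40% young, and everybody is
# counted in proportion to their stated chance of voting.
m_prob = margin(SAMPLE, {"young": 0.4, "old": 0.6}, lambda r: r.turnout_score)

print(f"Analyst 1: A leads by {m_cutoff:.1f} points")
print(f"Analyst 2: A leads by {m_prob:.1f} points")
```

Neither analyst has made a mistake here.  They simply disagree about the likely shape of the electorate and about how to treat unlikely voters, and with only published summary tables nobody else could try a third set of judgments.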

Presently we are robbed of all but one analysis of most political polling datasets that are collected in the UK.  This is because polling data are held privately and never released into the public domain.  This data black hole wastes opportunities in two distinct directions.  First, we cannot learn as much as possible about the state of public opinion during elections.  Second, by limiting the range of experimentation that is applied to each dataset we retard the development process for improving our analytical techniques.

An Important Caveat

Many political polling datasets are collected by private companies that must make a profit on their investment.  These organizations might feel threatened by this open data proposal.  However, these concerns can easily be addressed by allowing an appropriate interval of time for data collectors to monopolize their datasets.  This could work much in the way that patents are issued to provide creative incentives for inventors by giving inventors a window of time to reap high rewards before their inventions can be copied by competitors.  The only difference here is that these monopolization intervals for pollsters should be much shorter than patent terms, probably only two weeks or so.

Parliament Should Defend the Public Interest

There is a strong public interest in making full use of political polling data.  Yet even public organizations like the BBC collect political polling data (although not in the 2017 election), write up general summaries and then consign their detailed micro data into oblivion.  If public organizations cannot be convinced to do a better job of serving the public interest then they should be forced to do so.  Even private companies should be forced, by legislation if necessary, to place their political polling data into the public domain after they have been allowed a decent interval designed to feed their bottom lines.

I do not argue that all private survey data should be released to the public.  There must be a public interest test that has to be satisfied before public release can be mandated.  This test would not be satisfied for most privately collected survey data.  But election polling does meet this public interest standard and should be claimed as a public resource to benefit everyone in the UK.

 

[1] I urge the committee to consult with the leadership of the Royal Statistical Society on this question.  I have not coordinated my submission with them but I believe that they would back it.

[2] The official NYT estimate was Clinton +1.


Secret Data Sunday – BBC Edition Part 2 – Data Journalism with Data

Last week I described my initial attempt to obtain some Iraq survey data from the BBC.

You can skip the long back story that explains my interest in these data sets if you want.  In short, though, these award-winning polls played an important role in establishing the historical record for the latest Iraq war but they are very likely to be contaminated with a lot of fabricated data.  ABC news, and its pollster Gary Langer, are hiding the data.  But the BBC is a co-sponsor of the polls so I figured that I could just get the data from the BBC instead.  (This and this give more details on the back story.)

At first I thought, naively, that the BBC had to produce the data in response to a Freedom of Information (FOIA) request.  But when I put this theory to the test I discovered that the BBC is, essentially, immune to FOIA.

So I wrote to the Chairman of the BBC Trust (at the time, Rona Fairhead).  She quickly replied, saying that the Trust can’t intervene unless there is a complaint.  So she passed my letter on to the newsroom and eventually I heard from Nick Sutton who is an editor there.

Nick immediately plopped a bombshell into my lap.

The BBC does not have and never did have the data sets for their award-winning polls.


To my amazement, BBC reporting on these Iraq public opinion polls just forwarded to its trusting public whatever ABC news told the BBC to say.

Such data journalism without data is over-the-top unethical behaviour by the BBC.

However, you can’t hide data that you don’t have so the ethics issues raised here fall outside the scope of Secret Data Sunday.  Consequently, I’ll return to the data journalism issues later in a middle-of-the-week post.

Here I just finish by returning to my failed FOIA.

Why didn’t the BBC respond to my FOIA data request by simply saying that they didn’t have the data?  Is it that they wanted to hide their no-data embarrassment?   This is possible but I doubt it.  Rather, I suspect that the BBC just responds automatically to all FOIA requests by saying that whatever you want is not subject to FOIA because they might use it for journalistic or artistic purposes.  I suspect that they make this claim regardless of whether or not they have any such plans.

To British readers I suggest that you engage in the following soothing activities while you pay your £147 subscriber fee next year.  First, repeatedly recite the mantra “Data Journalism without Data, Data Journalism without Data, Data Journalism without Data,…”.  Then reflect on why the BBC is exempt from providing basic information to the public that sustains it.

 

Secret Data Sunday – BBC Edition Part 1

If you have spent any time on this blog you know that D3 Systems, together with KA Research Limited, fielded a lot of polls in Iraq during the occupation and that the ones I’ve managed to analyze show extensive evidence of containing fabricated data.

Some such polls were commissioned by ABC news and won big awards.  But ABC news and their pollster (Gary Langer) refuse to share their data.  This is a pretty good indication that they are well aware of the rot in their house.

It turns out that ABC news was not the sole sponsor of the series of polls in question.  The BBC was a cosponsor.  So I figured that rather than beating my head against the wall with ABC and Gary Langer I would try with the BBC.

Sadly, it turns out that the BBC stone wall is just as solid as the ABC-Langer one.  In fact, the BBC was so stout in hiding the truth that I’ll need multiple posts to cover their reaction to the news that they are distorting the historical record on the Iraq war.

So let’s get started.

My first try was a Freedom of Information request to the BBC asking for the data.  The one thing I learned from this denied request is that the BBC is pretty much immune to FOIA.  All they have to do is say that they plan to use the thing you want for artistic or journalistic purposes and they are done.  They don’t have to actually use what you want for such purposes – it is enough to just claim that they have a vague intention of doing so.

Below I reproduce the BBC letter which also pretty much reproduces my request.  (The formatting came out a little weird here but it should be readable.)

 

British Broadcasting Corporation Room BC2 A4 Broadcast Centre White City Wood Lane London W12 7TP
Telephone 020 8008 2882

Email foi@bbc.co.uk

Information Rights

bbc.co.uk/foi bbc.co.uk/privacy
Professor Michael Spagat

Via email: M.Spagat@rhul.ac.uk

4th May 2016
Dear M Spagat,

Freedom of Information request – RFI20160727
Thank you for your request to the BBC of 5th April 2016, seeking the following information under the Freedom of Information Act 2000:

I would like to request the datasets from six opinion polls conducted in Iraq for which BBC was a sponsor. I list them below together with links that may be helpful. The list is taken from the web site of ABC news but the BBC is a sponsor on all these polls and must have the original datasets. I want to be clear that I am asking for the detailed datasets, not just tables of processed results. If it is useful I could send a similar dataset. But what I’m asking for should be the form in which the contractor provided the data to the BBC in the first place.

Thank you very much for your cooperation.

Here is the list:
2009
Field dates: Feb. 17 – 25, 2009
Details: 2,228 interviews via 446 sampling points, oversamples in Anbar province, Basra city, Kirkuk city, Mosul and Sadr City in Baghdad.
Media partners: ABC/BBC/NHK
Field work: D3 Systems of Vienna, Va., and KA Research Ltd. of Istanbul
Analysis
Interviewer journal
Photo slideshow
Chart slideshow
PDF with full questionnaire

2008
Field dates: Feb. 12 – 20, 2008
Details: 2,228 interviews via 461 sampling points, oversamples in Anbar province, Basra city, Kirkuk city, Mosul and Sadr City in Baghdad.
Media partners: ABC/BBC/ARD/NHK
Field work: D3 Systems of Vienna, Va., and KA Research Ltd. of Istanbul
Analysis
Interviewer journal
Photo slideshow
Chart slideshow
PDF with full questionnaire

2007
Field dates: Aug. 17-24, 2007
Details: 2,212 interviews via 457 sampling points, oversamples in Anbar province, Basra city, Kirkuk city and Sadr City in Baghdad
Media partners: ABC/BBC/NHK
Field work: D3 Systems of Vienna, Va., and KA Research Ltd. of Istanbul, Turkey.
Analysis
Interviewer journal
Photo slideshow
Chart slideshow
PDF with full questionnaire

2007
Field dates: Feb. 25-March 5, 2007
Details: 2,212 interviews via 458 sampling points, oversamples in Anbar province, Basra city, Kirkuk city and Sadr City in Baghdad
Media partners: ABC/USA Today/BBC/ARD
Field work: D3 Systems of Vienna, Va., and KA Research Ltd. of Istanbul
Analysis
Interviewer journal and here.
Photo slideshow
PDF with full questionnaire

2005
Field dates: Oct. 8-Nov. 22, 2005
Details: 1,711 interviews via 135 sampling points, oversample in Anbar province
Media partners: ABC/BBC/NHK/Time/Der Spiegel
Field work: Oxford Research International
Analysis
Photo slideshow
PDF with full questionnaire

2004
Field dates: Feb. 9-28, 2004
Details: 2,737 interviews via 223 sampling points
Media partners: ABC/BBC/NHK/ARD
Field work: Oxford Research International
PDF with full questionnaire
Photo slideshow

The information you have requested is excluded from the Act because it is held for the purposes of ‘journalism, art or literature.’ The BBC is therefore not obliged to provide this information to you and will not be doing so on this occasion. Part VI of Schedule 1 to FOIA provides that information held by the BBC and the other public service broadcasters is only covered by the Act if it is held for ‘purposes other than those of journalism, art or literature’. The BBC is not required to supply information held for the purposes of creating the BBC’s output or information that supports and is closely associated with these creative activities.1
The limited application of the Act to public service broadcasters was to protect freedom of expression and the rights of the media under Article 10 European Convention on Human Rights (“ECHR”). The BBC, as a media organisation, is under a duty to impart information and ideas on all matters of public interest and the importance of this function has been recognised by the European Court of Human Rights. Maintaining our editorial independence is a crucial factor in enabling the media to fulfil this function.
That said, the BBC makes a huge range of information available about our programmes and content on bbc.co.uk. We also proactively publish information covered by the Act on our publication scheme and regularly handle requests for information under the Act.

Appeal Rights
The BBC does not offer an internal review when the information requested is not covered by the Act. If you disagree with our decision you can appeal to the Information Commissioner. The contact details are: Information Commissioner’s Office, Wycliffe House, Water Lane, Wilmslow SK9 5AF. Tel: 0303 123 1113 (local rate) or 01625 545 745 (national rate) or see https://ww.ico.org.uk/ .
Please note that should the Information Commissioner’s Office decide that the Act does cover this information, exemptions under the Act might then apply.

Yours sincerely,
BBC Information Rights
1 For more information about how the Act applies to the BBC please see the enclosure which follows this letter. Please note that this guidance is not intended to be a comprehensive legal interpretation of how the Act applies to the BBC.

Freedom of Information
From January 2005 the Freedom of Information (FOI) Act 2000 gives a general right of access to all types of recorded information held by public authorities. The Act also sets out exemptions from that right and places a number of obligations on public authorities. The term “public authority” is defined in the Act; it includes all public bodies and government departments in the UK. The BBC, Channel 4, S4C and MG Alba are the only broadcasting organisations covered by the Act.

Application to the BBC
The BBC has a long tradition of making information available and accessible. It seeks to be open and accountable and already provides the public with a great deal of information about its activities. BBC Audience Services operates 24 hours a day, seven days a week handling telephone and written comments and queries, and the BBC’s website bbc.co.uk provides an extensive online information resource.
It is important to bear this in mind when considering the Freedom of Information Act and how it applies to the BBC. The Act does not apply to the BBC in the way it does to most public authorities in one significant respect. It recognises the different position of the BBC (as well as Channel 4 and S4C) by saying that it covers information “held for purposes other than those of journalism, art or literature”. This means the Act does not apply to information held for the purposes of creating the BBC’s output (TV, radio, online etc), or information that supports and is closely associated with these creative activities.
A great deal of information within this category is currently available from the BBC and will continue to be so. If this is the type of information you are looking for, you can check whether it is available on the BBC’s website bbc.co.uk or contact BBC Audience Services.
The Act does apply to all of the other information we hold about the management and running of the BBC.

The BBC
The BBC’s aim is to enrich people’s lives with great programmes and services that inform, educate and entertain. It broadcasts radio and television programmes on analogue and digital services in the UK. It delivers interactive services across the web, television and mobile devices. The BBC’s online service is one of Europe’s most widely visited content sites. Around the world, international multimedia broadcaster BBC World Service delivers a wide range of language and regional services on radio, TV, online and via wireless handheld devices, together with BBC World News, the commercially-funded international news and information television channel.
The BBC’s remit as a public service broadcaster is defined in the BBC Charter and Agreement. It is the responsibility of the BBC Trust (the sovereign body within the BBC) to ensure that the organisation delivers against this remit by setting key objectives, approving strategy and policy, and monitoring and assessing performance. The Trustees also safeguard the BBC’s independence and ensure the Corporation is accountable to its audiences and to Parliament.
Day-to-day operations are run by the Director-General and his senior management team, the Executive Board. All BBC output in the UK is funded by an annual Licence Fee. This is determined and regularly reviewed by Parliament. Each year, the BBC publishes an Annual Report & Accounts, and reports to Parliament on how it has delivered against its public service remit.

Secret Data Sunday – Gary Langer Edition

Last Sunday I shared an unanswered email I had sent to the Senior Vice President for Editorial Quality at ABC news.  The email gives a self-contained account of the overall context behind my data request, but I’ll take another pass here just to be as clear as possible.

There were a remarkable number of opinion polls conducted in Iraq during the US occupation.  Many of these were fielded by D3 Systems working with KA Research Limited.  Steve Koczela and I analyzed some of these surveys and found extensive evidence of fabricated data.  We wrote up our findings and asked for comments from interested parties.  D3 and Langer Research Associates then threatened to sue us rather than constructively engaging.  (See this, this and this.)

It’s clear that Langer Research Associates reacted so furiously because Gary Langer did a series of D3-KA Iraq polls for ABC  that won an Emmy Award plus the Policy Impact Award from the American Association for Public Opinion Research.  So he has a lot at stake.

Moreover, the write ups of these ABC polls show that the ABC data display some of the same patterns that Steve and I found in other D3-KA-Iraq polls.  One of the big ones is  opinion unanimity in certain governorates, including Anbar, that is more characteristic of robots than it is of human beings.  With this in mind, check out the highlighted text below.


Given this background it is, perhaps, not surprising that D3 and Langer went for a legal choke-slam rather than for serious discussion.  Nevertheless, it is disappointing that these research organizations place so little value on the truth.  Thus, there really must be an outside examination of the micro data from ABC’s public opinion polling in Iraq.
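For readers wondering what such an outside examination might start with, here is a minimal sketch in Python of one simple check: flag every region and question where a single answer accounts for nearly all responses.  To be clear about the assumptions, the row layout, the 98% threshold, the minimum sample size and the toy data are all invented for illustration, and this is not a description of the analysis that Steve and I actually ran.  A flag is only a prompt for a closer look, since genuine near-consensus does happen on some questions.

```python
from collections import Counter, defaultdict


def unanimity_flags(rows, threshold=0.98, min_n=30):
    """Flag (region, question) pairs dominated by one answer.

    rows: iterable of (region, question, answer) tuples, one per response.
    threshold and min_n are arbitrary illustrative choices, not standards.
    Returns a list of (region, question, top_answer, share, n) tuples.
    """
    counts = defaultdict(Counter)
    for region, question, answer in rows:
        counts[(region, question)][answer] += 1

    flags = []
    for (region, question), answers in sorted(counts.items()):
        n = sum(answers.values())
        top_answer, top_count = answers.most_common(1)[0]
        share = top_count / n
        if n >= min_n and share >= threshold:
            flags.append((region, question, top_answer, share, n))
    return flags


# Invented toy data: every respondent in "Region X" gives the same answer,
# while "Region Y" shows the kind of disagreement real people usually show.
toy_rows = (
    [("Region X", "q1", "yes")] * 50
    + [("Region Y", "q1", "yes")] * 30
    + [("Region Y", "q1", "no")] * 20
)

for region, question, answer, share, n in unanimity_flags(toy_rows):
    print(f"{region}, {question}: {share:.0%} answered '{answer}' (n={n})")
```

Of course, a check like this can only be run by someone who has the interview-by-interview data, which is exactly what is being withheld.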

I requested the data from Mathew Warshaw of D3 Systems.  He directed me to ABC News.  But, as we know, ABC News ignored my data request.  I also tried Gary Langer who  ignored me at first but finally wrote back on my latest attempt.

This is what I wrote to Langer.

Gary,

This is an opportune moment to renew my data request for the surveys you conducted in Iraq using D3 Systems and KA Research Limited.  You did not reply to my last request.

You abdigate your responsibility to the truth and violate principles of transparency by hiding your data and trying to shut down discussion of your work.

Mike Spagat

This is his reply.

Jeez, you really know how to sweet talk a guy, don’t you?

Extra points for “abdigate.”

OK, I accept full responsibility for misspelling abdicate…..abdicate, abdicate, abdicate, abdigate  gah! dammit….

I’m less apologetic about not being sweeter about my request.  Maybe being sweet is better than not being sweet but, in the end, he should live up to his responsibilities whether or not people talk to him sweetly.

Strangely this isn’t the end of the story but you’ll have to come back next Sunday for more.

Secret Data Sunday – ABC News (in the US) Stonewalls over their Dubious Iraq Public Opinion Polls

Below is an email that I sent to Kerry Smith, the Senior Vice President for Editorial Quality at ABC news, back in November of 2016.

She did not reply.

 

Dear Ms. Smith,

I am a professor of economics specialized in the quantitative analysis of armed conflict.  I have a big body of work focused on data quality issues that arise during data collection in conflict zones, especially survey data.

Back in 2011 I wrote a paper with Steven Koczela, now a prominent pollster with MassINC Polling, that uncovered substantial evidence of fabricated data in polls fielded in Iraq by D3 Systems.  We sent our paper to various interested parties for comments, including Mathew Warshaw of D3 Systems and Gary Langer who had just moved from ABC to found Langer Associates.  We included Mr. Langer in the circulation list because ABC news had used D3 Systems for a series of polls in Iraq that now required urgent re-evaluation.

D3, backed by Langer Associates, responded by threatening to sue me and Mr. Koczela.  See this, this and this.   My university has supported me against this censorship attempt but, unfortunately, Mr. Koczela felt that he could not defend himself and signed an agreement to keep his mouth shut about this particular piece of work.  (This is why only my name appears on the first link above.)  Eventually, the legal threat disappeared when I wrote to Mr. Warshaw asking him to explain what, specifically, he objected to in our analysis.  He did not reply.

To his credit Mr. Koczela continued working on this issue, unearthing a large number of datasets for opinion polls conducted in Iraq by D3 Systems and other polling companies.  These have provided remarkably strong evidence of data fabrication already.  For example, see this eye-popping analysis.

Many of the D3 Iraq surveys that I now have were conducted for the US State Department.  Mr. Koczela made the State Department aware of the problem at some point and they hired Fritz Scheuren, a former president of the American Statistical Association, to investigate.  His analysis confirmed the fabrication problem using an analysis rather different from mine.  Unfortunately, Dr. Scheuren signed a nondisclosure agreement but I believe he would confirm in general terms the main gist of this work and he could also give you an authoritative opinion on my analysis.  (scheuren@aol.com)

Notice that after the Huffington Post article Langer Associates did post a response to my 2011 paper.   This is, however, exceptionally weak as I explain in these articles.  Langer Associates have not addressed the new evidence that has emerged since Mr Koczela’s FOIA either.

I emailed Mr. Langer for the data from the ABC Iraq polls but he did not reply.  I asked Mr. Warshaw for the same data and he referred me to ABC news.  I am now requesting the data from you.

 At the risk of belabouring the obvious, I note that people with strong intellectual cases to make do not start by threatening to sue and finish by withholding their data.

Most importantly, ABC needs to take action to correct the historical record of the Iraq war.  These polling numbers are all over the web sites of ABC news and its partner organizations in these polls.  This work must be retracted.

It is, of course, your journalistic obligation to correct the historical record but, at the same time, I think it’s to your advantage to do so.  Fixing this problem would demonstrate a strong commitment to quality and accuracy.  I doubt you would even lose your Emmy Award.  Surely you won’t be punished for pursuing the truth wherever it leads.  I will do anything I can to help in this regard.

I suggest that we meet to discuss these issues further.  I would be happy to fly to New York at my own expense for this purpose.  Alternatively, we could talk by phone, skype or some other technology.

Sincerely,

 

Professor Michael Spagat

Head of Department

Department of Economics

Royal Holloway College

University of London

Egham, Surrey TW20 0EX

United Kingdom

m.spagat@rhul.ac.uk

+44 1784 414001 (W)

+44 1784 439534 (F)

 

Blog:  https://mikespagat.wordpress.com/

War, Numbers and Human Losses: The Truth Counts

Secret Data Sunday – International Rescue Committee Edition

I haven’t posted for a while on this subject so here’s some background.

The International Rescue Committee (IRC) did a series of surveys in the Democratic Republic of Congo (DRC).  The final installment summed up the IRC findings as follows:

Based on the results of the five IRC studies, we now estimate that 5.4 million excess deaths have occurred between August 1998 and April 2007. An estimated 2.1 million of those deaths have occurred since the formal end of war in 2002.

The IRC’s estimate of 5.4 million excess deaths received massive publicity, some of it critical, but journalists and scholars have mostly taken the IRC claim at face value.  The IRC work had substantial methodological flaws that were exposed in detail in the Human Security Report and you should definitely have a look if you haven’t seen this critique. But I won’t rehash all these issues in the present blog post.  Instead, I will just discuss data.

One of the main clouds hanging over the IRC work is the fact that three other surveys find child mortality rates to be steadily falling during the period when the IRC claims there was a massive spike in these rates.  (See this post and this post for more information.)  In particular, there are two DHS surveys and a MICS survey that strongly contradict the IRC claims.

And guess what?

The DHS and MICS data are publicly available but the IRC hides its data.

As always, I don’t draw the conclusion of data hiding lightly but, rather, I’ve tried pretty hard to persuade the relevant actors to come clean.

Frankly, I don’t think I’m under any obligation to make all these efforts.  I haven’t sent any emails to the DHS or MICS people because there’s no need to bother, given that their data are free for the taking.  But the IRC hasn’t posted their data so I resorted to emails.

I wrote multiple times over many months with no success to Ben Coghlan of the Burnet Institute in Australia.  He led the last two rounds of the IRC research, including an academic publication in the Lancet, so he was a sensible starting point.

In the end, it would have been better if Coghlan had just done a Taleb and told me to “fuck off” straight away rather than stringing me along.  First he asked what I wanted to do with the data.  I feel that this is not an appropriate question since data access shouldn’t really depend on plans.  But I told him that I wanted to get to the bottom of why the IRC data were so inconsistent with the other data.  After prompting, he said he needed to delay because he was just finishing his PhD.  I made the obvious reply, pointing out that even while completing a PhD he should still be able to spare ten minutes to send a dataset.  On my next prompt he replied by asking me, rather disingenuously I thought, how my project was getting on.  I replied that I hadn’t been able to get out of the starting block because he hadn’t sent me any data.  I gave up after two more prompts.

Next I tried Jeannie Annan, the Senior Director of Research and Evaluation at the IRC.  She replied that she didn’t have the data and that I should try …..Ben Coghlan and Les Roberts who led the early rounds of the surveys.

I knew that Les Roberts would never cough up the data (too long a story for this blog post) but wrote him anyway.  He didn’t reply.

I wrote back to Jeannie Annan saying that both Coghlan and Roberts were uncooperative but that, ultimately, this is IRC work and that the IRC needs to take responsibility for it. In my view:

  1. The IRC should have the data if they stand behind their work.
  2. If the IRC doesn’t have the data then they should insist that Roberts and Coghlan hand it over.
  3. If Roberts and Coghlan refuse to provide them with the data then the IRC should retract the work.

She didn’t reply.

Here’s where this unfortunate situation stands.

The IRC estimate of 5.4 million excess deaths in the DRC exerts a big influence on the conflict field and on the perceptions of the general public.  It is widely, but erroneously, believed that this DRC conflict has been the deadliest since World War 2.  The IRC estimate survives largely as conventional wisdom, despite the critique of the Human Security Report.

The IRC and the academics involved keep their data well hidden,  choking off further discussion.

PS – Note that this is not only a tale of an NGO that doesn’t uphold scientific standards – there are also academics involved.  I say this because last week at least one person commented that, although Taleb’s behavior is appalling, he’s not really an academic.

 

Secret Data Sunday – Nassim Nicholas Taleb Edition

When data are central to scientific discussions, as is typically the case, then the relevant data should be open to all.

OK, we don’t have to be totally rigid about this.  People may sink a lot of effort into building a data set so it’s reasonable for data builders to milk their data monopoly for some grace period.  In my opinion, you get one publication.  Then you put your data into the public domain.

And public domain means public domain.  It’s not OK to hide your data from people you don’t like, from people you think are incompetent, from people you suspect of having engaged in acts of moral turpitude, etc.  You post your data so everyone can have them.

If you put your data into the public domain and someone does something stupid with it then it’s fine to say that.  It’s a virtue to be nice but being nice isn’t a requirement.  But as far as I’m concerned you share your data or you’re not doing science.

Readers of the blog should be well aware that there has been a dispute about the decline of war (or not), primarily between Steven Pinker and Nassim Nicholas Taleb.  You can track my participation in this debate from a bunch of my blog entries and the links they contain.  I’m in the middle of preparing a conference talk on this subject, and I’ll post the slides later this week….so more is coming.

I planned a little data work to support the talk so I emailed Taleb asking him for the data he used to launch his attack on Pinker’s work.  Here is his reply.

1) It is not professional to publish a “flaw” without first contacting the authors. You did it twice.

2) Your 2 nitpicking “flaws” betrayed total ignorance of the subject.

So I will ask you to fuck off.

He is referring to this post (which did contain an error that I corrected after a reader pointed it out).

What can I say?

The main thing is that if he wants to do science then it’s not OK to just declare someone to be ignorant and withhold data.

Beyond that I’d say that if he still objects to something in my post he should be specific, either in the comments or to me directly.  As always, I’ll issue a correction or clarification if I get something wrong.

Finally, it isn’t really standard to clear criticisms of someone’s work in advance with the person being criticized.  Doing this could be a reasonable strategy in some cases.  And it’s reasonable to send criticism to the person being criticized.  Correcting errors, as I do, is essential.

Anyway, I take away from this episode that Taleb isn’t doing science and also that he probably doesn’t have great confidence in his work on this subject or else he wouldn’t hide his data.