Here’s the film you’re looking for.
Also, after being lazy for more than a month I’ve started uploading the remaining Iraq opinion polls at our old friend, the conflict data page.
The UK House of Lords has issued a call for evidence on the effects of political polling and digital media on politics. Submissions are due next week, so maybe someone out there wants to dash something off... or maybe someone would be so kind as to give me feedback on my proposal. Below I give a draft.
Note that everything I say applies equally to political polling in the US and around the globe but, quite reasonably, the Lords ask about British polling so the proposal is written about British polling.
(OK, the proposal is about election polling, not war. But this post is very much in keeping with the open data theme of the blog so I believe it will be of general interest to my readers.)
I have one specific suggestion that could, if implemented, substantially improve political life in the UK: require collectors of political polling data to release their detailed micro datasets into the public domain.
A Preliminary Clarification
Some readers may think, wrongly, that pollsters generally do provide detailed micro datasets already. Occasionally they do. But normally they just publish summary tables, while withholding the interview-by-interview results (anonymized, of course). Researchers need such detailed data to make valid estimates.
Let me develop this idea in steps.
Political pollsters face two main challenges. First, they cannot draw well-behaved random samples of voters for their polls. This is mainly because most people selected for interviews refuse to participate. Moreover, the political views of the refusers differ systematically from those of the participants. Second, it is difficult to predict which poll participants will turn out to vote. Yet good election prediction relies on good turnout prediction.
These two challenges dictate that political polling datasets cannot simply interpret themselves. Rather, pollsters must use their knowledge, experience, intuition, wisdom and other wiles to model their way out of the shortcomings of their data. There now exists a growing array of techniques that can be deployed to address political polling challenges. But good applications of these techniques embody substantial elements of professional judgment, about which experts disagree.
This New York Times article leaves little doubt about the point of the last paragraph. The NYT gave the detailed micro data from one of their proprietary Trump-Clinton polls to four analytical teams and asked for projections. The results ranged from Clinton +4 to Trump +1. These are all valid estimates made by serious professionals. Yet they differ quite substantively because the teams differ in some of their key judgments.
The key point is that for the foreseeable future there will not be one correct analytical technique that, if applied properly, will always lead to a correct treatment of new polling data. Rather, there will be a useful range of valid analyses that can be made from any political polling dataset.
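To make the point concrete, here is a toy sketch in Python. Everything in it is hypothetical: the ten respondent records, the population shares by age group and the self-reported turnout probabilities are made up for illustration and do not come from any real poll. The function `margin` and its two switches are likewise my own invention, standing in for the far richer judgment calls that real analysts make. The sketch only shows that the same interview-level records can support several different headline numbers, which is impossible to explore from a published summary table.

```python
from collections import defaultdict

# Hypothetical micro data: one record per interview (NOT real polling data).
# "turnout" is an assumed self-reported probability of voting.
RESPONDENTS = (
    [{"vote": "A", "age": "young", "turnout": 0.5}] * 3
    + [{"vote": "B", "age": "young", "turnout": 0.5}] * 1
    + [{"vote": "A", "age": "old", "turnout": 1.0}] * 2
    + [{"vote": "B", "age": "old", "turnout": 1.0}] * 4
)

# Assumed true population shares by age group (also made up).
POPULATION_SHARE = {"young": 0.5, "old": 0.5}


def margin(respondents, weight_by_age, use_turnout):
    """Return candidate A's lead over B in percentage points."""
    n = len(respondents)

    # Sample size per age group, needed for the demographic weights.
    group_size = defaultdict(int)
    for r in respondents:
        group_size[r["age"]] += 1

    totals = defaultdict(float)
    for r in respondents:
        weight = 1.0
        if weight_by_age:
            # Scale each group up or down to match its population share.
            sample_share = group_size[r["age"]] / n
            weight *= POPULATION_SHARE[r["age"]] / sample_share
        if use_turnout:
            # Discount respondents who say they are unlikely to vote.
            weight *= r["turnout"]
        totals[r["vote"]] += weight

    return 100.0 * (totals["A"] - totals["B"]) / (totals["A"] + totals["B"])


# Four analysts, four defensible sets of choices, one dataset.
for by_age in (False, True):
    for turnout in (False, True):
        lead = margin(RESPONDENTS, by_age, turnout)
        print(f"age weights={by_age}, turnout model={turnout}: A leads by {lead:+.1f}")
```

With these made-up records the four combinations produce four different margins, ranging from a lead for A to a lead for B. None of the four is a coding error; they simply reflect different judgments about how to correct the sample.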
Presently we are robbed of all but one analysis of most political polling datasets that are collected in the UK. This is because polling data are held privately and never released into the public domain. This data black hole wastes opportunities in two distinct directions. First, we cannot learn as much as possible about the state of public opinion during elections. Second, by limiting the range of experimentation that is applied to each dataset we retard the development process for improving our analytical techniques.
An Important Caveat
A large share of political polling data are collected by private companies that must make a profit on their investment. These organizations might feel threatened by this open data proposal. However, these concerns can easily be addressed by allowing an appropriate interval of time for data collectors to monopolize their datasets. This could work much in the way that patents provide creative incentives for inventors by giving them a window of time to reap high rewards before their inventions can be copied by competitors. The only difference here is that the monopolization intervals for pollsters should be much shorter than patent terms, probably only two weeks or so.
Parliament Should Defend the Public Interest
There is a strong public interest in making full use of political polling data. Yet even public organizations like the BBC collect political polling data (although not in the 2017 election), write up general summaries and then consign their detailed micro data to oblivion. If public organizations cannot be convinced to do a better job of serving the public interest then they should be forced to do so. Even private companies should be forced, by legislation if necessary, to place their political polling data into the public domain after they have been allowed a decent interval designed to feed their bottom lines.
I do not argue that all private survey data should be released to the public. There must be a public interest test that has to be satisfied before public release can be mandated. This test would not be satisfied for most privately collected survey data. But election polling does meet this public interest standard and should be claimed as a public resource to benefit everyone in the UK.
I urge the committee to consult with the leadership of the Royal Statistical Society on this question. I have not coordinated my submission with them but I believe that they would back it.
(For reference, the official NYT estimate from the Trump-Clinton poll discussed above was Clinton +1.)
Here is a presentation I gave a few weeks ago on fabrication in survey data.
It includes some staple material from the blog but, mainly, I set off in a new direction – trying to explain why survey data get fabricated in the first place.
While writing the presentation I realized that the conditions that lead to fabrication are similar to those that led to the Grenfell Tower fire. I only hint at these connections in the presentation but I plan to pursue this angle in the future.