The long-awaited report from the American Association for Public Opinion Research (AAPOR) on the performance of polling in the Trump-Clinton race is out. This material is less of a stretch for the blog than it might seem at first glance, and I plan a second post on it.
Today I just want to highlight the hidden-data issue that rears its head very early in the report:
The committee is composed of scholars of public opinion and survey methodology as well as election polling practitioners. While a number of members were active pollsters during the election, a good share of the academic members were not. This mix was designed to staff the committee both with professionals having access to large volumes of poll data they knew inside and out, and with independent scholars bringing perspectives free from apparent conflicts of interest. The report addresses the following questions:
So on the one hand we have pollsters “having access to large volumes of poll data” and on the other hand we have “independent scholars” who….errr….don’t normally have access to large volumes of polling data because the pollsters normally hide it from them. (I’m not sure what the apparent conflict of interest of the pollsters is but I guess it’s that they might be inclined to cover up errors they may have made in their election forecasts.)
You might well ask why all these datasets aren’t in the public domain.
Sadly, there is no good answer to that question.
But the reason all these important data remain hidden is pretty obvious. Pollsters don’t want independent analysts to embarrass them by finding flaws in their data or their analysis.
This is a bad reason.
There is a strong public interest in having the data available. The data would help all of us, not just the AAPOR committee, understand what went wrong with polling in the Trump-Clinton race. The data would also help us learn why Trump won, which is clearly an important question.
But we don’t have the data.
I understand that there are valid commercial reasons for holding polling data privately while you sell some stories about it. But a month should be more than sufficient for this purpose.
It is unacceptable to say that sharing requires resources you don’t have, because sharing data just doesn’t require many resources. Yes, I know that I’ve whinged a bit on the blog about sharing all that State Department data, and I’m doing it in tranches. Still, this effort is costing me only about 15-30 minutes per dataset. It’s really not a big deal.
I suppose somebody might say that these datasets are collected privately and so it’s OK to permanently keep them private. But election polls drive public discussions and probably affect election outcomes. There is a really strong public interest in disclosure.
There is further material in the report on data openness:
Since provision of microdata is not required by the AAPOR Transparency Initiative, we are particularly grateful to ABC News, CNN, Michigan State University, Monmouth University, and University of Southern California/Los Angeles Times for joining in the scientific spirit of this investigation and providing microdata. We also thank the employers of committee members (Pew Research Center, Marquette University, SurveyMonkey, The Washington Post, and YouGov) for demonstrating this same commitment.
I’ve written before about how AAPOR demands transparency on everything except the main thing you would think of when it comes to survey transparency: showing your data.
I’ll return to this AAPOR problem in a future Secret Data Sunday. But for now I just want to say that the Committee’s appeal to a “scientific spirit” falls flat. Nobody outside the committee can audit the AAPOR report, and it will be unnecessarily difficult to further develop lines of inquiry initiated by the report, for one simple reason: nobody outside the committee has access to all of the data the committee analyzed. This is not science.
OK, that’s all I want to say today. I’ll return to the main points of the report in a future post.