The rout of British pollsters in the last general election is an embarrassment for the teams involved, but not only for them. It helps discredit opinion polls more broadly, with a possible spillover effect on the entire work of market research companies. Their clients might legitimately wonder about the precision of market share forecasts, or other analyses, coming from teams working daily in the same office as those political pollsters. Worse still, the general public could seriously question the role and usefulness of market research, at a time when the need for data is ubiquitous.
Hence the importance of understanding the reasons for the disaster. Let us hope the ongoing inquiry will be fully transparent. My personal feeling is that, at the moment, the explanations are a bit short on substance.
To analyse the possible issues, I started from the blog post published by SIG lab: a confusing discussion that left me a bit puzzled.
Let us start at the beginning, with the data. In the table below, you will find the difference between the actual results and the average of the last polls, just before the election, for the last 6 UK general elections.
Source: BBC, Wikipedia, UK Political Info and the Electoral Commission. The score of Irish unionists has been added to that of the Conservatives. The same goes for the SDLP and Labour.
At this stage, let us note three things:
- If the 1992 election remains the absolute reference in terms of polling failure, 2015 is actually not so different from 2001, or even 2010. In 5 elections out of 6, one of the three main parties is more than 3 points away from the forecast. 2015 only struck the general public more because the political outcome was very different from the forecast one.
- In 5 elections out of 6, Labour is overestimated and the Conservatives are underestimated.
- The combined share of small parties does not seem to be estimated any worse recently than 20 years ago.
What, then, are the reasons for the disaster?
First explanation, rightly rejected by the SIG: the idea that polls are a snapshot at time t, and not a forecast of the vote. That pollsters of various stripes argue along those lines has always made me wonder. It is a nice example of conscientiously biting the hand that feeds you. If the Monday polls are not used to predict the outcome of the Thursday election, what should they be used for? But let us briefly follow this line of reasoning. The difference between the last polls and the actual outcome would then be due to (i) people who decide at the last minute and (ii) people who change their mind at the last minute.
On the first point, we do have exit polls. They are more reliable than the polls taken just before Election Day: they clearly anticipated the Conservatives’ victory, even if it was somewhat underestimated. The verdict is unambiguous: fewer than 10% of those who decided at the last moment chose the Conservative party. On the second point, as noted by the SIG, voting intentions before the vote are totally flat. 1.5 million people would have had to change their minds overnight before Thursday…
Second explanation, which seems to be favoured by the SIG: « electoral changes which make the work of pollsters more complex »: fragmentation of the vote, rise of new parties, presence of regional parties. All this would make the recall of previous votes more uncertain. Or even impossible to use, because, the SIG tells us, pollsters « don’t have any more any previous election in order to calibrate their results ».
A quick reminder about the recall of previous votes. This is a crucial variable for political polling. Raw data are often biased: for example, the National Front in France is always well below its actual level in the raw data. The way to remove the bias is to calibrate what interviewees say about their past vote against the actual results of that past election.
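The calibration just described can be sketched in a few lines. This is only a minimal illustration, not the method of any particular pollster: the function names, the two parties "A" and "B" and the ten-respondent sample are all invented for the example. Each respondent gets a weight equal to the actual past share of the party they recall voting for, divided by that party's share among the recalled votes in the sample.

```python
from collections import Counter


def past_vote_weights(recalled_votes, actual_past_shares):
    """One weight per respondent so that recalled past vote matches the real result."""
    counts = Counter(recalled_votes)
    n = len(recalled_votes)
    # weight = actual share of the recalled party / its share in the raw sample
    return [actual_past_shares[v] / (counts[v] / n) for v in recalled_votes]


def weighted_shares(current_votes, weights):
    """Vote shares for the current election, using the calibration weights."""
    total = sum(weights)
    shares = {}
    for vote, w in zip(current_votes, weights):
        shares[vote] = shares.get(vote, 0.0) + w / total
    return shares


# Hypothetical sample of (recalled past vote, current voting intention).
# Party "A" is over-represented: 60% recall voting A, but A really got 50%.
sample = [("A", "A")] * 5 + [("A", "B")] * 1 + [("B", "A")] * 1 + [("B", "B")] * 3
recalled = [past for past, _ in sample]
current = [now for _, now in sample]

raw = weighted_shares(current, [1.0] * len(sample))
weights = past_vote_weights(recalled, {"A": 0.5, "B": 0.5})
calibrated = weighted_shares(current, weights)

print("raw:", raw)
print("calibrated:", calibrated)
```

In this toy sample the raw share of party A is 60%; after calibration it falls to about 54%, because respondents who recall voting A are weighted down. The correction is only as good as the assumption that people within each past-vote group answer alike, whether or not they respond to surveys.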
Let us first set aside the idea that the fragmentation of the vote or the existence of regional parties could be a real obstacle.
Fragmentation is actually quite relative: the two main parties gathered 68.5% of the vote in 2015, 66.4% in 2010, 69.5% in 2005, and, yes, around 75% in previous elections: you can do worse in terms of fragmentation… The monumental error of 1992 happened in the least fragmented election. If we look at the French presidential elections, the largest error for mainstream parties in recent elections happened in 2007 (Nicolas Sarkozy underestimated by 3 points), again the least fragmented election: the two main candidates gathered 57% of the vote, against 56% in 2012 and 37% in 2002.
As for regional parties, I just don’t understand the logic: Irish and Welsh regional parties have had the same share of the vote since 1992. The SNP did score quite high, with 4.7% against 1.5% to 2% in elections since 1997. That tide was perfectly anticipated by pollsters, who were able to use recent gold-standard data (the independence referendum) to calibrate their forecasts.
The point about new parties is more serious. UKIP scored 12.6% in 2015, versus 3.1% and 2.2% in the two previous elections. The rise of new parties, especially anti-system ones, is a challenging issue for pollsters. Their supporters are less naturally inclined to answer surveys. In France, polls underestimated the score of the National Front candidate in 2012 (-1.8 points), overestimated it in 2007 (+3.4 points) and underestimated it in 2002 (-3.9 points). Yes, but: the gap between the forecast and the actual result is smallest precisely for UKIP… One reason to cheer for the pollsters: well done.
Third explanation: the « shy Tory factor » and differential turnout across political parties. Conservative voters would be less inclined to say that they vote Tory. They would tell interviewers they vote for another party, or that they don’t vote, and would do so in a greater proportion than voters of other parties. This is actually the same phenomenon as the one just discussed for new parties. No need to invent new concepts like the “shy Tory factor”. This is a classic example of endogeneity bias: non-response is correlated with the measured variable, so the resulting sample is endogenous. Nobel Prize winner James Heckman wrote on this back in 1979, in his seminal Econometrica paper.
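A small numerical sketch makes the mechanism concrete. The party labels, vote shares and response rates below are invented for illustration and are not estimates for any real election. The only assumption is that the probability of answering the survey depends on the very variable being measured, the vote.

```python
def observed_shares(true_shares, response_rates):
    """Expected shares among respondents when response depends on the vote itself."""
    # Fraction of the electorate that both supports the party and answers the poll.
    responding = {p: true_shares[p] * response_rates[p] for p in true_shares}
    total = sum(responding.values())
    return {p: r / total for p, r in responding.items()}


# Hypothetical electorate: party "X" leads by 5 points...
true_shares = {"X": 0.40, "Y": 0.35, "other": 0.25}
# ...but its voters are assumed to answer surveys 20% less often than the rest.
response_rates = {"X": 0.8, "Y": 1.0, "other": 1.0}

poll = observed_shares(true_shares, response_rates)
for party in true_shares:
    gap = 100 * (poll[party] - true_shares[party])
    print(f"{party}: true {true_shares[party]:.1%}, poll {poll[party]:.1%}, gap {gap:+.1f} pts")
```

With these made-up figures, party X drops from a true 40% to about 35% in the poll and appears to trail party Y. A larger sample does not help: the bias comes from who answers, not from how many answer.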
This is certainly something to dig into a bit further. At this stage, it is not clear why calibrating on past voting behaviour would not solve the issue: would Conservative voters refuse to say they are voting Tory now, yet happily admit to it for past elections? Only a detailed analysis of raw and calibrated data could give some insight into the issue.
The SIG ends its blog post by giving its blessing to the Survation company, which claimed after the election that it had got the right result in its last (coincidentally unpublished) poll. A grotesque marketing exercise: the last Survation poll published on the eve of the election was, for the Conservatives, the farthest off among all polls published that day. Forecasting is like marriage. You should speak up beforehand or forever hold your peace.
To summarise, let me follow the illustrious path of Jonathan Swift and make a modest proposal. No need to invent new and complicated concepts: the theoretical framework around endogenous sampling is well documented and can be applied here. Let us not be fooled into thinking that the issue is due to new political landscapes and that it will be solved when this landscape stabilises, as the SIG would have it. Just looking back at the last 6 elections shows that the issue has been persistent since 1992. Except in 2010: this, too, needs to be understood. Transparency over the calibration methods used for past general elections is key to restoring the credibility of UK pollsters. It is key also to dispelling any doubts about an industry which our democratic society really needs.