Who will win the French Presidential Election?

Electoral polls had an « annus horribilis »: they did not predict the “Leave” victory, did not anticipate Trump’s triumph, and would not have correctly predicted the results of the French primaries held at the end of 2016 and the beginning of 2017. Even if the criticism is sometimes inaccurate, or even plainly wrong (I will come back to that in a future post), it would be foolish to deny that the Brexit/Trump/French primaries sequence is an embarrassment for the market research industry. Add to that the polling disaster of the 2015 UK general election.

Just a (small) reassurance before digging into the matter: even if it is fashionable to pour scorn on electoral polls, the appetite of the general public (and not only of the media) for them is intact. The graph below, based on Google Trends data, shows the number of searches for “poll” in France over the last 12 months. As is often the case, there is a discrepancy between what people say and what they do…

The main difficulty facing pollsters is well known: non-response. Not everybody agrees to take part in a survey. When non-response is correlated with the variable you want to measure, the resulting sample is biased (statisticians speak of endogenous sampling).
This is exactly what happens with opinion polls. Centrist voters are more likely to answer surveys than voters at the political fringes. In the raw data, Marine Le Pen (president of the far-right National Front) is underestimated, while Emmanuel Macron (a centrist) comes out too high. These raw data therefore need to be reweighted.
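A minimal Python sketch makes the mechanism concrete. Assuming, purely for illustration, that a candidate's supporters answer surveys at half the rate of other voters, her expected share among respondents falls well below her true share. The response rates and the function name `raw_share` are hypothetical, not Opinionway figures.

```python
def raw_share(true_share: float, rate_supporters: float, rate_others: float) -> float:
    """Expected share of a candidate among survey respondents when her
    supporters and all other voters respond at different rates."""
    responding_supporters = true_share * rate_supporters
    responding_others = (1.0 - true_share) * rate_others
    return responding_supporters / (responding_supporters + responding_others)


# Hypothetical numbers: 22.5% true support; supporters respond at 5%,
# all other voters at 10%.
print(round(raw_share(0.225, 0.05, 0.10), 3))  # prints 0.127
```

With equal response rates the raw share equals the true share; the bias comes entirely from the correlation between responding and voting intention.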

There is nothing shameful about this reweighting. Economists have won the Nobel Prize for developing methods to handle this kind of endogeneity bias (notably James Heckman in 2000, for his work on selective samples). Polling agencies do not use the most sophisticated of these methods, but what they do is easy to understand, and there is nothing wrong with it from a statistical point of view.

The idea is simple: ask interviewees whom they voted for at a previous election. The gap between what they declare at the time of the survey and what actually happened (as measured by the official results of that reference election) allows pollsters to reweight the raw data. In its simplest form, this is a rule of three. Suppose Marine Le Pen is at 15% in the raw 2017 data and that 12% of interviewees say they voted for her in 2012. Since her actual 2012 score was 18%, her corrected 2017 score is 15 × 18 / 12 = 22.5%.
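The rule of three above fits in a one-line Python function. The name `reweight_by_past_vote` is ours, for illustration only; in practice this correction is combined with others.

```python
def reweight_by_past_vote(raw_current: float, declared_past: float, actual_past: float) -> float:
    """Rule-of-three correction: scale the raw current score by the ratio
    between the candidate's actual past result and the past vote that
    respondents declare for her."""
    return raw_current * actual_past / declared_past


# Figures from the example: 15% raw 2017 score, 12% declared 2012 vote,
# 18% actual 2012 result.
print(reweight_by_past_vote(15.0, 12.0, 18.0))  # prints 22.5
```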

This is, of course, an oversimplified example. In practice, pollsters may use several reference elections, with both first and second rounds. Add to that the more usual reweighting on socio-demographic variables, and the pollster's expertise is needed to reach the right solution. Expertise also means subjectivity. Here again, there is nothing shocking about combining a statistical procedure with expert judgment to produce a forecast. The only valid final test is how well the forecast matches the actual result.
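When several criteria must hold at once (past vote, age, and so on), a single ratio no longer works. A standard textbook technique for matching several population margins simultaneously is raking, or iterative proportional fitting, sketched below in Python. We do not claim this is the procedure any given pollster uses; the toy sample, the target margins and the function names are all hypothetical.

```python
from collections import defaultdict


def rake(respondents, targets, max_iter=100, tol=1e-10):
    """Raking (iterative proportional fitting) of respondent weights.

    respondents: list of dicts mapping a variable name to a category.
    targets: dict mapping a variable name to {category: population share};
             shares for each variable are assumed to sum to 1, and every
             category is assumed to appear at least once in the sample.
    Returns one weight per respondent; the weights average to 1.
    """
    weights = [1.0] * len(respondents)
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for var, shares in targets.items():
            # Current weighted total of each category of this variable.
            totals = defaultdict(float)
            for w, r in zip(weights, respondents):
                totals[r[var]] += w
            grand_total = sum(totals.values())
            # One multiplicative factor per category so that the weighted
            # margin of `var` matches its target share.
            factors = {c: shares[c] * grand_total / totals[c] for c in shares}
            for i, r in enumerate(respondents):
                weights[i] *= factors[r[var]]
            largest_adjustment = max(
                largest_adjustment, max(abs(f - 1.0) for f in factors.values())
            )
        # Stop once a full pass over all variables barely changes anything.
        if largest_adjustment < tol:
            break
    return weights


def weighted_share(respondents, weights, var, category):
    """Weighted share of respondents whose `var` equals `category`."""
    hit = sum(w for w, r in zip(weights, respondents) if r[var] == category)
    return hit / sum(weights)


# Hypothetical toy sample of 25 respondents: 2012 vote, age group, 2017 intention.
sample = (
    [{"vote2012": "LP", "age": "under50", "vote2017": "LP"}] * 2
    + [{"vote2012": "LP", "age": "50plus", "vote2017": "LP"}] * 1
    + [{"vote2012": "other", "age": "under50", "vote2017": "other"}] * 9
    + [{"vote2012": "other", "age": "50plus", "vote2017": "LP"}] * 1
    + [{"vote2012": "other", "age": "50plus", "vote2017": "other"}] * 12
)
# Hypothetical population margins: actual 2012 result and age structure.
targets = {
    "vote2012": {"LP": 0.18, "other": 0.82},
    "age": {"under50": 0.45, "50plus": 0.55},
}

weights = rake(sample, targets)
raw = weighted_share(sample, [1.0] * len(sample), "vote2017", "LP")
raked = weighted_share(sample, weights, "vote2017", "LP")
print(f"LP 2017: raw {raw:.1%}, raked {raked:.1%}")
```

Here 12% of the toy sample declares a 2012 Le Pen vote against an 18% target, so those respondents are weighted up, while the age margin is adjusted at the same time; a single rule of three cannot do both.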

Even if this estimation process is not without theoretical grounding, its empirical flavour is a problem. It leaves room for conspiracy theorists of all stripes, whom even Nate Silver unfortunately echoed in a recent tweet (although a perfectly rational explanation could be given for the phenomenon in question).

This is why, in collaboration with Opinionway, and particularly with Bruno Jeanbart (you will find his blog post on the matter here), we have developed an alternative method for analysing electoral poll data. This alternative method rests on two ingredients:

– First, an econometric model of voting intentions as a function of the interviewees' characteristics (sex, age, occupation, education, type of housing, owner/renter, region) and of their past voting behaviour (at the 2012 presidential election and possibly the 2015 regional elections). The model is estimated on the data collected by Opinionway for its Presitrack daily rolling poll. The results presented here are based on data collected from March 30 to April 13 (more than 4,000 respondents certain to vote).

– Second, a projection of that model onto all the French « communes » (36,000 of them), using data from the French statistical agency (Insee) and the ministry of the interior, available on the French open data portal or on Insee’s website.
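These two ingredients can be illustrated with a toy post-stratification sketch in Python. The functional form of the model is not specified here, so a multinomial logit is assumed; its coefficients are made up (not estimated on Presitrack data), and the two communes, described by hypothetical counts of voters per profile, stand in for the Insee and ministry of the interior data. Each commune's expected vote is the sum, over profiles, of the number of voters times the model's predicted probabilities; the national figure is the sum over communes.

```python
import math

CANDIDATES = ["Le Pen", "Macron", "Fillon", "Other"]

# Hypothetical multinomial-logit coefficients (utility relative to "Other"),
# standing in for coefficients estimated on the poll data.
COEFS = {
    "Le Pen": {"const": 0.2, "age65plus": -0.4, "voted_LP_2012": 2.5},
    "Macron": {"const": 0.3, "age65plus": 0.1, "voted_LP_2012": -1.0},
    "Fillon": {"const": 0.0, "age65plus": 0.8, "voted_LP_2012": -0.5},
    "Other": {},  # reference category: utility fixed at 0
}

# Hypothetical communes: number of voters in each profile cell.
COMMUNES = {
    "Commune A": [
        ({"age65plus": 0, "voted_LP_2012": 0}, 600),
        ({"age65plus": 0, "voted_LP_2012": 1}, 200),
        ({"age65plus": 1, "voted_LP_2012": 0}, 150),
        ({"age65plus": 1, "voted_LP_2012": 1}, 50),
    ],
    "Commune B": [
        ({"age65plus": 0, "voted_LP_2012": 0}, 300),
        ({"age65plus": 0, "voted_LP_2012": 1}, 60),
        ({"age65plus": 1, "voted_LP_2012": 0}, 220),
        ({"age65plus": 1, "voted_LP_2012": 1}, 20),
    ],
}


def vote_probabilities(profile):
    """Multinomial-logit probabilities for one voter profile (0/1 features)."""
    utilities = {}
    for cand in CANDIDATES:
        coefs = COEFS[cand]
        utilities[cand] = coefs.get("const", 0.0) + sum(
            coefs.get(feature, 0.0) * value for feature, value in profile.items()
        )
    top = max(utilities.values())  # subtract the max for numerical stability
    exp_u = {c: math.exp(u - top) for c, u in utilities.items()}
    total = sum(exp_u.values())
    return {c: v / total for c, v in exp_u.items()}


def project(cells):
    """Expected vote counts in one commune: voters x predicted probabilities."""
    counts = {c: 0.0 for c in CANDIDATES}
    for profile, voters in cells:
        for cand, p in vote_probabilities(profile).items():
            counts[cand] += voters * p
    return counts


def shares(counts):
    """Convert expected counts into percentages."""
    total = sum(counts.values())
    return {c: 100 * v / total for c, v in counts.items()}


by_commune = {name: project(cells) for name, cells in COMMUNES.items()}
national = {c: sum(counts[c] for counts in by_commune.values()) for c in CANDIDATES}
for name, counts in by_commune.items():
    print(name, {c: round(s, 1) for c, s in shares(counts).items()})
print("National", {c: round(s, 1) for c, s in shares(national).items()})
```

Intermediate levels such as départements or regions follow the same way, by summing the communes they contain.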

The basic principle of this method is very similar to the reweighting of raw data described earlier. It actually brings two additional elements to the table: (i) thanks to the econometric modelling, it takes more reweighting criteria into account; (ii) it allows a fairly granular geographical projection of the results, based on local reference data.

Interestingly, as with classical reweighting, the statistician's subjectivity is not completely absent from the process. Indeed, the model gives quite different results depending on whether the past vote at the 2015 regional elections is included. The statistical theorist would say it should be in: the variable is highly significant in the model. The survey specialist, who knows how unreliable respondents' recollection of past votes can be, particularly for an election as uninspiring as the regional ones, will doubt the robustness of data collected on this point. This being an experimental process, only the final results will tell us which choice was best. My personal preference goes to the model that includes the 2015 regional vote.

The first interesting output of this work is a projection of the Opinionway poll at the level of départements and regions. You will find here a data visualisation of it, made with Tableau.

The second important output of the modelling is, of course, the national estimate. One of the advantages of our procedure is that it attaches probabilities to the results, namely the probability of each possible run-off pairing:

– In the model including the 2015 regional vote, the probability of a Marine Le Pen/Emmanuel Macron run-off is 67%, followed by a Marine Le Pen/François Fillon duel at 32%. The remaining 1% goes to an Emmanuel Macron/François Fillon run-off.
– If the 2015 regional vote is not taken into account, the picture is different: 84% for an Emmanuel Macron/François Fillon run-off, 14% for Marine Le Pen/Emmanuel Macron and just 2% for Marine Le Pen/François Fillon.
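The computation behind these probabilities is not detailed here. One simple way to attach probabilities to run-off pairings, sketched in Python as an assumption and not necessarily the method behind the figures above, is Monte Carlo simulation: draw many plausible first-round outcomes around the point estimates and count how often each pair finishes in the top two. The estimates and standard error below are invented and are not meant to reproduce those figures.

```python
import random
from collections import Counter


def runoff_probabilities(estimates, std_error, n_draws=50_000, seed=1):
    """Estimate how often each pair of candidates finishes in the top two.

    Each candidate's first-round score is drawn independently from a normal
    distribution centred on its point estimate. This is a simplifying
    assumption: real polling errors are correlated across candidates.
    """
    rng = random.Random(seed)  # fixed seed for reproducible results
    names = list(estimates)
    counts = Counter()
    for _ in range(n_draws):
        draw = {name: rng.gauss(estimates[name], std_error) for name in names}
        top_two = sorted(names, key=draw.get, reverse=True)[:2]
        counts[tuple(sorted(top_two))] += 1
    return {pair: k / n_draws for pair, k in counts.most_common()}


# Hypothetical first-round point estimates (%) and standard error,
# for illustration only.
estimates = {"Le Pen": 23.0, "Macron": 24.0, "Fillon": 20.0}
for pair, prob in runoff_probabilities(estimates, std_error=1.5).items():
    print(" / ".join(pair), f"{prob:.0%}")
```

Independent normal draws are the simplest choice; a more faithful version would draw correlated errors, for instance from the estimated covariance of the model's predictions.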

There is a lot of uncertainty, then, as all the polls reflect. In parallel, we have started applying the same procedure to the June parliamentary elections; we will come back to that soon on this blog.

Antoine Moreau