Opinionway and SLPV analytics set up an estimation model for the French parliamentary elections, whose first results were published by Les Echos last week. This modelling is a natural follow-up to what we did for the presidential election.
Bruno Jeanbart and I started our analysis and developed our thinking on this with the 2015 local elections. As often with this type of work, the method we are using now is radically different from what we did for the 2015 local and regional elections. The method was stabilised for the LR and PS primaries, at the end of 2016 and the beginning of 2017.
For the LR primaries, we tested the model by projecting the final results once the first results were available from polling stations. We assumed that the 2,000 smallest polling stations were the first to finish counting. In those, François Fillon was well ahead (42.3%), while Alain Juppé scored 25.4% and Nicolas Sarkozy 25.1%. This is in line with the results that were published around 8.30pm. Our projection for the final results, based on these first 2,000 polling stations, gave 41% – 27% – 20%. Thus, at 8.30pm our model confirmed that François Fillon was above 40% and that Nicolas Sarkozy would not make it to the run-off.
We tested the model again in the first round of the presidential election. Based on an Opinionway poll conducted on Sunday 23rd, at 7pm, our model gave the right ranking of the top 4 candidates (including the Fillon/Mélenchon order, but 30 years of practice in statistics tell me this is sheer luck). The mean absolute error was 0.6 percentage points when compared to the final result.
The global results of the model are only of limited interest, as the polls were fully in line with the actual outcome. But this convergence leads us to believe that the local projections of the model are also correct.
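For readers unfamiliar with the metric, here is a minimal sketch of how a mean absolute error between a forecast and a final result can be computed. The candidate labels and shares are hypothetical placeholders, not the 2017 figures.

```python
def mean_absolute_error(forecast, actual):
    """Average absolute gap, in percentage points, over the candidates in `actual`."""
    return sum(abs(forecast[name] - actual[name]) for name in actual) / len(actual)


# Hypothetical shares in %, for illustration only (NOT real election data).
forecast = {"A": 24.0, "B": 21.0, "C": 20.0, "D": 19.0}
actual = {"A": 24.5, "B": 21.5, "C": 19.5, "D": 20.0}

mae = mean_absolute_error(forecast, actual)
print(f"Mean absolute error: {mae:.3f} points")
```

A low average can still hide one large miss on a single candidate, which is why the ranking of the candidates is worth checking separately.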
Let us outline again briefly how the model works:
– First, an econometric modelling of voting intentions, as a function of the characteristics of the interviewees (sex, age, occupation, education, type of housing, owner/renter, region, living in a large town) and of their past voting behaviour (at the 2012 presidential election and possibly the 2015 regional ones).
– A projection of that model onto all the French “communes” (36,000 of them) using data from the French statistical agency (Insee) and the Ministry of the Interior, available through French Open Data or on Insee’s website.
For the parliamentary elections, we added to the model the voting behaviour in the first round of the presidential election: the data were available very quickly as Open Data. We also have a specific element: a variable capturing the incumbent effect. Incumbents have an advantage over brand new candidates. I will come back to that.
I am not sure if this is Big Data or not, but it involves intense computing. When run on the 36,000 French “communes” (we foolishly did that at the beginning), you need 5 days between the start of the modelling process and the final Excel sheet. We therefore clustered the “communes”, with the clusters nested within the parliamentary constituencies. The model is then estimated on only 6,000 clusters. One full day is still needed to get to the results. That said, if the estimation is simply rerun on new polling data, without any change in the explanatory variables of the model, it takes only 5 minutes.
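The two steps above can be sketched as follows. This is only an illustration under strong assumptions: the model form (a multinomial logit), the coefficients, the variable names and the commune data are all hypothetical, and applying an individual-level model to commune-level averages is itself a simplification.

```python
import math

# Hypothetical coefficients, standing in for a model fitted on poll respondents.
COEFS = {
    "party_a": {"intercept": 0.0, "share_over_60": 1.0, "share_owners": 0.5},
    "party_b": {"intercept": 0.2, "share_over_60": -0.5, "share_owners": 0.0},
}

# Hypothetical commune-level data (in practice: Insee and election files).
COMMUNES = [
    {"name": "commune_1", "voters": 1200,
     "features": {"share_over_60": 0.35, "share_owners": 0.70}},
    {"name": "commune_2", "voters": 54000,
     "features": {"share_over_60": 0.20, "share_owners": 0.40}},
]


def predict_shares(features, coefs=COEFS):
    """Step 1 applied to one unit: softmax of linear scores, shares sum to 1."""
    scores = {
        party: c["intercept"] + sum(c[k] * v for k, v in features.items())
        for party, c in coefs.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {party: math.exp(s - top) for party, s in scores.items()}
    total = sum(weights.values())
    return {party: w / total for party, w in weights.items()}


def project(communes, coefs=COEFS):
    """Step 2: predict each commune, then aggregate weighted by voters."""
    totals = {party: 0.0 for party in coefs}
    n_voters = sum(c["voters"] for c in communes)
    for commune in communes:
        shares = predict_shares(commune["features"], coefs)
        for party, share in shares.items():
            totals[party] += commune["voters"] * share
    return {party: t / n_voters for party, t in totals.items()}


national = project(COMMUNES)
```

The same `project` function can be run on the communes of a single constituency to obtain a local estimate rather than a national one.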
As with any scientific endeavour, I can imagine many criticisms of, and improvements to, our process. L’Obs, in a recent article devoted to the Les Echos release, mentions 4 of these. I was not lucky enough to be contacted in order to explain our method. Personally, I am not sure these 4 criticisms are really relevant. Let us go through them:
– The first two are identical: En Marche ! is brand new and the political landscape has been completely upended. Thus it is not possible to forecast the June election. One crucial ingredient for our modelling is the (good) quality of the input data. Polls, in particular Opinionway’s, were fully accurate for the presidential election, when the novelty of En Marche ! and the upheaval of the political landscape were already issues. Our forecast for the parliamentary elections is based on polls very similar to the ones done for the presidential one. Why reject the former while accepting the latter? We fully agree, of course, that they will be more accurate once all the candidates’ names are known.
– The end of the possibility of being both an MP and a mayor, or of holding an executive position in a local body. I am not sure I get the objection. Opinionway worked hard on incumbents and singled out almost all of them. We take them into account in our modelling. That said, how to take them into account in the model is certainly open to discussion. I will come back to that later.
– No possibility to estimate the number of run-offs with more than 2 candidates (in French parliamentary elections, all candidates who got the votes of at least 12.5% of registered voters are allowed to make it to the run-off. We then have “triangulaires” – 3 candidates – or even “quadrangulaires” – 4 candidates). Our assumptions, which can be found in this published paper, are the following: (i) a turnout identical, in each constituency, to the 2012 one; (ii) a candidate for each of the big political families: Front de gauche, Parti Socialiste, En Marche !, LR/UDI, FN. We calculate, in each constituency, the score of each candidate, which then allows us to deduce the run-off configuration. Neither of these two assumptions seems to have a decisive impact on the results. There is no reason at this stage to anticipate a higher turnout than in 2012, which would indeed lead to more “triangulaires” than the relatively small number we forecast. Only a hypothetical global alliance between the Socialist Party and the Front de Gauche (former communists) or between the Socialist Party and En Marche ! could have a significant impact on the results.
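To make the run-off rule concrete, here is a minimal sketch of how a run-off configuration can be deduced from estimated scores and an assumed turnout. The candidate labels and shares are hypothetical. Two simplifications are assumptions of this sketch, not part of the text above: when fewer than two candidates clear the 12.5% threshold, the top two go through, and first-round wins and blank or invalid ballots are ignored.

```python
def runoff_candidates(vote_shares, turnout, threshold=0.125):
    """Return the candidates qualified for the run-off in one constituency.

    vote_shares: share of votes cast per candidate (fractions summing to ~1)
    turnout: votes cast divided by registered voters
    threshold: minimum share of REGISTERED voters needed to qualify
    """
    # Convert shares of votes cast into shares of registered voters.
    of_registered = {c: s * turnout for c, s in vote_shares.items()}
    ranked = sorted(of_registered, key=of_registered.get, reverse=True)
    qualified = [c for c in ranked if of_registered[c] >= threshold]
    # Sketch assumption: a run-off always has at least the top two candidates.
    if len(qualified) < 2:
        qualified = ranked[:2]
    return qualified


# Hypothetical first-round shares of votes cast in one constituency.
shares = {"A": 0.32, "B": 0.22, "C": 0.20, "D": 0.14, "E": 0.12}

duel = runoff_candidates(shares, turnout=0.57)         # two candidates
triangulaire = runoff_candidates(shares, turnout=0.70)  # three candidates
```

With the same scores, raising the turnout from 57% to 70% lets a third candidate clear the threshold, which is why the turnout assumption matters for the number of “triangulaires”.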
Two avenues for improving our model seem much more critical to me, and we are going to work on those as a priority before the first round of the parliamentary election on June 11:
– As I explained in the blog post on the presidential election, two models with different explanatory variables give different results. Nothing unusual. The eye of the expert is there to pick out the best model, and it worked for the presidential election. It would of course be more comfortable to have a scientific answer to the issue. The statistician’s standard answer would be to retain in the model only the significant variables. The Big Data adept would put all the variables in the cooking pot and wait for the truth to emerge. Both these approaches forget about the parsimony principle: the more variables you have in your model, the better your chances of explaining a phenomenon. But you also introduce more variance, which in turn hurts the predictive power of the model.
– The second point is of course the incumbent effect. This can be assessed quantitatively from past elections. We did it for 4 of them: 1993, 2002, 2007 and 2012. Several specifications were tested (additive or multiplicative impact, effect differentiated according to geography, …). The effect is also not the same from one year to the next, nor from one political family to another. Technical improvements are still possible and it will be interesting to see if these are able to generate more accurate results.
One last point, which I cannot stress enough. The machine is never solely responsible for the forecast. For example, the incumbent effect is not the same across years. Our assessment of the political situation led us to retain the effect measured in 1993 for the Socialist Party (for which it is much larger in that year than in the other years) and the one measured in 2012 for the other parties. The best model comes from this combination of a quantitative approach and domain expertise.
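The difference between an additive and a multiplicative specification of the incumbent effect can be illustrated with a short sketch. The effect sizes, party labels and the renormalisation step are assumptions made for the example, not the values or the exact specification estimated from past elections.

```python
def apply_incumbent_effect(shares, incumbent, effect, kind="additive"):
    """Boost the incumbent's party, then renormalise so shares sum to 1.

    additive:        share + effect          (e.g. 0.03 = three points)
    multiplicative:  share * (1 + effect)    (e.g. 0.10 = +10% of the base)
    """
    adjusted = dict(shares)
    if kind == "additive":
        adjusted[incumbent] += effect
    elif kind == "multiplicative":
        adjusted[incumbent] *= 1.0 + effect
    else:
        raise ValueError(f"unknown specification: {kind!r}")
    total = sum(adjusted.values())
    return {party: s / total for party, s in adjusted.items()}


# Hypothetical baseline shares in one constituency.
baseline = {"A": 0.40, "B": 0.20, "C": 0.40}

additive = apply_incumbent_effect(baseline, "B", 0.03, kind="additive")
multiplicative = apply_incumbent_effect(baseline, "B", 0.10, kind="multiplicative")
```

An additive bonus is the same number of points whatever the party's starting level, whereas a multiplicative bonus grows with the baseline score, so the two specifications diverge most in constituencies where the incumbent's party is either very weak or very strong.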