Confidence intervals

A confidence interval allows the analyst to replace a single value with a range of likely values, and thus to properly assess the precision of a statistical estimate. Here we explain how this actionable tool works.

The unbearable lightness of the single figure.

Some headlines on the day of writing (early April 2014):

– 41% of French people trust the new PM, Manuel Valls,

– In Lebanon, Syrian refugees now account for 25% of the population,

– Households’ confidence edged up in March, with a 3-point increase in the synthetic index.

Lots of exact figures, conveying a reassuring feeling of accuracy, even though all of them are derived from an estimation process and thus include a random component.

Replacing a single figure with a range of likely figures is thus a promising idea: by allowing the public to assess the probable margin of error, it makes the information richer, more nuanced and ultimately more useful.

For example, the first sentence of the press release published at the end of March by the French statistical office (INSEE) reads: « in March, household confidence is increasing ». But it is quite possible that the 3-point increase is not significant and only reflects the normal random fluctuations of a sampling process. Saying that households’ confidence is increasing and saying that it is stable are not at all the same piece of information: hence the importance and actionability of confidence intervals.

In statistical terms, we need to replace a point estimate (a single figure) by a range estimate (a confidence interval).

The three ingredients of a confidence interval.

In this article, we assume we have at least 100 respondents. A statistician would say we are working asymptotically: the formulas below rely on large-sample (normal) approximations, which become unreliable below this threshold. We also assume that the reference population is much bigger than the sample from which we have calculated our figure of interest.

We need three ingredients to compute a confidence interval:

– The point estimate around which we want to compute the interval,

– The variance of that point estimate: since the estimate is derived from an estimation process, it has a variance, which measures the precision of that process. More precisely, we need the square root of the variance, i.e. the standard deviation,

– A confidence level, which measures the probability that the true value we seek to estimate lies in the confidence interval: in other words, the probability of being right (its complement being the risk of being wrong).

Once you have these three ingredients, the calculation is very simple.

Under these assumptions, confidence intervals are always computed in the same way:

Point estimate ± 1,96 × standard deviation

The 1,96 coefficient is used for a 95% confidence interval. Other values would be used for other confidence levels. We will come back to that.
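To see where 1,96 and the other coefficients come from, here is a minimal Python sketch, assuming the estimate is approximately normally distributed (the large-sample setting described above). The function name `z_coefficient` is ours; it relies on `statistics.NormalDist` from the standard library. Python prints decimal points rather than decimal commas.

```python
from statistics import NormalDist


def z_coefficient(level: float) -> float:
    """Coefficient of a symmetrical confidence interval at the given level.

    A symmetrical interval leaves (1 - level) / 2 in each tail of the normal
    distribution, so the coefficient is the standard normal quantile at
    (1 + level) / 2.
    """
    if not 0.0 < level < 1.0:
        raise ValueError("level must be strictly between 0 and 1")
    return NormalDist().inv_cdf((1.0 + level) / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: {z_coefficient(level):.2f}")
```

Rounded to two decimals, it gives 1,64, 1,96 and 2,58 for the 90%, 95% and 99% levels.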

Suppose for example that the variance of the 3-point increase in household confidence is equal to 4 (the appendix explains why this value is plausible). The standard deviation is thus √4 = 2.

With a 95% confidence level, the confidence interval around the measured single figure 3 is:

[3 - 1,96 × 2 ; 3 + 1,96 × 2] = [-0,92 ; 6,92]

The confidence level is interpreted as follows: if we asked everybody, we would measure the true variation in households’ confidence. But we do not ask everybody, only about 2000 people. Because it derives from a sampling process, the interval is itself random: a 95% confidence interval includes the true value with probability 95%, meaning that 95% of the intervals built this way from repeated samples would contain it.
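This repeated-sampling interpretation can be checked by simulation. The sketch below is illustrative only: it assumes a hypothetical true value of 3 and a standard deviation of 2, and it draws each estimate directly from a normal distribution instead of simulating 2000 individual respondents. The helper `coverage_rate` is a hypothetical name.

```python
import random


def coverage_rate(true_value, std_dev, z=1.96, trials=100_000, seed=1):
    """Share of simulated intervals estimate +/- z * std_dev that cover true_value.

    Each trial draws one estimate from a normal distribution centred on the
    (hypothetical) true value, standing in for one survey.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        estimate = rng.gauss(true_value, std_dev)
        if estimate - z * std_dev <= true_value <= estimate + z * std_dev:
            hits += 1
    return hits / trials


print(coverage_rate(true_value=3.0, std_dev=2.0))  # close to 0.95
```

Any single interval either contains the true value or not; the 95% describes the procedure over many samples.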

In the matter at hand, 0 is in the confidence interval. Thus, if the assumption on the variance proved correct, the available data would not support the claim that households’ confidence has actually increased. A statistician would say that, at the 5% significance level, he cannot reject the hypothesis that households’ confidence remained stable.
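The calculation and the check on 0 fit in a few lines. This is a minimal sketch that takes the assumed variance of 4 as given; `confidence_interval` is a hypothetical helper name.

```python
def confidence_interval(estimate, std_dev, z=1.96):
    """Symmetrical interval: point estimate +/- z * standard deviation."""
    half_width = z * std_dev
    return estimate - half_width, estimate + half_width


# Household confidence example: +3 points, assumed variance 4 (std dev 2).
variance = 4.0
low, high = confidence_interval(3.0, variance ** 0.5)
print(f"[{low:.2f} ; {high:.2f}]")           # [-0.92 ; 6.92]
print("0 in interval:", low <= 0.0 <= high)  # True, so not significant at 5%
```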

Confidence intervals around a percentage.

When dealing with a percentage, computing the confidence interval is simpler. Indeed, the variance of an estimated percentage can be calculated directly from that percentage. If p is the estimated percentage (written as a proportion between 0 and 1) and we have N respondents, the variance is equal to p(1 - p)/N. The table below gives a few examples of the standard deviation, and thus of the 95% confidence interval, around a percentage for a sample of 1000 respondents:

Percentage	Standard deviation	95% confidence interval
10%	0,009	[8,1% ; 11,9%]
30%	0,014	[27,2% ; 32,8%]
45%	0,016	[41,9% ; 48,1%]
50%	0,016	[46,9% ; 53,1%]
55%	0,016	[51,9% ; 58,1%]
70%	0,014	[67,2% ; 72,8%]
90%	0,009	[88,1% ; 91,9%]

This is how margins of error are calculated for commercial polls (see the article on electoral polls). As the table shows, results are symmetrical around 50%: the standard deviation, and hence the length of the confidence interval, is the same for an estimated percentage of 30% as for 70%.
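The table can be reproduced with the variance formula p(1 - p)/N. A minimal sketch, assuming the normal approximation and a sample of N = 1000; percentages are handled as proportions between 0 and 1, and `percentage_interval` is a hypothetical name.

```python
from math import sqrt


def percentage_interval(p, n, z=1.96):
    """Standard deviation and 95% interval for an estimated proportion p.

    Uses the variance p * (1 - p) / n (normal approximation, n respondents).
    """
    std_dev = sqrt(p * (1.0 - p) / n)
    return std_dev, p - z * std_dev, p + z * std_dev


for p in (0.10, 0.30, 0.45, 0.50, 0.55, 0.70, 0.90):
    sd, low, high = percentage_interval(p, n=1000)
    print(f"{p:.0%}\t{sd:.3f}\t[{low:.1%} ; {high:.1%}]")
```

For p = 10% it gives a standard deviation of 0.009 and an interval of [8.1% ; 11.9%]. Replacing p with 1 - p leaves p(1 - p) unchanged, which is why 30% and 70% share the same standard deviation.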

How to get shorter confidence intervals?

Getting the shortest possible confidence intervals is always an objective of any data collection and analysis process. How can we achieve it?

Two dimensions can be acted upon: the standard deviation of the estimate and the confidence level of the interval:

– The simplest way to decrease the standard deviation is to increase the number of cases (the sample size, in the context of a survey). This can be seen directly from the formula for percentages above, and the same holds for any type of data: the standard deviation decreases in proportion to 1/√N, so the sample size must be multiplied by 4 to halve the length of the interval. Nothing beats having more data to get more precision…

o A well-designed sampling scheme can also help reduce the standard deviation: stratifying the sample can significantly decrease the variance. This will, however, mainly work for quantities (number of products bought, income…) rather than percentages.

– Decreasing the confidence level also shortens the confidence interval. Let us keep in mind that the confidence level measures the probability that the true value is in the interval. The longer the interval, the greater this probability (the interval spans more values). Reducing the confidence level will decrease the length of the interval, but will also increase the risk that the true value is not in the interval: this is the price to pay for a more informative result.
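Since the standard deviation of a percentage is √(p(1 - p)/N), the interval shrinks like 1/√N. A minimal sketch for the worst case p = 50%, assuming simple random sampling (no stratification gain); `half_width` is a hypothetical helper.

```python
from math import sqrt


def half_width(p, n, z=1.96):
    """Half-length of the 95% interval around proportion p with n respondents."""
    return z * sqrt(p * (1.0 - p) / n)


# Worst case p = 50%: multiplying n by 4 only halves the interval.
for n in (250, 1000, 4000):
    print(n, f"+/- {half_width(0.5, n):.1%}")
```

Going from 1000 to 4000 respondents divides the half-length by exactly 2, which is why precision becomes expensive quickly.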

The table below gives the limits of 90%, 95% and 99% confidence intervals, around the increase of household confidence we discussed above (we stick to the assumption that the variance is equal to 4).

Level	Coefficient	Lower limit	Upper limit
90%	1,64	-0,28	6,28
95%	1,96	-0,92	6,92
99%	2,58	-2,16	8,16

As the level increases, the interval lengthens: the true value is more likely to be covered, but the information conveyed is fuzzier and less useful.

A final note on whether households’ confidence really increased: even at the 90% level, the increase is not significant, since 0 is still inside the interval [-0,28 ; 6,28].
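The table of levels, and the verdict for each of them, can be recomputed as follows. A minimal sketch, assuming the +3 points estimate, the assumed standard deviation of 2 and the normal approximation; `intervals_by_level` is a hypothetical name. Because it uses unrounded coefficients (about 1,645 and 2,576 rather than 1,64 and 2,58), its 90% and 99% limits differ from the table in the second decimal, e.g. -0,29 instead of -0,28.

```python
from statistics import NormalDist


def intervals_by_level(estimate, std_dev, levels=(0.90, 0.95, 0.99)):
    """Return (level, z, lower, upper, significant) rows, normal approximation."""
    rows = []
    for level in levels:
        z = NormalDist().inv_cdf((1.0 + level) / 2.0)
        lower, upper = estimate - z * std_dev, estimate + z * std_dev
        # Significant at this level only if 0 lies outside the interval.
        rows.append((level, z, lower, upper, not lower <= 0.0 <= upper))
    return rows


for level, z, lower, upper, significant in intervals_by_level(3.0, 2.0):
    print(f"{level:.0%}\t{z:.2f}\t{lower:.2f}\t{upper:.2f}\tsignificant: {significant}")
```

None of the three intervals excludes 0, and each one is longer than the previous.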

Should a confidence interval be symmetrical?

Many confidence intervals with the same confidence level can be built around the same measured point value. All the intervals below have (up to rounding) a 95% probability of covering the true value (same example of the supposed increase in French households’ confidence in March 2014):

[-0,92 ; 6,92] [-0,28 ; +∞[ ]-∞ ; 6,28] [-1,66 ; 6,52] [-0,52 ; 7,66]

Only the first interval is symmetrical around the measured point value (3). That interval has a nice property: it is the shortest one among all possible confidence intervals with the same confidence level.

This is why confidence intervals are usually chosen to be symmetrical: symmetry ensures that the interval is as short as possible. There is also a statistical rationale for choosing symmetrical confidence intervals, as explained in the article on statistical tests.
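Both claims (same level, shortest symmetrical interval) can be checked numerically. A minimal sketch, assuming the estimation error is normal with standard deviation 2 around the estimate 3; each interval is treated as [estimate - a ; estimate + b] with fixed offsets a and b, and `coverage` is a hypothetical helper.

```python
from math import inf
from statistics import NormalDist

ESTIMATE, STD_DEV = 3.0, 2.0
# Distribution of the estimation error (estimate - true value), assumed normal.
ERROR = NormalDist(0.0, STD_DEV)


def coverage(lower, upper):
    """Probability that an interval [ESTIMATE - a ; ESTIMATE + b] covers the true value.

    The offsets are a = ESTIMATE - lower and b = upper - ESTIMATE. The true
    value is covered exactly when the error lies between -b and a.
    """
    return ERROR.cdf(ESTIMATE - lower) - ERROR.cdf(ESTIMATE - upper)


candidates = [(-0.92, 6.92), (-0.28, inf), (-inf, 6.28), (-1.66, 6.52), (-0.52, 7.66)]
for lower, upper in candidates:
    print(f"[{lower} ; {upper}]  coverage {coverage(lower, upper):.3f}  length {upper - lower:.2f}")
```

With the rounded limits of the text, coverages range from about 0,949 to 0,951, while lengths are 7,84 for the symmetrical interval, 8,18 for the two asymmetrical bounded ones, and infinite for the one-sided ones.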

Statistical tests and confidence intervals

Questions about the precision of a statistical measurement often boil down to performing a statistical test: is the March 2014 increase in households’ confidence real or just a statistical artefact? Was the gap between the two candidates, as given by polls on the eve of the last presidential election, statistically significant? Is the recall of my advertising campaign significantly higher than that of similar campaigns, i.e. is my campaign effective?

A statistical test can be interpreted straightforwardly in terms of confidence intervals: the two concepts are equivalent. Since a confidence interval is rather easy to understand, this duality between confidence intervals and statistical tests helps grasp the latter concept, which can seem rather complex at first sight. See the article on statistical tests for more details.
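The duality can be illustrated on the household confidence example. A minimal sketch, assuming a two-sided z-test under the normal approximation and the assumed standard deviation of 2; `two_sided_p_value` and `covers` are hypothetical helper names. The hypothesis "no change" is rejected at the 5% level exactly when 0 falls outside the 95% interval.

```python
from statistics import NormalDist


def two_sided_p_value(estimate, std_dev, null_value=0.0):
    """p-value of the z-test of 'true value = null_value' (normal approximation)."""
    z = (estimate - null_value) / std_dev
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def covers(estimate, std_dev, value, level=0.95):
    """True if value lies in the symmetrical interval at the given level."""
    z = NormalDist().inv_cdf((1.0 + level) / 2.0)
    return estimate - z * std_dev <= value <= estimate + z * std_dev


# Household confidence: +3 points, assumed standard deviation 2.
p = two_sided_p_value(3.0, 2.0)
print(f"p-value = {p:.3f}; 0 in the 95% interval: {covers(3.0, 2.0, 0.0)}")
```

The p-value is about 0,13, above 5%, consistent with 0 lying inside the interval.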

Your turn

PSG won the first leg of its 2014 Champions League quarter-final against Chelsea 3-1. Pundits tell us that the probability of PSG reaching the semi-final is 75% (incidentally, PSG did not qualify). What is the confidence interval around that figure?

There is no confidence interval. The 75% is calculated over all European cup matches. Being calculated on the whole universe, the 75% has no random component: it is a sure thing.
Since 1970, there have been 274 games where the team hosting the second leg had lost 1-3 in the first one. In 75% of these games, the team that won the first leg made it to the next round. Using the formula for percentages above, we have p = 0,75 and N = 274, and thus the 95% confidence interval is [69,9% ; 80,1%].
The previous calculation needs to be amended to take into account that the total number of games in European cups is not infinite. The ratio between the number of sampled games (274) and the total number (6000) has to be factored in.
The 75% figure is biased. Calculating a confidence interval around a biased figure is nonsense.

Related items

Statistical tests

Precision of electoral polls

Appendix: Calculation of the variance of the March 2014 increase in households’ confidence

The information given by the French statistical office (INSEE) is summarized below:

	Long-term average	Dec. 2013	Jan. 2014	Feb. 2014	March 2014
Synthetic Index	100	85	86	85	88
Personal financial situation – past evolution	-19	-34	-35	-32	-30
Personal financial situation – perspective	-4	-20	-17	-19	-17
Current saving capacity	8	11	14	10	16
Future saving capacity	-10	-5	-1	-7	2
Opportunity to save	18	18	23	21	20
Opportunity to spend	-14	-29	-28	-28	-26
Standard of living – past evolution	-43	-73	-71	-72	-69
Standard of living – perspective	-23	-49	-46	-51	-47
Unemployment – perspective	32	49	53	55	53
Prices – past evolution	-13	-7	-13	-20	-25
Prices – perspective	-34	-17	-16	-24	-30

We only have the balances (share of positive answers minus share of negative answers) and the sample size (around 2000). The synthetic index is a weighted average of the 11 KPIs, but the weights are not public.

The public therefore does not have at hand the components needed to calculate the precision of the published figures. Under some assumptions, we obtain the following table:
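One way to obtain such variances is sketched below, under loudly stated assumptions: each answer is coded +1 (positive), -1 (negative) or 0 (neutral), respondents are drawn by simple random sampling, the February and March samples are independent, and the shares of positive and negative answers (not published) are hypothetical. Under these assumptions, a balance expressed in points has variance 100² × (p+ + p- - (p+ - p-)²)/N, and the variance of a month-to-month change is the sum of the two monthly variances. `balance_variance` and `change_variance` are hypothetical names.

```python
def balance_variance(share_pos, share_neg, n):
    """Variance, in squared points, of a balance = 100 * (share_pos - share_neg).

    Each answer is coded +1 (positive), -1 (negative) or 0 (neutral). The
    code has variance share_pos + share_neg - (share_pos - share_neg) ** 2,
    and the balance is 100 times its mean over n respondents.
    """
    balance = share_pos - share_neg
    return 100.0 ** 2 * (share_pos + share_neg - balance ** 2) / n


def change_variance(first_month, second_month, n=2000):
    """Variance of a month-to-month change, assuming independent samples.

    first_month and second_month are hypothetical (share_pos, share_neg) pairs.
    """
    return balance_variance(*first_month, n) + balance_variance(*second_month, n)


# Hypothetical shares: 25% positive and 25% negative answers in both months.
print(change_variance((0.25, 0.25), (0.25, 0.25)))  # 5.0
```

These shares are purely illustrative; other plausible shares give variances of the same order as those in the table below.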

	Change (March - Feb.)	Variance	Standard deviation	Significantly different from 0
Personal financial situation – past evolution	2	5,0	2,2	No
Personal financial situation – perspective	2	4,5	2,1	No
Current saving capacity	6	4,5	2,1	Yes
Future saving capacity	9	6,0	2,4	Yes
Opportunity to save	-1	5,0	2,2	No
Opportunity to spend	2	5,0	2,2	No
Standard of living – past evolution	3	3,5	1,9	No
Standard of living – perspective	4	5,0	2,2	No
Unemployment – perspective	-2	5,0	2,2	No
Prices – past evolution	-5	5,0	2,2	Yes
Prices – perspective	-6	5,0	2,2	Yes

In the above table, you can find:

– The variation between February and March of each KPI,

– The variance of that variation. Here we need to make some assumptions: this variance is calculated as the average over various possible sets of positive and negative answers,

– The standard deviation of this variation,

– And whether 0 lies outside the confidence interval, i.e. whether the measured variation is statistically significant.

It can be seen that the variation is not significant for 7 of the 11 KPIs. It is significant for 4 KPIs: 2 increases and 2 decreases. The average variance is about 5.
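The last column of the table and the average variance can be recomputed from the changes and the assumed variances. A minimal sketch, assuming the normal approximation at the 95% level; `is_significant` is a hypothetical helper.

```python
from math import sqrt

# (change March vs February, assumed variance) for each KPI, from the table above.
kpis = {
    "Personal financial situation - past evolution": (2, 5.0),
    "Personal financial situation - perspective": (2, 4.5),
    "Current saving capacity": (6, 4.5),
    "Future saving capacity": (9, 6.0),
    "Opportunity to save": (-1, 5.0),
    "Opportunity to spend": (2, 5.0),
    "Standard of living - past evolution": (3, 3.5),
    "Standard of living - perspective": (4, 5.0),
    "Unemployment - perspective": (-2, 5.0),
    "Prices - past evolution": (-5, 5.0),
    "Prices - perspective": (-6, 5.0),
}


def is_significant(change, variance, z=1.96):
    """True if 0 lies outside change +/- z * standard deviation."""
    return abs(change) > z * sqrt(variance)


significant = [name for name, (change, variance) in kpis.items()
               if is_significant(change, variance)]
print(len(significant), significant)
average_variance = sum(variance for _, variance in kpis.values()) / len(kpis)
print(f"average variance: {average_variance:.2f}")
```

It flags the two saving-capacity KPIs and the two price KPIs, and the average variance comes out at about 4,86.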

The combination of the 11 KPIs could still show a significantly positive variation, even though only two of them increase significantly. This would be much more likely if the KPIs were uncorrelated: aggregating them would then substantially reduce the variance, in a sense leveraging the sample size. But in our case, the KPIs are highly correlated, so the variance gain from aggregating them is most probably small. This is why a variance of 4 for the index variation seems likely to us. To calculate it exactly, we would need access to all the data.
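The role of correlation can be made concrete with the variance of an average. A minimal sketch, assuming (hypothetically, since the weights are not public) equal weights, a common variance of 5 for each KPI change, a common correlation ρ between every pair of KPIs, and ignoring any rescaling used to build the published index: the variance of the average is then 5 × (1 + 10ρ)/11. `variance_of_mean` is a hypothetical name.

```python
def variance_of_mean(variance, k, rho):
    """Variance of the equally weighted mean of k KPIs.

    Assumes every KPI has the same variance and every pair the same
    correlation rho: variance * (1 + (k - 1) * rho) / k.
    """
    return variance * (1.0 + (k - 1) * rho) / k


for rho in (0.0, 0.5, 0.8, 0.9):
    print(rho, round(variance_of_mean(5.0, 11, rho), 2))
```

Without correlation, the variance would fall to about 0,45; with ρ = 0,8 it stays around 4,1, in line with the value of 4 retained for the index.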

Statpedia