
How statistics can be right but still misleading

We are bombarded with statistics all the time in the news, on social media, from government and science. Sometimes they are very useful. At other times they are simply wrong. But there's an in-between way to use statistics - deliberately or otherwise - that is both accurate and misleading.

I just want to give two examples, though there are many more out there. If you want to find out more about the use and misuse of statistics, I'd recommend my book Dice World on the impact of randomness, probability and statistics on our lives and David Spiegelhalter's book The Art of Statistics to get an introduction to how statistics are created, used and misused.

The first example is a deliberate attempt to mislead. The graphic at the top right has been circulated on Facebook. The idea is that this demonstrates the problem with Brexit by showing how important the EU is to us as an export market. It doesn't matter if you agree or disagree with Brexit, the issue here is how the numbers are being used. I'd say there are three distortions here. The first is that it's perfectly possible to have Brexit and not to damage EU exports. Secondly, the numbers are bizarrely stated in US dollars - the only reason I can think of for this is that it makes the EU amount seem bigger. Finally, there's the matter of the dog that didn't bark: around $350 billion of non-EU exports is missing. Bear in mind, I'm not saying the numbers are wrong - the EU is a hugely important market for the UK. But to omit around the same amount of non-EU exports as there are EU exports could only have been done to make the EU seem more important than it really is. That's bad, and clearly deliberate.

More subtle is something that happened when it was announced that eating bacon increased your risk of bowel cancer. We saw headlines like this in the Guardian, pointing out that eating a couple of rashers of bacon a day 'would raise the risk of getting bowel cancer by 18% over a lifetime.' This is true, but with a huge proviso. That 'risk', which sounds terrifyingly huge, is a relative risk, not an absolute one. It's the increase in a risk that is relatively low to start with. If you turn that into an absolute risk - which is what most readers would expect, i.e. if I eat bacon every day, what is the chance that it will give me cancer over my lifetime? - the risk is not 18% but about 1%. That feels rather different. It's still a risk - it's still important to know. But it's far more meaningful than the relative risk. Again, the statistics are accurate (though their interpretation may not be: former University College pharmacology professor David Colquhoun has argued strongly from earlier versions of the data that the interpretation is suspect because there is only weak evidence of causality) - but the way they are presented (either intentionally or accidentally) is highly misleading.
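To see how an 18% relative increase can shrink to about 1% in absolute terms, here is a rough sketch of the arithmetic. Note that the baseline figure is my assumption for illustration, not a number from the headline: I've taken the lifetime risk of bowel cancer without the daily bacon to be about 6%. The function name is also just made up for this example.

```python
def absolute_risk_increase(baseline_risk, relative_increase):
    """Extra absolute risk implied by a relative increase on a baseline risk."""
    return baseline_risk * relative_increase


# ASSUMPTION: baseline lifetime risk of about 6% (illustrative, not from the post).
baseline = 0.06
# The 18% relative increase quoted in the headline.
relative = 0.18

extra = absolute_risk_increase(baseline, relative)
new_risk = baseline + extra

print(f"Baseline lifetime risk: {baseline:.1%}")
print(f"Risk with daily bacon:  {new_risk:.1%}")
print(f"Absolute increase:      {extra:.1%}")
```

With that assumed baseline, the absolute increase comes out at roughly one percentage point. A different baseline would give a different absolute figure, which is exactly why the relative number on its own tells you so little.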

No one is saying we should ignore statistics. But we need to be careful about taking what we read in the papers (or even more so on social media) at face value. At the very least, if it's not possible to dig down and see where the numbers come from, we should be highly suspicious.

