Monday, 20 April 2015

What is a representative audience sample?

Poll of polls from BBC website
One of the reasons I wrote Dice World is that I love probability and statistics, so it was fun to see a stats row in the news.

Ukip has been kicking up a fuss over the makeup of the audience in the opposition leaders' debate last week. They say that the BBC (or, to be precise, ICM, who assembled the audience for the BBC) was biased in favour of left-wing parties, producing the clearly overwhelming anti-Farage sentiment in the audience.

Here is what I've seen reported as the makeup ICM used: about 58 Conservative/Ukip supporters; 102 for Labour, the Lib Dems, the SNP or Plaid Cymru, all arguably parties of the left; and 40 undecided. (This was from a fairly dodgy source, so if anyone can confirm, or has better numbers, please let me know.)

So if we ignore the undecided, that's 58 out of 160 - about 36 per cent - who have said they will vote in a way that might make them relatively positive to Farage.

So the question is, how can you be representative? There are two significantly different interpretations of what 'representative' means in this context. One is to take the last election, the only true nationwide poll we have, as a starting point, and the other is to take a sample poll as organizations like ICM generally do.

If they had gone for the 2010 election, the Conservatives had 47 per cent of the seats (Ukip, of course, had none) - which sounds a lot more than their representation here, but that just reflects the oddities of the first past the post voting system. If you go on the only relevant figure, the percentage of votes cast, they had 36 per cent of the vote - which means that the proportion was perfect.

So how about asking people now? Based on the latest poll of polls (see above), the Conservatives have around 34 per cent of the vote, which might again make the numbers seem reasonable, were it not for the rise of Ukip. They currently stand at 12 per cent in the poll of polls, so the combined Conservative/Ukip percentage on this basis should have been 46 per cent: on this measure they were under-represented.
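The arithmetic behind these two comparisons can be sketched in a few lines. (The audience figures are the reported - and, as noted above, unconfirmed - numbers; the poll-of-polls percentages are those quoted in this post.)

```python
# Reported audience makeup (unconfirmed figures from the post above)
con_ukip = 58
left_parties = 102   # Labour, Lib Dem, SNP and Plaid Cymru supporters
undecided = 40       # excluded from the calculation below

# Share of the decided audience that was Conservative/Ukip
decided = con_ukip + left_parties
audience_share = con_ukip / decided
print(f"Audience Con/Ukip share: {audience_share:.0%}")  # about 36%

# Poll-of-polls figures at the time (per cent)
con_poll, ukip_poll = 34, 12
expected_share = con_poll + ukip_poll
print(f"Poll-based Con + Ukip share: {expected_share} per cent")  # 46
```

Whether 36 per cent looks fair therefore depends entirely on which baseline you pick: the 2010 vote share (36 per cent) or the current poll of polls (46 per cent).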

There is inevitably some room for subjective choice. Personally speaking, I think the vote share from the last general election (i.e. 36 per cent) is the best starting point. This is because sampled polls are often well adrift of actual general election results, so those numbers provide the only truly reliable poll - but we do need to bear in mind that the election is five years old, which means it shows the position before the rise of Ukip.

However, it would seem odd for ICM to use these figures, as the BBC wouldn't need to bring ICM in just to use the popular vote from the last election - they could do what I did and look up the numbers. So ICM must have used a poll to decide the proportions (and indeed they did), and in those circumstances it does seem that Ukip has a reasonable claim that the makeup of the audience was not reflective of the UK at large. Here's ICM's explanation of what they did:
A total of 30 small geographical areas (Super Output Areas, as defined by the Office for National Statistics) were selected within a 20-mile radius of the venue. A minimum of 8 people were recruited within each area, in line with both demographic quota variables that reflected the composition of the UK population by gender, age, ethnicity, and social grade, and political protocols that reflected the balance as agreed between the broadcasters and the political parties. One fifth of the total number recruited was on the basis of being a self-defined 'undecided voter'. Separately, a small number of SNP and Plaid Cymru supporters were recruited in Scotland and Wales, using alternative recruitment strategies, reasonably decided upon by ICM. [my italics]
So, in fact, the audience was not representative of the country at all, but only of the area where the debate took place - meaning that all bets were off.

Lies, damned lies and statistics, eh?

1 comment:

  1. I have a real beef with election pollsters. I reckon that the samples they choose are just too small to give meaningful results, especially when the results are divided up so finely, and the sample covers people of very varied ages and backgrounds. They tend to ask about 1000 people, who will to some extent be self-selected, given that some people asked will decline to be involved. You can tell the lack of meaning as the pollsters themselves say that the figures may vary by three per cent either way. And I disagree with the contention that a poll of polls is better - meta-analyses like this have their own problems. I'd only attach credibility to a poll of upwards of 10,000 people. And the actual election, of course.