peterbirks | Reading the runes (Reply)

I have long been fascinated by opinion polls -- indeed, by statistics in general. That I am hopeless at the mathematical side of statistics just adds to my fascination.
Opinion polls seek to guess how people in their millions will act on the basis of relatively small samples. It was immediately obvious that just asking 1,000 people at random in the street would be at risk of generating an erroneous result (although the likely size of that error is possibly smaller than many would think).
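To put a rough number on "less than many would think", here is a minimal sketch of the textbook margin-of-error calculation. It assumes a genuinely simple random sample, a 95% confidence level and the worst-case 50/50 split; the function name `margin_of_error` is just an illustrative label, and people stopped in the street are not a true random sample, so treat the figures as a best case.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p
    estimated from a simple random sample of size n.
    Assumption: p=0.5 is the worst case, z=1.96 is the 95% level."""
    return z * math.sqrt(p * (1 - p) / n)

# Margin of error, in percentage points, for a few sample sizes
for n in (100, 1000, 10000):
    print(n, round(100 * margin_of_error(n), 1))
```

Under those assumptions a sample of 1,000 comes out at roughly plus or minus three percentage points, and multiplying the sample by ten only cuts that to about one point.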
Pollsters realized that a good way to increase the accuracy would be to ensure that the sample of 1,000 people reflected as much as possible the population as a whole - age distribution, sex distribution, and so on.
This, however, leads to another problem. Over the years it was discovered that, shock, horror, what people said was not always the same as what they did. Even more concerning, what people really believed was often different from what they did (famously, what women claim attracts them to a sexual partner or life partner differs drastically from the empirical evidence of whom women actually choose). The way in which a question was phrased also had a significant impact on the response.
Clearly, opinion polling was something of a nightmare. And, given the misperformance of the pollsters leading up to the last general election in the UK, the pollsters still haven't got it right.
So, what is it that they are getting wrong?
There are two major problems. The first is the aforementioned "tendency to deceive" (people respond with what they think they ought to say, rather than what they really feel) -- a factor that has been a curse for the intellectual left-wing for decades. These days they flood Twitter and Facebook, demonstrate to their own satisfaction that the argument has been won, and wake up the day after the vote to find that they have been told to "fuck off". The secret ballot allows visceral emotions to come into play. A person might not vote for a candidate because he or she doesn't like the fact that the candidate is fat. But no respondent to an opinion poll is likely to say that, and no online social media campaign is going to mention "the elephant in the room" if a candidate is 25 stone-plus and female.
The second problem is more complex -- one that is only just coming to be fully appreciated. That is, how do you decide what is a "representative" sample?
In the early days of polling, the criteria were primitive - mainly age and sex. This came most unstuck in 1948 in the US, when a telephone poll predicted that Harry Truman would lose. As seems obvious now, the key was in the phrase "telephone poll". With market penetration still under 50%, people with a telephone were markedly more likely to be better off and, therefore, to vote Republican.
So, clearly we have to add "income" to our representative mix. In fact, what pollsters need to do is to add any variation in the make-up of the general population that is correlated, positively or negatively, with the way that people are likely to vote.
You can see the problem here. This in itself is something of a judgment call. As it is a sample, the pollsters must by definition filter out "irrelevancies". The problem appears to be that in a dynamic society, some things that used to be relevant have ceased to be so, while other things which did not use to be important, now are.
With the referendum, where "all bets are off" when it comes to traditional party politics, the problem is multiplied. What on earth is "relevant" when it comes to picking a truly representative sample, when the split is not along traditional party lines? Also, there appear to be significantly more "elephants in the room" -- things which neither side is prepared to mention, but which could be significant factors when it comes to voting. That in turn feeds back into a stronger "propensity to deceive" and a greater danger that the phrasing of the poll question will distort the result away from reality.
I'd quite like to see some sample results from randomly asking 200 people each in, say, five streets in England. I suspect that the numbers obtained would not be a long way different from the carefully calculated "representative samples".
In poker I have long argued that you can learn more from small samples than you think. The conventional wisdom in poker is that you can't learn anything from, say, a player's actions over five hands. I argued, way back in the early 2000s, that if this was all that you had to work with, it was stupid to ignore it just because there was a higher probability that the answer you obtained would be wrong. Sure, with five hands the standard error is many times higher than it would be on a sample of 50, 500 or 5,000. But it is not TEN times higher than on the sample of 50 - it's closer to three, because the error shrinks with the square root of the sample size. It is not a thousand times higher than on a sample of 5,000 -- it's closer to 30.
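The scaling is easy to check. This sketch assumes the usual rule that the standard error of an estimated frequency falls in proportion to one over the square root of the sample size, so the ratio between two sample sizes is just the square root of their ratio. The function name `relative_standard_error` is made up for illustration.

```python
import math

def relative_standard_error(small_n, large_n):
    """How many times larger the standard error is on small_n
    observations than on large_n, assuming error ~ 1/sqrt(n)."""
    return math.sqrt(large_n / small_n)

# Compare a five-hand sample against progressively larger samples
for large_n in (50, 500, 5000):
    print(large_n, round(relative_standard_error(5, large_n), 1))
```

So a thousand times more hands buys only about a thirty-fold reduction in error, which is the flip side of the point that five hands are worth more than nothing.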
Sure, the conclusion you reach if the player raises four times and folds once in his or her first five hands might be erroneous. But the probability that this player is loose-aggressive is still significantly higher than it was when you had a sample size of zero.
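One way to make this concrete is a simple Bayesian update. Every number below is a made-up assumption for illustration, not real poker data: suppose 20% of players are loose-aggressive and raise 60% of their hands, while everyone else raises 20% of the time, and treat each hand as independent.

```python
from math import comb

def posterior_loose_aggressive(raises, hands, prior=0.2,
                               p_raise_lag=0.6, p_raise_other=0.2):
    """Probability that a player is loose-aggressive after seeing
    `raises` raises in `hands` hands. All default rates are
    hypothetical assumptions, and hands are treated as independent."""
    def likelihood(p):
        # Binomial probability of this many raises at raise rate p
        return comb(hands, raises) * p**raises * (1 - p)**(hands - raises)

    lag = prior * likelihood(p_raise_lag)
    other = (1 - prior) * likelihood(p_raise_other)
    return lag / (lag + other)

# Four raises and one fold in the first five hands
print(round(posterior_loose_aggressive(4, 5), 2))
```

With these invented inputs the estimate jumps from the 20% starting point to around 90%. The exact figure depends entirely on the assumed rates, but the direction of the move does not.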
In other words, completely random samples (and I mean virtually completely -- no selection on the basis of sex and age and only a minor one on grounds of geography) might have their place. And they have one plus -- they are much easier, quicker and cheaper to compile.

Peter Kellner, in his blog, referred to an interesting statistic -- namely, the percentage of people who see Brexit as a "risk" compared with Remain as "safe". Roughly, 10pp more people appear to see Brexit as the "risk option".
This offers an interesting left-field take on the referendum. It means that about half of that 10pp gap, or some 5% of all voters, would need to think that Brexit was "a risk worth taking" to make Brexit the likely winner. The remaining voters would be committed to Brexit or Remain either way. That 1-in-20 number strikes me as uncomfortable reading for Brexiters. Look at the general population's attitude to risk-taking on a major level. Nearly all of it is about risk-avoidance. Indeed, the huge risks that people do take are usually ones that they take unwillingly and, not infrequently, without the knowledge that they are taking the risk (see 40-year mortgages, Equitable Life, negative equity in the early 1990s). When a risk is known and perceived, and conceived to be significant, people usually plump for safety.
From that point of view, the Remainers' best argument could well be the one that they are uncomfortable making -- that, even if being in the EU is shit, the equivalent of an abusive relationship, we are now so inextricably tied into the EU that the risk of leaving is too great. That, no matter how bad it is, leaving would be too big a risk.
This is what I mean by "the elephant in the room". It's probably Remainers' strongest argument, but no Remainer is willing to accept that it exists (or, if any are, to campaign on it).