Problems with polling: Redux

I haven’t posted since April because I had little to contribute to what I saw as the two overarching goals of the last six months: electing Joe Biden and developing a COVID vaccine. I did my civic duty toward the first and had nothing to contribute to the second, and so it seemed a time to pause. Now I feel my free speech is restored, and for a moment at least there is some attention to one of my favorite topics – how we need to do research for campaigns differently. I have covered much of that previously, but here is a redux on the problems with polling, with some updating for 2020.

1. Samples are not random. If you ever took an intro stats course, it grounded most statistics in the need for a random sample. That means that everyone in the population of interest (e.g. people who voted November 3) has an equal probability of being included in the sample.

The margin of error presumes a random sample. How accurate a picture a sample gives of the array of views in a population depends on the size of the sample, the breadth of the views in the population, and the randomness of the sample.

The intuitive example: Imagine a bowl of minestrone soup. If you take a small spoonful, you may miss the kidney beans. The larger the spoonful (or sample) the more likely you are to taste all the ingredients. The size of the spoon is important but not the size of the bowl. But if you are tasting cream of tomato soup, you know how it tastes with a smaller spoon. America is definitely more like minestrone than cream of tomato.
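The spoon-versus-bowl point can be checked with the textbook margin-of-error formula for a simple random sample. The sketch below is illustrative only: the function name margin_of_error, the 95 percent z-value of 1.96, and the sample and population sizes are assumptions chosen for the example, not figures from any actual poll.

```python
import math

def margin_of_error(n, p=0.5, population=None, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample.

    n: sample size (the spoon)
    p: share holding a view (0.5 is the widest possible spread of opinion)
    population: optional population size (the bowl), used only for the
        finite population correction
    """
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))
    return z * se

# A bigger spoon shrinks the error: four times the sample halves it.
small_spoon = margin_of_error(400)
large_spoon = margin_of_error(1600)

# The size of the bowl barely matters once it is far larger than the spoon.
state_bowl = margin_of_error(1000, population=3_000_000)
national_bowl = margin_of_error(1000, population=150_000_000)

# Cream of tomato: when nearly everyone agrees (p = 0.95), a small spoon does better.
uniform_soup = margin_of_error(400, p=0.95)

print(f"n=400: {small_spoon:.3f}, n=1600: {large_spoon:.3f}")
print(f"state: {state_bowl:.4f}, national: {national_bowl:.4f}")
print(f"p=0.95, n=400: {uniform_soup:.3f}")
```

The formula holds only if the sample is random, so it says nothing about the error in a sample that is not.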

The problem with polling has little to do with the margin of error, which remains unchanged. The problem is that pollsters have not used random samples for a generation. Caller ID and people’s annoying proclivity to decline calls from unknown numbers (a proclivity I share) are part of the reason, as are changes in phone technology with fiber optics, including a proliferation of numbers that are not geographically grounded. On top of that, an explosion of polls and surveys (How was your last stay at a Hilton?) makes the act of sharing your opinion pretty unspecial.

Not to worry, we pollsters said. Samples can still be representative.

2. The problem with “representative” samples. A representative sample is one constructed to meet the demographics and partisanship of the population of interest (e.g. voters in a state) in order to measure the attitudes of that representative sample.

The researcher “corrects” the data through a variety of techniques, principally stratified samples and weighting. A stratified sample separates out particular groups and samples them separately. Examples include cluster samples, which stratify by geography, and age-stratified samples, which use a separate sample for young people, who are hard to reach.

Professional pollsters usually sample from “modeled” files that tell how many likely voters are in each group and their likely partisanship. They up-weight the people they are short of, counting each of those interviews extra. They may up-weight the conservative voters without college experience, for example, to keep both demographics and partisanship in line with the model for that state or population. Virtually every poll you see has weighted the data to presumptions of demographics and partisanship.
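As a rough illustration of up-weighting, the sketch below computes one weight per group so that the sample matches a model's targets. The group names, target shares, and interview counts are invented for the example, and real polls typically weight on several variables at once, which this does not attempt.

```python
def group_weights(sample_counts, target_shares):
    """Weight per group = target share / share actually achieved in the sample."""
    total = sum(sample_counts.values())
    return {
        group: target_shares[group] / (count / total)
        for group, count in sample_counts.items()
    }

# Hypothetical poll of 1,000 interviews that came up short on non-college voters.
sample_counts = {"college": 600, "non_college": 400}
# Hypothetical model of the electorate (the "recipe").
target_shares = {"college": 0.45, "non_college": 0.55}

weights = group_weights(sample_counts, target_shares)

# After weighting, the sample's group sizes match the model's shares.
weighted_counts = {g: sample_counts[g] * weights[g] for g in sample_counts}
print(weights)
print(weighted_counts)
```

In this made-up case each non-college interview counts as roughly 1.375 interviews and each college interview as 0.75, so the weighted sample matches the recipe whether or not the recipe matches the electorate.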

Back to the minestrone soup example: Samples are drawn and weighted according to the recipe developed before the poll is conducted. We presume the soup has a set quantity of kidney beans because that’s what the recipe says. But voters don’t follow the recipe – they add all kinds of spices on their own. Pollsters also get in a rut on who will vote – failing to stir the soup before tasting it.

Most of the time, though, the assumptions are right. The likely voters vote and the unlikely voters do not, and partisanship reflects the modeling done the year before. But disruptive events happen. In 1998 in Minnesota, most polls (including my own) were wrong because unlikely voters participated and turnout was unexpectedly high, particularly in Anoka County, home of Jesse Ventura, who became Governor that year. That phenomenon is parallel to the Trump factor in 2016 and even more so in 2020. Unexpected people voted in unexpected numbers. If the polls are right in 2022, as they generally were in 2018, it is not because the problem is fixed but because conventional wisdom is right again, which would be a relief to more than pollsters, I expect.

3. What’s next. I hope part of what’s next is a different approach to research. If campaigns and their allies break down the core questions they want to answer, they will discover that there is a far bigger and more varied toolbox of research techniques available to them. The press could also find more interesting things to write about that help elucidate attitudes rather than predict behavior.

Analytics has a great deal more to offer. That is especially so if analytics practitioners became more interested in possibilities than in merely assigning probabilities. Analytics has become too much like polling in resting on assumptions. Practitioners have shrunk their samples and traded in classical statistics for solely Bayesian models.

Please bear with me for another few sentences on that: classical statistics make fewer assumptions; Bayesian statistics measure against assumptions. When I was in grad school (back when Jimmy Carter was President – a Democrat from Georgia!), people made fun of Bayesian models saying it was like looking for a horse, finding a donkey, and concluding it was a mule. We will never collect or analyze data the way we did in the 1970s and 80s, but some things do come around again.

It would also be helpful if institutional players were less wedded to spreadsheets that lined up races by the simple probability of winning and instead helped look for the unexpected threats and opportunities. In those years when everything is as expected, there are fewer of those. But upset wins are always constructed by what is different, special, unusual, and unexpected in the context of candidates and moment. Frankly, finding those is what always interested me most because that’s where change comes from.

More on all of this in the weeks and months ahead, and more on all the less wonky things I plan to think about: Democrats, the South, shifting party alignments, economic messaging, and my new home state of Mississippi. I am glad to be writing again, now that I feel more matters in this world than just Joe Biden and vaccines.

People Do Not Want To Be Polled

The core problem with polling is that people do not wish to be polled.  Those who answer their phones when the caller is unknown to them are unusual and atypical.  And even many who do answer do not choose to complete the poll.    

This year’s telephone polling results were closer to the final election results than in 2016.  Much of the improvement, however, was due to the nature of the mid-term electorate and not because the polls themselves were better.  The mid-term electorate was highly polarized, and rabid partisans are easier to poll than voters in the middle.  Polls were still wrong when those in the middle did not break proportionately to the partisans.

Back in the 1980s, polling achieved representative samples of voters by calling phone numbers at random.  The definition of random is that everyone in the universe of interest (people who will vote in the next election) has an equal chance of being polled. With the advent of cell phones, caller ID, and over-polling, samples have not been random for a while – not since the last century anyway. 

Pollsters replaced random samples with representative ones. Political parties and commercial enterprises have “modeled” files – for every name on the voter file, there is information on the likely age, gender, race or ethnicity and, using statistics, the chances that individual will vote as a Democrat or Republican.  If the sample matches the distribution of these measures on the file, then it is representative and the poll should be correct.

There are three problems (at least) with that methodology.  (1) There may be important demographics the pollster is not balancing.  Pollsters got the 2016 election wrong in part because they included too few voters without college experience in samples, and college and non-college voters were more different politically than they had been before.  (2) Rather than letting the research determine the demographics of the electorate, the pollster needs to make assumptions about who will turn out, including how many Democrats and how many Republicans, to make the sample representative.  When those assumptions are wrong, so are the polls.  This year, conventional wisdom was correct and so the polls looked better.
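To see how much rides on the turnout assumption, here is a small sketch that holds the interview results fixed and changes only the assumed partisan mix. Every number in it is hypothetical, as are the names topline, model_mix, and actual_mix.

```python
def topline(support_by_group, assumed_shares):
    """Overall support: each group's support times its assumed share of the electorate."""
    return sum(support_by_group[g] * assumed_shares[g] for g in support_by_group)

# Hypothetical support for candidate A within each partisan group, from interviews.
support_for_a = {"democrat": 0.92, "independent": 0.48, "republican": 0.06}

# Hypothetical turnout model used to build and weight the sample.
model_mix = {"democrat": 0.36, "independent": 0.30, "republican": 0.34}
# Hypothetical electorate that actually shows up.
actual_mix = {"democrat": 0.33, "independent": 0.30, "republican": 0.37}

poll_estimate = topline(support_for_a, model_mix)
election_result = topline(support_for_a, actual_mix)
print(f"poll: {poll_estimate:.3f}, election: {election_result:.3f}")
```

In this invented case, a three-point error in the assumed partisan mix moves the estimate by about two and a half points even though no respondent changed their mind.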

The third problem is perhaps the most difficult and follows from the first two:  pollsters “weight” the data to their assumptions.  If there are not enough voters under 30 in the sample (and they are harder to reach) then pollsters count the under-30 voters they did reach extra – up-weighting the number of interviews with young people to what they “should” have been according to assumptions.  Often, however, the sample of one group or the other wasn’t only too small, but was an inadequate representation in the first place – a skewed sample of young people is still skewed when you pretend it is bigger than it actually was.
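The point about skewed subsamples can be made with a toy calculation. In the sketch below every figure is made up: suppose voters under 30 truly split 60 percent for a candidate, but the young people who answer the phone split 75 percent.

```python
def weighted_estimate(groups):
    """Overall support = sum over groups of (target share * support observed in that group)."""
    return sum(g["target_share"] * g["observed_support"] for g in groups.values())

# Hypothetical electorate: 20% under 30, 80% aged 30 and over.
truth = {
    "under_30": {"target_share": 0.20, "observed_support": 0.60},
    "30_plus": {"target_share": 0.80, "observed_support": 0.45},
}
# Same weighting targets, but the under-30 respondents reached are not typical.
poll = {
    "under_30": {"target_share": 0.20, "observed_support": 0.75},
    "30_plus": {"target_share": 0.80, "observed_support": 0.45},
}

true_support = weighted_estimate(truth)
polled_support = weighted_estimate(poll)

# Weighting corrected how many young interviews count, not who those young people were.
print(f"true: {true_support:.2f}, poll after weighting: {polled_support:.2f}")
```

The weighted poll gives young voters exactly the right share of the total and is still three points off, because the error lives inside the group rather than in its size.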

The problems can be minimized by making more calls to reduce the need to up-weight the data.  If 30 percent of some groups of voters complete interviews but only 10 percent of other groups, just make three times the number of calls to the hard-to-reach group.  That is what my firm and others did this year.  It is, however, an expensive proposition and still does not ensure that the people who completed interviews are representative of those who did not.
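The arithmetic behind "three times the number of calls" is simple enough to sketch. The completion rates below are the ones from the example above; the target of 150 completed interviews per group and the function name calls_needed are assumptions for illustration.

```python
import math

def calls_needed(target_interviews, completion_pct):
    """Calls required to expect a given number of completed interviews.

    completion_pct is the percentage of calls that end in a completed interview.
    """
    return math.ceil(target_interviews * 100 / completion_pct)

# Hypothetical target: 150 completed interviews in each group.
easy_group_calls = calls_needed(150, 30)   # group where 30 percent complete
hard_group_calls = calls_needed(150, 10)   # group where 10 percent complete

print(easy_group_calls, hard_group_calls)
```

The hard-to-reach group needs three times the calls for the same number of interviews, which is where the expense comes from. The extra calls buy a bigger subsample, not a guarantee that it resembles the people who never picked up.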

Next Post:  The Self-Selecting Internet