I haven’t posted since April because I had little to contribute to what I saw as the two overarching goals of the last six months: electing Joe Biden and developing a COVID vaccine. I did my civic duty toward the first and had nothing to offer the second, and so it seemed a time to pause. Now I feel my free speech is restored, and for a moment at least there is some attention to one of my favorite topics – how we need to do research for campaigns differently. I have covered much of that previously, but here is a recap of the problems with polling, updated for 2020.
1. Samples are not random. If you ever took an intro stats course, you learned that most statistics are grounded in the need for a random sample. That means that everyone in the population of interest (e.g. people who voted November 3) has an equal probability of being included in the sample.
The margin of error presumes a random sample. How accurately a sample captures the array of views in a population depends on the size of the sample, the breadth of the views in the population, and the randomness of the sample.
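To make that concrete, here is the textbook calculation, a minimal sketch in Python assuming a genuinely random sample; the sample size and confidence level are illustrative, not from any particular poll.

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Textbook margin of error for a proportion from a simple random sample.

    n: sample size
    p: observed proportion (0.5 is the worst case and gives the widest interval)
    z: critical value (1.96 corresponds to 95% confidence)
    """
    return z * math.sqrt(p * (1 - p) / n)

# A typical statewide poll of 800 respondents:
print(f"{margin_of_error(800):.1%}")  # ~3.5%
```

Note that nothing in that formula knows whether the sample was actually random; that is the assumption the rest of this piece is about.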
The intuitive example: Imagine a bowl of minestrone soup. If you take a small spoonful, you may miss the kidney beans. The larger the spoonful (or sample), the more likely you are to taste all the ingredients. The size of the spoon is important, but not the size of the bowl. But if you are tasting cream of tomato soup, you know how it tastes with a smaller spoon. America is definitely more like minestrone than cream of tomato.
The problem with polling has little to do with the margin of error, which remains unchanged. The problem is that pollsters have not used random samples for a generation. The advent of caller ID and people’s annoying proclivity to decline calls from unknown numbers (a proclivity I share), changes in phone technology such as fiber optics and the proliferation of numbers that are not geographically grounded, and an explosion of polls and surveys (How was your last stay at a Hilton?) have all made the act of sharing your opinion pretty unspecial.
Not to worry, we pollsters said. Samples can still be representative.
2. The problem with “representative” samples. A representative sample is one constructed to match the demographics and partisanship of the population of interest (e.g. voters in a state), so that the attitudes of the sample can stand in for the attitudes of that population.
The researcher “corrects” the data through a variety of techniques, principally stratified samples and weighting. A stratified sample separates out particular groups and samples them separately. Examples include cluster samples, which stratify by geography, and age-stratified samples, which use a separate sample for young people, who are hard to reach.
Professional pollsters usually sample from “modeled” files that tell them how many likely voters are in each group and their likely partisanship. They up-weight – that is, count extra – the people they are short of. They may up-weight conservative voters without college experience, for example, to keep both demographics and partisanship in line with the model for that state or population. Virtually every poll you see has weighted the data to presumptions about demographics and partisanship.
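As a sketch of what that weighting step looks like, assume hypothetical target shares from a modeled file; the groups and numbers below are invented for illustration, and real pollsters rake across many variables at once.

```python
# Hypothetical targets from a modeled voter file: the "recipe."
targets = {"college": 0.40, "no_college": 0.60}

# Who actually answered the poll (college grads over-represented).
sample_counts = {"college": 500, "no_college": 300}

n = sum(sample_counts.values())

# Weight = target share / observed share, so scarce groups count extra.
weights = {g: targets[g] / (sample_counts[g] / n) for g in targets}

for g, w in weights.items():
    print(f"{g}: weight {w:.2f}")
# college: 0.64, no_college: 1.60 - each no-college respondent
# now speaks for more than one person.
```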
Back to the minestrone soup example: Samples are drawn and weighted according to the recipe developed before the poll is conducted. We presume the soup has a set quantity of kidney beans because that’s what the recipe says. But voters don’t follow the recipe – they add all kinds of spices on their own. Pollsters also get in a rut on who will vote – failing to stir the soup before tasting it.
Most of the time, though, the assumptions are right. The likely voters vote and the unlikely voters do not, and partisanship reflects the modeling done the year before. But disruptive events happen. In 1998 in Minnesota, most polls (including my own) were wrong because unlikely voters participated and turnout was unexpectedly high, particularly in Anoka County, home of Jesse Ventura, who became Governor that year. That phenomenon is parallel to the Trump factor in 2016 and even more so in 2020: unexpected people voted in unexpected numbers. If the polls are right in 2022, as they generally were in 2018, it will not be because the problem is fixed but because conventional wisdom is right again, which would be a relief to more than pollsters, I expect.
3. What’s next. I hope part of what’s next is a different approach to research. If campaigns and their allies break down the core questions they want to answer, they will discover that there is a far bigger and more varied toolbox of research techniques available to them. The press could also find more interesting things to write about that help elucidate attitudes rather than predict behavior.
Analytics has a great deal more to offer. That is especially so if analytics practitioners become more interested in possibilities rather than merely assigning probabilities. Analytics has become too much like polling in resting on assumptions. Practitioners have shrunk their samples and traded classical statistics for solely Bayesian models.
Please bear with me for another few sentences on that: classical statistics make fewer assumptions; Bayesian statistics measure against assumptions. When I was in grad school (back when Jimmy Carter was President – a Democrat from Georgia!), people made fun of Bayesian models, saying it was like looking for a horse, finding a donkey, and concluding it was a mule. We will never collect or analyze data the way we did in the 1970s and 80s, but some things do come around again.
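The horse-and-donkey contrast can be shown in a few lines. This is a sketch with invented numbers, using a simple Beta-Binomial model: the classical estimate is just the data, while the Bayesian estimate is pulled toward the prior, which is exactly the problem when the prior itself is wrong.

```python
# Invented data: 520 of 1,000 respondents back Candidate A.
successes, n = 520, 1000

# Classical estimate: the data alone.
classical = successes / n  # 0.520

# Bayesian estimate: suppose the prior, built from last cycle's modeling,
# expects a 47% share with the weight of 1,000 prior "observations,"
# i.e. a Beta(470, 530) prior on the share.
prior_a, prior_b = 470, 530
posterior_mean = (prior_a + successes) / (prior_a + prior_b + n)  # 0.495

print(f"classical: {classical:.3f}, posterior: {posterior_mean:.3f}")
# The donkey in the data gets averaged with the horse in the prior,
# and out comes a mule.
```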
It would also be helpful if institutional players were less wedded to spreadsheets that line up races by the simple probability of winning and instead helped look for unexpected threats and opportunities. In those years when everything is as expected, there are fewer of those. But upset wins are always constructed from what is different, special, unusual, and unexpected in the context of the candidates and the moment. Frankly, finding those is what always interested me most, because that’s where change comes from.
More on all of this in the weeks and months ahead, and more on all the less wonky things I plan to think about: Democrats, the South, shifting party alignments, economic messaging, and my new home state of Mississippi. I am glad to be writing again, now that I feel more matters in this world than just Joe Biden and vaccines.
So glad to see the analytical mind from the Pearl is back in the saddle. Seems like polling averages were spot on in AZ, NV, GA, NC (not in the Senate), close in MI and PA, and wildly off in WI, OH, and IA (though Selzer caught the change). Working-class whites were mostly entrenched in the states where polling was close, and the last three were more in flux. My take as a non-pollster: https://greenalleystrategies.com/alley-tales/f/polling-hits-misses–and-georgias-on-my-mind
As always, this comes from a very smart lady, which is why she was always my pollster of choice.