I have tweeted that many of the public polls are “flawed and meaningless,” which was a bit hyperbolic on my part. It was born of frustration with the sheer number of polls, their often lower quality, and the accompanying tendency toward hyperbolic analysis and over-prediction. So here are some more considered thoughts, using more than 280 characters.
There is no such thing as a perfect poll. If there were, it would include only people who are going to vote in the election of interest, and the sample would match their demographics precisely, both overall and within subgroups. That won’t happen because there is no way of knowing exactly who will vote – voters don’t yet know if they will – and while the pollster can work at getting the demographics right, there is always the chance that they are off, or that the sample is skewed by response bias.
That said, not all polls are created equal. Some are conducted responsibly and analyzed thoroughly, with the pollster applying their own skeptical and analytic oversight in reporting the results. Other polls are “quick and dirty” with less caution in sampling and an analysis that seems to stop at the top lines.
Here are some things to look for to separate the better ones from those that are fundamentally flawed.
The Sample. No one knows exactly who is going to vote. Voters can tell you whether they currently think they will, but they are not very accurate about that. Campaign pollsters usually use a model based on vote history, which means they are pretty sure they are talking to likely voters but may also exclude some people who are new to the electorate. Most public polls use a random sample and rely on self-reported likelihood of voting, which is more inclusive – often more inclusive than the actual electorate.
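The vote-history approach can be sketched as a toy screen. The threshold and the voter histories below are invented for illustration – no campaign firm’s actual model works this simply – but the sketch shows how a history-based screen excludes new registrants by construction.

```python
# Toy likely-voter screen based on vote history (an illustrative assumption,
# not any pollster's actual model).
def is_likely_voter(history, threshold=2):
    """history: 1 = voted, 0 = did not, for recent elections."""
    return sum(history[-4:]) >= threshold  # voted in at least 2 of the last 4

voters = {
    "habitual": [1, 1, 1, 1],
    "sporadic": [0, 1, 0, 1],
    "new_registrant": [0, 0, 0, 0],  # no history yet, so the model excludes them
}
screened = [name for name, h in voters.items() if is_likely_voter(h)]
print(screened)  # the new registrant is screened out despite possibly voting in 2020
```

The trade-off in the text falls out directly: the screen is confident about who it keeps, at the cost of dropping anyone the history file cannot vouch for.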
Only about 58 percent of eligible voters participated in the 2016 general election and 28 percent in the primaries (across both parties). Democratic turnout may be higher in 2020, but when a poll includes all adults and reports that 47 percent of those who are registered are likely to vote in the Democratic primary, something is likely wrong. (And not all states require advance registration, so that screen alone introduces a small bias.)
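The mismatch can be made explicit with back-of-the-envelope arithmetic. The 70 percent registration rate below is an illustrative assumption (it is not a figure from this piece); the 47 and 28 percent figures come from the example above.

```python
# Sanity-check the turnout a poll's screen implies against historical turnout.
# registration_rate is an illustrative assumption; the 47% and 28% figures
# come from the example in the text.
registration_rate = 0.70   # assumed share of eligible voters who are registered
poll_likely_share = 0.47   # poll: 47% of registered adults "likely" Dem primary voters

implied_turnout = registration_rate * poll_likely_share  # share of ELIGIBLE voters
historical_primary_turnout = 0.28                        # 2016, both parties combined

print(f"Implied Democratic primary turnout: {implied_turnout:.0%} of eligible voters")
print(f"2016 primary turnout, both parties: {historical_primary_turnout:.0%}")
# One party's implied turnout exceeding both parties' combined history
# is a strong sign the likely-voter screen is too loose.
```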
Consumers of polling should make a judgment about how good a job the pollster did finding the electorate of interest before considering the results.
Demographics. We know the demographics of who has voted in the past (using voter file analysis) and chances are who will vote in the future is roughly similar, although how similar is a matter of conjecture. Roughly half of Democratic primary voters are of color, and they are more female and have more formal education than average. In the past, Democratic primary voters have been older than average, although millennial participation increased in 2018 and may increase again in the 2020 primaries (https://www.brookings.edu/research/the-2018-primaries-project-the-demographics-of-primary-voters/). A poll of primary voters that matches overall census demographics is very likely wrong. A poll that doesn’t consider the potential for turnout shifts is over-confident.
Data Weighting. These days nearly all polls weight by demographics because different groups of voters have different probabilities of responding to polls. (There are stratification procedures in sampling that minimize the need for weighting, but few public polls use them.) A procedure called “raking” weights the data to expected demographics. How finely honed those demographic goals are can impact the sample in unexpected ways. An initial sample that is low on both African Americans and young voters can end up double-weighting young African Americans, leading to wrong conclusions about both young people and African Americans.
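The double-weighting effect shows up in even a minimal raking sketch. The counts and targets below are invented for illustration – they are not real poll data – but they reproduce the situation described above: a sample short on both young voters and African Americans.

```python
# Minimal raking (iterative proportional fitting) on a 2x2 age-by-race table.
# All counts and targets are invented for illustration.

def rake(counts, row_targets, col_targets, iters=100):
    """Return per-cell weights so weighted margins match the targets."""
    rows, cols = len(counts), len(counts[0])
    weights = [[1.0] * cols for _ in range(rows)]
    for _ in range(iters):
        for i in range(rows):  # adjust each row to its age target
            total = sum(counts[i][j] * weights[i][j] for j in range(cols))
            for j in range(cols):
                weights[i][j] *= row_targets[i] / total
        for j in range(cols):  # adjust each column to its race target
            total = sum(counts[i][j] * weights[i][j] for i in range(rows))
            for i in range(rows):
                weights[i][j] *= col_targets[j] / total
    return weights

# Rows: under 45 / 45 and over; columns: African American / all others.
# The raw sample is short on BOTH young voters and African Americans.
counts = [[20, 180],
          [60, 740]]
row_targets = [350, 650]  # desired age margin (per 1,000 interviews)
col_targets = [200, 800]  # desired race margin

w = rake(counts, row_targets, col_targets)
# The young African American cell gets by far the largest weight (roughly 4x),
# so a handful of respondents stand in for a big slice of the electorate.
print(round(w[0][0], 1), round(w[1][1], 1))
```

The margins come out right, but conclusions about young voters and about African Americans now lean heavily on the few respondents in that one corner cell.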
Few polls report how much weighting they did or how, which means analysis of subgroups within the electorate can be dicey. Because younger voters are less likely to complete polls, I generally assume they have been weighted and am more cautious about results by age. (My former firm used to stratify by age to minimize weighting – but that is an expensive process.)
Days in the Field. One way to get a better demographic distribution with less weighting is to stay in field longer and try each prospective respondent multiple times. Be wary of polls that fielded only a day or two, especially if all the calling was on a weekend. Chances are the data are weighted extra-heavily because it takes longer to get a more representative sample.
Decidedness. Campaign polls usually ask something like, “Are you certain you will support that candidate or do you think you might change your mind?” When you interrupt someone’s evening to ask whom they support for President, they may give you an answer they believe is a firm commitment, or they may just pick the candidate they know best or have heard about most recently. Some measure of certainty is useful. It is rare to see most supporters certain of their choice until the closing weeks.
Additionally, Mark Blumenthal presented work at the AAPOR conference showing that news events can create polling response bias. If someone has been in the news recently, their supporters may be more willing to complete your poll.
Relevance. By the time I vote on March 10, 2020, the field of candidates will be different – many of the current 23 will likely have suspended their campaigns and there could be new entries still. The current preferences of voters in later states are not, I would submit, terribly relevant to the process that will winnow candidates before they get to the later states.
The dialogue of the race may also change. Voters’ focus on perceived electability may shift with perceptions of Trump’s fortunes or simply as voters know the candidates and the differences among them better.
In any case, national polls of primary voters have odd samples because the rules differ state by state. They also impose simultaneous responses on a sequential process.
Prediction. Polls don’t predict. It is a cliché – but still true – that they are “snapshots in time.” The “horse race” alone is the more-or-less casual preference of someone who may or may not have given the matter much thought, given that they will not act on their preference for at least eight months.
Analysis of other data allows some cautious hypotheses – candidates who are less known have more room for growth; candidates whom voters actively oppose surely have less.
A Pop Quiz. Given all of this, which of the following statements about the Democratic primary election is most likely true?
1. Joe Biden is the front runner.
2. Joe Biden has almost 40 percent of the vote.
3. Polls consistently show Biden with more early support than other candidates.
4. We know nothing at all about any of this yet.
I would submit that #3 is true – Biden has more early support than others – but what that will mean eight months from now – or a year from now – is a matter of conjecture. Voters like Joe Biden but others have more room for growth. Polls, however, should not create self-fulfilling prophecies or false narratives.
I am going to find the election very interesting to watch. I really don’t have firm predictions on how it will develop. To me, that’s what makes it interesting.