Living with Uncertainty

Given the problems with polls, do they accurately predict elections? The answer is they usually do, but only when their assumptions are more or less correct and their samples inclusive, which is harder to achieve than it has been in the past. Nobel Prize-winning physicist Richard Feynman said, “It is scientific only to say what’s more likely or less likely, and not be proving all the time what’s possible or impossible.”

Some studies have concluded polls are no less accurate now than they have ever been, but that should not be too reassuring. Polls have always made assumptions about who is going to vote, and wrong assumptions have long led to wrong predictions – like the prediction that Dewey would beat Truman in 1948. Samples were off then because people without phones supported Truman. Samples were off in 2016 when polls included too many voters with college experience and made wrong assumptions about voter turnout.

We never really know the outcome of an election before it happens. At best, we know what outcome is more likely (and sometimes much more likely). Most of us do not really have a reason to know the outcome of the election before it happens. If we work for a partisan committee like the NRSC or DSCC, we may be concerned with allocating resources. If we work for the media, we may feel polls are newsworthy (although I wish y’all found them less so; is it really news that someone is more likely to win than someone else?).

Polling, somewhat ironically, shows that voters do not trust the polls they hear about in the media – and those who took that poll may arguably have trusted polls more than the average voter, or why bother? Media coverage of bad polling can create electoral outcomes, a concern raised by a bipartisan group of pollsters back in 2010.

Modeling can help with prediction because it develops a predictive algorithm or formula that does not require a random or representative sample. The process does require an adequate sample, however, and often more analysis and examination of error than is applied (which is the subject of a subsequent blog). And modeling still just provides a probability and not an absolute. Two plus two may always equal four, but neither polling nor modeling is arithmetic; each says there is some level of probability that Candidate X will win, or that voter Y will support him or her.

The margin of error does not help.  It describes the statistical chance the poll is wrong by more than that number of points assuming the sample is truly random, which is rarely the case, or representative, which is increasingly arguable.
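For readers who want to see where the number comes from, here is a minimal sketch of the textbook calculation. It assumes a simple random sample and a 95 percent confidence level (the 1.96 multiplier), which is exactly the assumption questioned above; the function name and the sample sizes are illustrative only.

```python
import math


def margin_of_error(sample_size: int, proportion: float = 0.5, z: float = 1.96) -> float:
    """Return the margin of error in percentage points.

    Assumes a simple random sample. proportion=0.5 is the worst case,
    and z=1.96 corresponds to roughly 95 percent confidence.
    """
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    return 100 * z * standard_error


# Illustrative sample sizes, not taken from any particular poll.
for n in (400, 600, 1000):
    print(f"n={n}: +/- {margin_of_error(n):.1f} points")
```

The formula knows nothing about who declined to answer or how the data were weighted, so the figure it produces says nothing about those sources of error.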

More skepticism about polls is healthy. It reduces the risk of cutting off resources from a campaign that can win, or of affecting electoral outcomes by publicizing wrong polling. As for campaign strategy, we could use some new thinking about how we listen to voters that might make campaigns more interesting and engaging to more people, even while their outcome remains uncertain. It is sometimes the job of campaign strategists to make what seems impossible not only possible but real, and polling alone does not do that.

Next Post:  Could AI have written this blog?

The Self-Selecting Internet

The most obvious solution to the problems with telephone polling is to administer polls online.  That is a solution to the burgeoning cost of polling but not to the problem of whether the sample is representative of the electorate.  Internet polling is less expensive and many companies provide polling panels that can mirror population demographics.  But there is no way around the reality that people who are less interested in politics are disinclined to complete polls about politics even if they are interested enough to vote.  

Internet respondents for the most part (there are exceptions) are people who have signed up to be on a panel and take a lot of polls. Should we assume that those who subscribe to a panel in exchange for a reward of some kind are representative of those who do not? I do not think so – especially when the invitation to complete the poll often tells you what it is about. (As a panelist, I chose to complete recent polls on feminism and on the Supreme Court but not on several other topics.)

One group that is often underrepresented in both telephone and online polls is people in the middle of the political spectrum. In 2018, most voters knew early on which party they would support, particularly in federal races (at least according to the polls). The election depended on voter turnout patterns and the relatively small number of people in the middle who were undecided, conflicted, not yet paying attention, less interested, or considering split tickets.

Voters in the middle are less likely than rabid partisans to want to share their political views whether probed online or on the phone.  If you are tired of arguments about President Trump from either perspective, you are less likely to agree to spend 10 or 15 minutes talking (or writing) about him.  Internet poll results often have even fewer undecided voters than telephone polls.           

Luckily for the pollsters, the middle was a small group in this year’s election and so the absence of people in the middle did not skew too many polls.  Some polls were wrong in Ohio because voters in the middle were disproportionately likely to support Republican Mike DeWine for Governor and Democrat Sherrod Brown for U.S. Senate. Those who were careful to poll the middle correctly predicted the result in each race.  Those who polled more partisans and fewer voters in the middle got it wrong. 

Online polling is also prone to leave out another significant group: people who are not online. Telephone polling tells us that 80 to 85 percent of voters are online, but there are still 15 to 20 percent who say they are not. Combining online and telephone samples can fill that gap – except they will both leave out those people who simply – for whatever reason – do not want to be polled.

Next Post:  Living with Uncertainty 

People Do Not Want To Be Polled

The core problem with polling is that people do not wish to be polled. Those who answer their phones when the caller is unknown to them are atypical. And even many who do answer choose not to complete the poll.

This year’s telephone polling results were closer to the final election results than in 2016. Much of the improvement, however, was due to the nature of the mid-term electorate and not because the polls themselves were better. The mid-term electorate was highly polarized, and rabid partisans are easier to poll than voters in the middle. Polls were still wrong when those in the middle did not break proportionately to the partisans.

Back in the 1980s, polling achieved representative samples of voters by calling phone numbers at random.  The definition of random is that everyone in the universe of interest (people who will vote in the next election) has an equal chance of being polled. With the advent of cell phones, caller ID, and over-polling, samples have not been random for a while – not since the last century anyway. 

Pollsters replaced random samples with representative ones. Political parties and commercial enterprises have “modeled” files – for every name on the voter file, there is information on the likely age, gender, race or ethnicity and, using statistics, the chances that individual will vote as a Democrat or Republican.  If the sample matches the distribution of these measures on the file, then it is representative and the poll should be correct.
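As a rough illustration of what "matches the distribution on the file" means in practice, the sketch below compares the share of each group in a sample with its share on the voter file. The group names and all of the numbers are hypothetical, and a real check would cover many measures at once.

```python
from collections import Counter


def share_gaps(sample_groups, file_shares):
    """Return sample share minus voter-file share for each group.

    A gap near zero means the sample matches the file on that measure.
    """
    counts = Counter(sample_groups)
    total = len(sample_groups)
    return {group: counts[group] / total - share
            for group, share in file_shares.items()}


# Hypothetical sample of 100 interviews and hypothetical file shares.
sample = ["college"] * 60 + ["non_college"] * 40
voter_file = {"college": 0.45, "non_college": 0.55}

for group, gap in share_gaps(sample, voter_file).items():
    print(f"{group}: {gap:+.0%} versus the file")
```

A sample can pass this check on every measure the pollster tracks and still be off on a measure nobody thought to track.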

There are three problems (at least) with that methodology. (1) There may be demographics the pollster is not balancing that are important; pollsters got the 2016 election wrong in part because they included too few voters without college experience in samples, and college and non-college voters were more different politically than they had been before. (2) Rather than letting the research determine the demographics of the electorate, the pollster needs to make assumptions about who will turn out in order to make the sample representative – including how many Democrats and how many Republicans. When those assumptions are wrong, so are the polls. This year, conventional wisdom was correct and so the polls looked better.

The third problem is perhaps the most difficult and follows from the first two: pollsters “weight” the data to their assumptions. If there are not enough voters under 30 in the sample (and they are harder to reach), then pollsters count the under-30 voters they did reach extra – up-weighting the number of interviews with young people to what they “should” have been according to assumptions. Often, however, the sample of one group or the other wasn’t only too small, but was an inadequate representation in the first place – a skewed sample of young people is still skewed when you pretend it is bigger than it actually was.
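The arithmetic behind that last point can be sketched in a few lines. This is a simplified, single-variable version of weighting, not any firm's actual procedure, and every number in it is hypothetical: the sample is 10 percent under 30, the assumed electorate is 15 percent under 30, and the young people who answered happen to be more supportive of the candidate than young voters as a whole.

```python
def group_weights(sample_counts, target_shares):
    """Weight for each group = assumed share of electorate / share of sample."""
    total = sum(sample_counts.values())
    return {g: target_shares[g] / (sample_counts[g] / total)
            for g in sample_counts}


def weighted_support(sample_counts, support_rates, target_shares):
    """Candidate support after weighting each group to its assumed share."""
    weights = group_weights(sample_counts, target_shares)
    weighted_n = sum(sample_counts[g] * weights[g] for g in sample_counts)
    weighted_yes = sum(sample_counts[g] * weights[g] * support_rates[g]
                       for g in sample_counts)
    return weighted_yes / weighted_n


# All figures below are hypothetical.
sample_counts = {"under_30": 50, "30_plus": 450}
target_shares = {"under_30": 0.15, "30_plus": 0.85}

skewed_sample = {"under_30": 0.70, "30_plus": 0.45}  # who actually answered
true_rates = {"under_30": 0.55, "30_plus": 0.45}     # what a fair sample would show

print(group_weights(sample_counts, target_shares))
print(weighted_support(sample_counts, skewed_sample, target_shares))
print(weighted_support(sample_counts, true_rates, target_shares))
```

With these made-up numbers, each young respondent counts 1.5 times, and the weighted estimate comes out at 48.75 percent instead of the 46.5 percent a fair sample of young people would have produced. Weighting fixed the size of the group and left the skew untouched.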

The problems can be minimized by making more calls to reduce the need to up-weight the data. If 30 percent of some groups of voters complete interviews but only 10 percent of other groups, just make three times the number of calls to the hard-to-reach group. That is what my firm and others did this year. It is, however, an expensive proposition and still does not ensure that the people who completed interviews are representative of those who did not.
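The call arithmetic is simple enough to sketch. The completion rates below are the 30 and 10 percent from the example above; the target of 100 completed interviews per group and the function name are assumptions for illustration.

```python
def calls_needed(target_completes: int, completion_percent: int) -> int:
    """Calls required to expect target_completes finished interviews.

    Uses integer math and rounds up, since a fraction of a call
    cannot be made.
    """
    return (target_completes * 100 + completion_percent - 1) // completion_percent


# Hypothetical target of 100 completed interviews in each group.
easy_group = calls_needed(100, 30)   # group where 30 percent complete
hard_group = calls_needed(100, 10)   # group where 10 percent complete

print(easy_group, hard_group, round(hard_group / easy_group, 1))
```

The hard-to-reach group takes roughly three times the calls for the same number of interviews, which is where the extra cost comes from.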

Next Post:  The Self-Selecting Internet