Nearly four years later, the results of the 2016 U.S. presidential election are still shocking. Polls showed Hillary Clinton with a significant lead over Donald Trump, all but guaranteeing a win and the election of America’s first female president. Victory parties were planned. Fist bumps and high fives were going around. But the polls were wrong. Across the pond, polls got it wrong again with the UK’s Brexit referendum. Those tasked with gauging public sentiment couldn’t seem to find its pulse. However, the “USC/L.A. Times Daybreak Tracking Poll” got it right. It was one of the few polls that predicted Trump led Clinton.
As a market researcher and, dare I say, data collections expert, I have a few thoughts on why the polls were so wrong, how the L.A. Times got it right, and what it means for future polls. In short, most polls surveyed people who were not truthful about their voting intentions. It’s also difficult to predict enthusiasm and voter turnout. But the small sample sizes commonly used and the lack of data weighting are what I find most alarming, so that is where I’ll focus. Let’s take a closer look at both.
How can a poll of only 1,004 Americans represent 260 million people with a 3% margin of error? The answer boils down to representation. Should more interviews have been conducted to better represent voters and produce more precise predictions? Perhaps, but considering the methodology – phone surveys – time and cost are at play. Despite online sampling technology, many pollsters still rely on traditional phone surveys.
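That 3% figure comes from the standard margin-of-error formula for a simple random sample, which depends on the sample size but not on the size of the population being represented. A minimal sketch (using the worst-case proportion of 0.5 and a 95% confidence level):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a simple random sample
    of size n, using the worst-case proportion p = 0.5."""
    return z * math.sqrt(p * (1 - p) / n)

# A sample of 1,004 respondents yields roughly a 3% margin of error,
# no matter how many millions of people the population contains.
print(round(margin_of_error(1004) * 100, 1))  # → 3.1
```

Note the diminishing returns: quadrupling the sample size only halves the margin of error, which is one reason pollsters working by phone stop at roughly a thousand interviews.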
The demise of landlines is shrinking the pool of potential phone respondents, and cell phones are a poor predictor of where people live. For example, someone may move from California to Nevada but keep his or her cell phone number because it’s just easier, right? Pollsters surveying California call that number only to discover that its owner now lives in Nevada. Cell phones pose a geographic challenge and limit the sample pool.
In a recent poll, phone interviews with a sample of a little more than 1,000 people were used to predict the current front-runner in the 2020 Democratic primaries. Déjà vu, 2016. For the past 25 years, the internet has been the biggest source of data, yet every network poll I’ve seen conducts most of its data collection offline. One firm that understood the power of massive online data collection is Cambridge Analytica, which was at the center of controversy for using Facebook data to gather insights for the Trump campaign. How did it collect so much data? Cambridge Analytica created a quiz (a survey) that reached about 87 million Facebook users through an API loophole. Whatever your opinion of Cambridge Analytica, it gathered millions of data points by using the internet, and that gave it the tools to generate better insights and predictions than phone polling. My point is that sample size matters. This is the perfect segue into my earlier example of the L.A. Times and its ability to properly weight data to produce a more reliable poll. Again, it did so well that it was one of the only polls to accurately predict Trump’s win in 2016.
Data weighting is used to adjust the results of a study to bring them more in line with what is known about a population. For example, if the Census says that 10% of Americans are African American, then a poll of 1,000 people will include about 100 African Americans and weight their responses to represent the country’s African American population (roughly 30 million people). The same goes for Hispanics, non-Hispanic Whites/Caucasians, and so on. That is what I call “macro-weighting.” On a more granular level, a poll that surveys ten low-income African Americans in the South will not represent higher-income African Americans in the South, and vice versa; results vary with income and education. Similarly, for U.S. Hispanics, the data must be weighted by acculturation level, age, income, and education. Acculturated Hispanics with $100k+ incomes won’t represent lower-income acculturated Hispanics or unacculturated middle-class Hispanics.
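Macro-weighting can be sketched in a few lines: each respondent in a demographic group gets a weight equal to the group’s population share divided by its share of the sample. The shares below are illustrative placeholders, not actual census figures:

```python
# Population shares (illustrative, not real census figures) and the
# composition of a hypothetical 1,000-person sample.
census_share = {"African American": 0.10, "Hispanic": 0.18, "White": 0.72}
sample_counts = {"African American": 100, "Hispanic": 120, "White": 780}
n = sum(sample_counts.values())  # 1,000 respondents

# Weight = population share / sample share, for each group.
weights = {
    group: census_share[group] / (count / n)
    for group, count in sample_counts.items()
}

# Hispanics are underrepresented in this sample (12% vs. 18% of the
# population), so each Hispanic respondent counts 1.5x in the results.
print(round(weights["Hispanic"], 2))  # → 1.5
```

A group sampled exactly in proportion to its population share gets a weight of 1.0; underrepresented groups are weighted up, overrepresented ones down.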
But data weighting does not apply only to ethnic groups. The same goes for non-Hispanic Whites/Caucasians. In 2016, for instance, many voters in Pennsylvania, Michigan, and Wisconsin came out to the polls for the first time, which skewed the results of polls that were not appropriately weighted. The L.A. Times weighted its data differently. Opting to micro-weight the data enabled it to discover that a higher number of minorities planned to vote for Trump and to locate those first-time voters. Researchers at the publication were heavily criticized for their methods, but they got it right. For those who doubt the methodology, the L.A. Times also correctly predicted Obama’s wins in 2008 and 2012. Micro-weighting gave them a more diverse data set and the ability to survey triple the number of people surveyed by the polls that got it wrong.
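The difference between macro- and micro-weighting is simply the granularity of the cells you balance. A minimal sketch of micro-weighting, using crossed race-by-income cells with made-up population shares (not real census data):

```python
# Micro-weighting: balance crossed cells (here race x income) rather
# than one demographic at a time. All shares are illustrative.
population_share = {
    ("African American", "low income"): 0.06,
    ("African American", "higher income"): 0.04,
    ("White", "low income"): 0.30,
    ("White", "higher income"): 0.60,
}
sample_counts = {
    ("African American", "low income"): 80,
    ("African American", "higher income"): 20,
    ("White", "low income"): 250,
    ("White", "higher income"): 650,
}
n = sum(sample_counts.values())  # 1,000 respondents

cell_weights = {
    cell: population_share[cell] / (count / n)
    for cell, count in sample_counts.items()
}

# Higher-income African Americans make up 2% of this sample but 4% of
# the population, so their responses are upweighted by a factor of 2 -
# a correction that macro-weighting on race alone would miss.
print(round(cell_weights[("African American", "higher income")], 2))  # → 2.0
```

With many crossed dimensions the cells get sparse, which is why production pollsters typically use iterative techniques such as raking rather than direct cell weighting; the principle, however, is the same.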
In summary, the polls that got the 2016 election right did three things: they used the biggest databases, collected a much bigger and more diverse sample, and micro-weighted the data. The polls that got it wrong did none of these things.
Polling does not have to be controversial, but it should be thorough. Data collection can also be done by reputable sample companies that understand how to use data technology to gather bigger samples. As I mentioned, the internet is the largest source of sample, and if it is used correctly and with integrity, controversy can be avoided. Online sample companies can harness programmatic sampling, online panels, and proper AI technology to conduct political polls that deliver accurate data, analysis, and the insights campaigns need to inform their strategy.