Keep Your Credibility Intact, Don’t Cite Bad Data

August 20, 2020 Author: Roy Eduardo Kokoyachuk

Nothing helps bolster an argument more than citing a research study that proves your point with statistics. A quick Google search can pull up numerous results of supporting data to prove just about anything. Even flat earthers can “prove” their theories with “research” they find online. The explosion of DIY survey tools has made it possible for anyone with a keyboard to create a “poll” and disseminate the results. The challenge is data integrity, which is often sorely lacking in these polls. To help cut through the clutter of bad research and avoid destroying your credibility by citing it, here are some guidelines to follow when making your assessments.

Representative Sample

The single most common problem we see with research studies is that the individuals who responded to the survey are not representative of the intended target population. Finding representative samples of individuals to take a survey has gotten much harder over the past 30 years. Gone are the days when landline surveys reached over 90% of the U.S. population. That number is now closer to 40%, and those phones rarely get answered by a live person.

DIY platforms are often free to use and will supply respondents for a fee. Creating representative sample frames, however, is something polling and research companies invest in heavily, and it is something that constantly needs refining. Professional polling and market research companies use a mix of landlines, mobile phones, online market research panels and in-person intercept surveys, then apply proprietary weighting formulas to produce accurate results.
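To give a feel for what weighting does, here is a minimal sketch of simple post-stratification, a textbook approach and not any company's proprietary formula. The age groups, sample counts, population shares and "yes" rates are all hypothetical numbers chosen for illustration, and the function name `poststratify` is our own.

```python
def poststratify(sample_counts, population_shares):
    """Return one weight per group so the weighted sample matches the population mix.

    weight = (group's share of the population) / (group's share of the sample)
    """
    total = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / total)
        for group, count in sample_counts.items()
    }


# Hypothetical survey: older respondents are heavily over-represented.
sample_counts = {"18-34": 100, "35-54": 300, "55+": 600}
population_shares = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

# Hypothetical share of each group answering "yes" to some question.
yes_rate = {"18-34": 0.70, "35-54": 0.50, "55+": 0.30}

weights = poststratify(sample_counts, population_shares)

# Unweighted result: every respondent counts equally.
total = sum(sample_counts.values())
unweighted = sum(sample_counts[g] * yes_rate[g] for g in sample_counts) / total

# Weighted result: each respondent counts in proportion to the group weight.
weighted_total = sum(sample_counts[g] * weights[g] for g in sample_counts)
weighted = (
    sum(sample_counts[g] * weights[g] * yes_rate[g] for g in sample_counts)
    / weighted_total
)

print(f"unweighted yes: {unweighted:.2f}")
print(f"weighted yes:   {weighted:.2f}")
```

With these made-up numbers, the unweighted "yes" share comes out around 40%, while the weighted share is around 49%. A gap like that is why a survey that skips weighting, or does not say how it was weighted, deserves a second look before you cite it.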

Surveys targeting populations with large numbers of non-English speakers require extra scrutiny to ensure adequate numbers of non-English responses were collected. Also, don’t assume that common identifiers like “Adults” always mean the same thing. Depending on the survey, “Adults” can mean 18+, 21+, 18-49, 18-64 or something else.

Poor Survey Design

Poorly designed surveys come in two flavors: intentionally and unintentionally skewed. Intentionally skewed surveys are often fielded by individuals and organizations with an agenda to promote. Always find out who underwrote a survey before citing it. Unintentionally skewed surveys are a bit harder to spot. Personal, often unconscious, biases regularly creep into survey questions. Even well-designed questions can yield strange results because of priming from previous questions or the aforementioned sampling issues.

The most common questionnaire flaws include double-barreled questions (asking about two things at once but allowing only one answer), telescoping (respondents reporting behaviors from outside the measurement period as if they fell within it), social desirability bias (yes, I do floss twice a day!), primacy and recency effects (first and last options are most remembered), inaccurate scales (too few or too many intervals or lack of a “don’t know” option) and many, many others.

While you usually cannot see the questionnaire behind a study’s published results, you can often spot the effects of poor questionnaire design in the findings themselves. Findings from well-designed surveys are internally coherent and hold up when cross-checked against other sources.

Data Scraping and Passive Listening

During the past decade, there was an explosion of passive tracking and scraping of social media posts to measure consumer sentiment. Much of that type of research is now restricted under Europe’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). That doesn’t mean, however, that passive tracking has stopped. There are more consent hoops to jump through, but consumers will often freely give up their privacy to have access to all of a website or social platform’s features.

Passively gathered data, however, can be misleading. A classic example of passively gathered data that turned out to be incorrect predates the internet. Soviet satellites flew over the Pentagon every day at noon. Right in the middle of the complex was a small building with lots of activity in the photos. The Soviets assumed it must be the entrance into a bunker or top-secret meeting room. It turned out to be a hot dog stand. It was lunchtime after all. Just because something appears to be correlated doesn’t mean it is.

Google, Facebook, Snap and Twitter all do primary research to determine what is motivating people to type their queries into search or click on certain content. They have millions of terabytes of passive tracking data, but asking consumers directly why they did what they did is worth a mountain of “likes.”


The internet has enabled us to quickly look up any bit of information we can think of. Whether the results we get back are accurate is something else altogether. Taking some extra steps before citing a study can prevent embarrassment or, if the study is informing a critical decision, much worse.