The Trouble with Data About Data

It's time to accept the fact that so much of the data we see is biased, whether intentionally or not. So what can you do about it?

Lisa Morgan, Freelance Writer

September 21, 2016

Two people looking at the same analytical result can come to different conclusions. The same goes for the collection of data and its presentation. A couple of experiences underscore how the data about data -- even from authoritative sources -- may not be as accurate as the people working on the project or the audience believe. You guessed it: Bias can turn a well-meaning, "objective" exercise into a subjective one. In my experience, the most nefarious thing about bias is the lack of awareness or acknowledgement of it.

The Trouble with Research

I can't speak for all types of research, but I'm very familiar with what happens in the high-tech industry. Some of it involves considerable primary and secondary research, and some of it involves one or the other.

Let's say we're doing research about analytics. The scope of our research will include a massive survey of a target audience (because higher numbers seem to indicate statistical significance). The target respondents will be a subset of subscribers to a mailing list or individuals chosen from multiple databases based on pre-defined criteria. Our errors here most likely will include sampling bias (a non-random sample) and selection bias (aka cherry-picking).
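To see why a massive survey doesn't fix a non-random sample, here is a minimal sketch in Python. Everything in it is hypothetical and chosen only for illustration: the population size, the 30% "uses analytics" rate, the subscription probabilities, and the function name `simulate`. It compares a large sample drawn only from mailing-list subscribers with a much smaller simple random sample of the whole population.

```python
import random


def simulate(seed=0, population_size=100_000):
    """Compare a big mailing-list sample with a small random sample.

    All rates below are made-up assumptions for illustration only.
    """
    rng = random.Random(seed)

    # Hypothetical population: each person either uses analytics or not.
    population = []
    for _ in range(population_size):
        uses_analytics = rng.random() < 0.30
        # Assumption: analytics users are far more likely to subscribe
        # to the mailing list we plan to survey.
        subscribe_prob = 0.50 if uses_analytics else 0.10
        subscribed = rng.random() < subscribe_prob
        population.append((uses_analytics, subscribed))

    true_rate = sum(uses for uses, _ in population) / population_size

    # "Massive" survey, but drawn only from subscribers (sampling bias).
    subscribers = [uses for uses, subscribed in population if subscribed]
    biased_sample = rng.sample(subscribers, 10_000)
    biased_estimate = sum(biased_sample) / len(biased_sample)

    # Much smaller survey, drawn at random from the whole population.
    random_sample = [uses for uses, _ in rng.sample(population, 500)]
    random_estimate = sum(random_sample) / len(random_sample)

    return {
        "true_rate": true_rate,
        "biased_estimate": biased_estimate,
        "random_estimate": random_estimate,
    }


result = simulate()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, the 10,000-person subscriber survey lands far from the true rate, while the 500-person random sample lands close to it. The size of the sample does nothing to correct for who was left out of it.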

The survey respondents will receive a set of questions that someone has to define and structure. That someone may have a personal agenda (confirmation bias), may be privy to an employer's agenda (funding bias), and/or may choose a subset of the original questions (potentially selection bias).

The survey will be supplemented with interviews of analytics professionals who represent the audience we survey, demographically speaking. However, they will have certain unique attributes -- they have a high profile or they work for a high-profile company (selection bias). We likely won't be able to use all of what a person says, so we'll omit some of it -- selection bias and confirmation bias combined.

We'll also do some secondary research that bolsters our position -- selection bias and confirmation bias, again.

Then, we'll combine the results of the survey, the interviews, and the secondary research. Not all of it will be usable because it's too voluminous, it's irrelevant, or it contradicts our position. Rather than stating any of that as part of the research, we'll just omit those pieces -- selection bias and confirmation bias again. We can also structure the data visualizations in the report so they underscore our points (and misrepresent the data).

We Need to Improve, Desperately

Bias is not something that happens to other people. It happens to everyone, consciously or unconsciously, because it is natural. Rather than dismiss it, it's prudent to acknowledge the tendency, attempt to identify what types of bias may be involved and why, and rectify them if possible.

I recently worked on a project for which I did some interviews. Before I began, someone in power said, "This point is [this] and I doubt anyone will say different." Really? I couldn’t believe my ears. Personally, I find assumptions to be a bad thing because unlike hypotheses, there's no room for disproof or differing opinions.

Meanwhile, I received a research report. One takeaway was that vendors are failing to deliver "what end customers want most." The accompanying infographic shows, on average, that 15.5% of end customers want what 59% of vendors don't provide. The information raised more questions than it answered on several levels, at least for me, and I know I won't get access to the raw data.

My overarching point is that bias is rampant, and burying our heads in the sand only makes matters worse. Ethically speaking, I think that as an industry, we need to do more.

About the Author

Lisa Morgan

Freelance Writer

Lisa Morgan is a freelance writer who covers business and IT strategy and emerging technology for InformationWeek. She has contributed articles, reports, and other types of content to many technology, business, and mainstream publications and sites including tech pubs, The Washington Post and The Economist Intelligence Unit. Frequent areas of coverage include AI, analytics, cloud, cybersecurity, mobility, software development, and emerging cultural issues affecting the C-suite.
