Big Data Has Exhaust Problem

Say no to data obsession and focus on the business problem you want to solve, says Berkeley data science professor.

Jeff Bertolucci, Contributor

June 24, 2014

Hey, what should we do with our data?

Perhaps you've heard, or asked, that question before. It's a common query these days, a byproduct of the growing interest in (some may say obsession with) big data and data science.

Unfortunately, it's not the right question to ask, says Steve Weber, a professor in the data science program at the University of California, Berkeley's School of Information.

"A better question is: 'What do my customers really want and need and desire?' " Weber tells InformationWeek. "And then: 'What kind of data would I need to collect, and what would I need to do with it to help them?' "

Sounds like obvious stuff, Weber admits, but it's a pragmatic approach that big-data-obsessed organizations often overlook.

"When you start with the data, it's like putting the cart before the horse," he says. "It's an obsession with the tools, an obsession with the data exhaust. You're searching around in the haystack for the needle."

It's far more efficient, he asserts, to start with a core business question. Example: What value-added service or product do I want to provide to my customers, but can't today? And then the follow-up: What data would allow me to design it?

As Weber sees it, today's rush to implement big-data platforms is reminiscent of the early days of the web, circa 1993-94.

"Everybody was racing to get a website up," Weber recalls. "And later they were figuring out what to do with it. They spent a lot of money buying technical infrastructure, because that's what you had to do. Later [they said], 'We'll figure out what to do with it,' and then had to reengineer most of that stuff."

Of course, businesses often function this way, particularly when new technology emerges.

"The technology comes on, and suddenly everyone feels like they've got to get it in place before they really know what they're going to do with it," Weber says. "You can do that, but I'm not sure it's the most efficient way to go."

And Hadoop fans, take note: It's wise not to become too enamored of a particular big-data platform or tool.

"Hadoop is a great piece of software, or a great platform, but it's not the only one -- it's an early one. Lots of people are starting to build tools to democratize the ability to work with [big data]."

Again, the web analogy is applicable here. "In the early days of the web, writing HTML was really complicated. Now, basically, you don't really need to know any HTML to make a web page."

And the move to democratize complex technology may be happening much faster in the big data space. Says Weber: "Hadoop's good, but if you bet on it for the long run, you're likely to be surprised."

One big data development that Weber finds "unbelievably exciting" is the Internet of Things, or as he calls it, "really cheap sensors world."

"I'm wearing sensors, and everything I interact with is instrumented in some fashion. It starts to become mind-boggling, both in terms of what we can know, and the almost unlimited number of things we can do with that data."

That depends, of course, on whether the information is collected and analyzed, ideally by well-intentioned parties.

"People sometimes use the term 'data exhaust' [to describe] all the data that their interaction with the world is throwing off, and how little of it gets collected," says Weber.

About the Author(s)

Jeff Bertolucci

Contributor

Jeff Bertolucci is a technology journalist in Los Angeles who writes mostly for Kiplinger's Personal Finance, The Saturday Evening Post, and InformationWeek.
