Big Data Has Exhaust Problem - InformationWeek

Say no to data obsession and focus on the business problem you want to solve, says Berkeley data science professor.

Hey, what should we do with our data?

Perhaps you've heard, or asked, that question before. It's a common query these days, a byproduct of the growing interest in -- some may say obsession with -- big data and data science.

Unfortunately, it's not the right question to ask, says Steve Weber, a professor in the data science program at the University of California, Berkeley, School of Information.

"A better question is: 'What do my customers really want and need and desire?' " Weber tells InformationWeek. "And then: 'What kind of data would I need to collect, and what would I need to do with it to help them?' "

Sounds like obvious stuff, Weber admits, but it's a pragmatic approach that big-data-obsessed organizations often overlook.

"When you start with the data, it's like putting the cart before the horse," he says. "It's an obsession with the tools, an obsession with the data exhaust. You're searching around in the haystack for the needle."

It's far more efficient, he asserts, to start with a core business question. Example: What value-added service or product do I want to provide to my customers, but can't today? And then the follow-up: What data would allow me to design it?

As Weber sees it, today's rush to implement big-data platforms is reminiscent of the early days of the web, circa 1993-94.

"Everybody was racing to get a website up," Weber recalls. "And later they were figuring out what to do with it. They spent a lot of money buying technical infrastructure, because that's what you had to do. Later [they said], 'We'll figure out what to do with it,' and then had to reengineer most of that stuff."

Of course, businesses often function this way, particularly when new technology emerges.

"The technology comes on, and suddenly everyone feels like they've got to get it in place before they really know what they're going to do with it. You can do that, but I'm not sure it's the most efficient way to go."

And Hadoop fans, take note: It's wise not to become too enamored of a particular big-data platform or tool.

"Hadoop is great piece of software, or a great platform, but it's not the only one -- it's an early one. Lots of people are starting to build tools to democratize the ability to work with [big data]."

Again, the web analogy is applicable here. "In the early days of the web, writing HTML was really complicated. Now, basically, you don't really need to know any HTML to make a web page."

And the move to democratize complex technology may be happening much faster in the big data space. Says Weber: "Hadoop's good, but if you bet on it for the long run, you're likely to be surprised."

One big data development that Weber finds "unbelievably exciting" is the Internet of Things, or as he calls it, "really cheap sensors world."

"I'm wearing sensors, and everything I interact with is instrumented in some fashion. It starts to become mind-boggling, both in terms of what we can know, and the almost unlimited number of things we can do with that data."

That depends, of course, on whether the information is collected and analyzed, ideally by well-intentioned parties.

"People sometimes use the term 'data exhaust' [to describe] all the data that their interaction with the world is throwing off, and how little of it gets collected," says Weber.

Jeff Bertolucci is a technology journalist in Los Angeles who writes mostly for Kiplinger's Personal Finance, The Saturday Evening Post, and InformationWeek.

D. Henschen, User Rank: Author
6/25/2014 | 11:19:16 AM
Data exhaust, dark data and IoT
Not to start with the data -- going against Weber's advice -- but there's the dark data problem as well as the data exhaust problem. Dark data is the stuff that you're already collecting but not using to its fullest potential -- unused customer data and product data often fall into this category. Data exhaust is the stuff that you're simply throwing away either before or soon after collection, because the cost and trouble of storing it were too high by old standards. But the cost and trouble of storing data have changed with new platforms, like Hadoop and NoSQL databases, that make it easier and cheaper to work with such data.

I totally agree with Weber's start-with-the-business-need advice. But keep in mind that we're thinking up new possibilities because new platforms have made it feasible for everyone to dream big dreams. I also agree with Weber on IoT. I've seen a lot of examples of predictive maintenance and use of vehicle telematics at recent events including SAP Sapphire and MongoDB World.
Charlie Babcock, User Rank: Author
6/24/2014 | 1:41:22 PM
Big data rush, no bust in sight
I hadn't thought of the rush to big data as analogous to the rush to build Web sites before people knew what to do with them. The bust followed, but I don't think there's any similar washout coming for enthusiastic use of big data. Just better align its use with the business, I think Steve Weber is saying.