6/24/2014
09:06 AM

Big Data Has Exhaust Problem

Say no to data obsession and focus on the business problem you want to solve, says Berkeley data science professor.

Hey, what should we do with our data?

Perhaps you've heard, or asked, that question before. It's a common query these days, a byproduct of the growing interest in -- some may say obsession with -- big data and data science.

Unfortunately, it's not the right question to ask, says Steve Weber, a professor in the data science program at the University of California, Berkeley, School of Information.

"A better question is: 'What do my customers really want and need and desire?' " Weber tells InformationWeek. "And then: 'What kind of data would I need to collect, and what would I need to do with it to help them?' "

Sounds like obvious stuff, Weber admits, but it's a pragmatic approach that big-data-obsessed organizations often overlook.

[For more on big data strategy, see Big Data Fans: Don't Boil The Ocean.]

"When you start with the data, it's like putting the cart before the horse," he says. "It's an obsession with the tools, an obsession with the data exhaust. You're searching around in the haystack for the needle."

It's far more efficient, he asserts, to start with a core business question. Example: What value-added service or product do I want to provide to my customers, but can't today? And then the follow-up: What data would allow me to design it?

As Weber sees it, today's rush to implement big-data platforms is reminiscent of the early days of the web, circa 1993-94.

"Everybody was racing to get a website up," Weber recalls. "And later they were figuring out what to do with it. They spent a lot of money buying technical infrastructure, because that's what you had to do. Later [they said], 'We'll figure out what to do with it,' and then had to reengineer most of that stuff."

Of course, businesses often function this way, particularly when new technology emerges.

"The technology comes on, and suddenly everyone feels like they've got to get it in place before they really know what they're going to do with it. You can do that, but I'm not sure it's the most efficient way to go."

And Hadoop fans, take note: It's wise not to become too enamored of a particular big-data platform or tool.

"Hadoop is a great piece of software, or a great platform, but it's not the only one -- it's an early one. Lots of people are starting to build tools to democratize the ability to work with [big data]."

Again, the web analogy is applicable here. "In the early days of the web, writing HTML was really complicated. Now, basically, you don't really need to know any HTML to make a web page."

And the move to democratize complex technology may be happening much faster in the big data space. Says Weber: "Hadoop's good, but if you bet on it for the long run, you're likely to be surprised."

One big data development that Weber finds "unbelievably exciting" is the Internet of Things, or as he calls it, "really cheap sensors world."

"I'm wearing sensors, and everything I interact with is instrumented in some fashion. It starts to become mind-boggling, both in terms of what we can know, and the almost unlimited number of things we can do with that data."

That depends, of course, on whether the information is collected and analyzed, ideally by well-intentioned parties.

"People sometimes use the term 'data exhaust' [to describe] all the data that their interaction with the world is throwing off, and how little of it gets collected," says Weber.

Jeff Bertolucci is a technology journalist in Los Angeles who writes mostly for Kiplinger's Personal Finance, The Saturday Evening Post, and InformationWeek.

Comments
D. Henschen,
User Rank: Author
6/25/2014 | 11:19:16 AM
Data exhaust, dark data and IOT
Not to start with the data -- going against Weber's advice -- but there's the dark data problem as well as the data exhaust problem. Dark data is the stuff that you're already collecting but not using to its fullest potential -- customer data and product data that's unused often falls into this category. Data exhaust is the stuff that you're simply throwing away either before or soon after collection, because the cost and trouble of storing it was too high by old standards. But the cost and trouble of storing data have changed with new platforms, like Hadoop and NoSQL databases, that make it easier and cheaper to work with such data.

I totally agree with Weber's start-with-the-business-need advice. But keep in mind that we're thinking up new possibilities because new platform possibilities have made it possible for everyone to dream big dreams. I also agree with Weber on IOT. I've seen a lot of examples of predictive maintenance and use of vehicle telematics at recent events including SAP Sapphire and MongoDB World.
Charlie Babcock,
User Rank: Author
6/24/2014 | 1:41:22 PM
Big data rush, no bust in sight
I hadn't thought of the rush to big data as analogous to the rush to build Web sites before people knew what to do with them. The dot.com bust followed, but I don't think there's any similar washout coming for enthusiastic use of big data. Just better align its use with the business, I think Steve Weber is saying.