2/11/2014
09:06 AM

Is Your Company Running A Data Dump?

Hoarding useless data makes analytics harder. Companies like Paxata say their brand of analytics lets non-data experts turn data landfills into useful info.

Companies of all sorts are now in the garbage business. Without even thinking about it, they collect so much data that they end up with data garbage dumps full of bad data.

The big difference between data dumps and real landfills is the smell: bad data doesn't stink. That's probably why companies keep collecting data they don't need. Keeping data is also cheap, and it has gotten cheaper in the last few years. All of that just makes comparing data harder.

"There's so much data from different places and in different formats. It's very difficult to treat that data," says Jon Oltsik, an analyst at Enterprise Strategy Group in Milford, Mass.


The rise of post-relational database tools such as Hadoop, MongoDB, and Cassandra has lowered data storage costs, says Nenshad D. Bardoliwalla, cofounder and vice president of product at Paxata, a startup that uses machine learning and analytics to automate and accelerate the data preparation part of big data. Companies no longer need to think about what they're storing.

"Companies have flipped their mentality to store it all, rather than just the data they really want," he says.

Bardoliwalla was at Hyperion in an earlier era of data warehousing, and others involved in founding Paxata were at SAP, Tibco, and Guidewire.

Paxata's founders think they've used analytics to help turn big data landfills into compost. They argue that the problem companies face lies in preparing data, which is time-consuming and costly. Bardoliwalla says data preparation involves either arduous hand coding by specialists using tools like Informatica and Trillium, or attempts to scrub data in Excel.

Image courtesy of St. Louis County.

Paxata applies analytic techniques to data sources to determine, for instance, whether Michael Fitzgerald, Mike Fitzgerald, and M Fitzgerald in different databases might all be the same person. Its software works out the answer on its own, so a user does not have to review each case. For very large data sets, that promises huge time savings.
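
The article doesn't describe how Paxata's software actually decides such matches (the company says it uses machine learning). As a rough illustration of the underlying idea, record matching, here is a minimal rule-based sketch in Python. It is not Paxata's method; the nickname table and the function names are hypothetical.

```python
# Hypothetical, tiny nickname table mapping short forms to a canonical first name.
# Real matching tools rely on far larger curated lists or learned models.
NICKNAMES = {"mike": "michael", "bob": "robert", "bill": "william"}


def normalize(name):
    """Return (canonical_first, last) from a name: lowercased, periods stripped."""
    parts = name.lower().replace(".", "").split()
    first, last = parts[0], parts[-1]
    return NICKNAMES.get(first, first), last


def may_be_same_person(a, b):
    """Hard rules: last names must match; first names match exactly or by initial."""
    first_a, last_a = normalize(a)
    first_b, last_b = normalize(b)
    if last_a != last_b:
        return False
    # A single-letter first name is treated as an initial.
    if len(first_a) == 1 or len(first_b) == 1:
        return first_a[0] == first_b[0]
    return first_a == first_b


def group_candidates(names):
    """Greedy grouping: each name joins the first group whose first member it matches."""
    groups = []
    for name in names:
        for group in groups:
            if may_be_same_person(group[0], name):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


if __name__ == "__main__":
    print(group_candidates(["Michael Fitzgerald", "Mike Fitzgerald", "M Fitzgerald", "Mary Smith"]))
    # prints [['Michael Fitzgerald', 'Mike Fitzgerald', 'M Fitzgerald'], ['Mary Smith']]
```

Hard rules like these break down quickly: "M Fitzgerald" could just as easily be Mary Fitzgerald. That is why record-matching tools typically score similarity across several fields (addresses, e-mail, phone numbers) rather than relying on names alone, and why doing this by hand across large data sets is so slow.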

"The value there is exactly as they say," Oltsik says. He has no ties to Paxata and has not looked at its product.

Paxata's target user is someone like a company's vice president of marketing: an experienced Excel user, but not a "super jock." She needs information from disparate sources, and needs to know things such as whether a sales lead is a duplicate and whether the information about it is correct. Providing that context to data sets is one of the things that costs analysts precious time.

The rule of thumb is that data preparation takes up 80% to 90% of the time people spend on data, leaving only a small fraction for actual analysis. "People pour things into the data landfill. They don't even know it's there," Bardoliwalla says. "There's a huge discoverability problem that needs intelligent algorithmic techniques and visualization techniques to allow computers to do the heavy lifting."

Bardoliwalla wants to flip the ratio of time that analysts spend on data, so they can spend 80% of their time analyzing data sets. There is value in data, but getting to the value might be more expensive than the data is worth, like ore buried too deeply in a mine.

Paxata says it has about a dozen customers, including data storage firm Box; Dannon, the American unit of French yogurt maker Groupe Danone; and the big Swiss financial firm UBS. It is also not alone in the market: just today I received an email about a pre-briefing on a similar product from another data company.

Perhaps some day soon companies will spend their time making hay from their data.


Michael Fitzgerald writes about the power of ideas and the people who bring them to bear on business, technology and culture.

Comments
jagibbons
User Rank: Ninja
2/12/2014 | 8:10:46 AM
Re: E-discovery
There's big money in e-discovery, at least for the lawyers and vendors making e-discovery tools. The enterprise being targeted is the loser because of the time, effort and cost of digging through all of that stuff.

I suspect that many companies are keeping too much data because they don't have a strategy well enough defined to outline a use or purpose for that data. Then, since storage is relatively cheap, they keep everything.

The flip side of that coin is the companies run by executives who've been bitten by lawsuits before and keep everything for CYA purposes.

Either way, it's to the company's detriment to keep absolutely everything. Decide what you need and keep that. It's much better and more effective than keeping everything and eventually (or not) deciding what you need.
Lorna Garey
User Rank: Author
2/11/2014 | 1:45:54 PM
E-discovery
We hear all the time about companies spending millions on e-discovery requests and lawyers coming up with a 'smoking gun' from some obscure data source that no one thought to delete. To wit: Chris Christie as Jersey digs for Bridgegate evidence.

Do you think the 'keep everything forever' mindset is going to play into this, making money for e-discovery software firms and consultancies and teams of lawyers?
Michael Fitzgerald
User Rank: Moderator
2/11/2014 | 1:17:59 PM
Re: Data dump by another name
I had the same thought about creating a smell for bad data.  We all know data decays over time. It would be fun, and telling, to have data records take on a different hue as they aged, perhaps.  You could then apply a little data air freshener. Or put it in a data coffin...
RobPreston
User Rank: Author
2/11/2014 | 10:45:59 AM
Re: Data dump by another name
"Data lake" doesn't do the practice justice. Lakefront property fetches a premium. No one's looking to drain lakes (for the most part) or reduce their size. For those subjected to driving through Staten Island, think Arthur Kill. Local residents couldn't close up that dump fast enough. Perhaps if rotting data smelled (a perverse market opportunity here?), companies wouldn't hoard so much of it.  
Laurianne
User Rank: Author
2/11/2014 | 10:27:54 AM
Data dump by another name
EMC likes to use the term "data lake" to describe the vast amount of data customers are grappling with. That sounds more pleasant -- but at some companies, data dump must certainly be more accurate.