Hoarding useless data makes analytics harder. Companies like Paxata say their brand of analytics lets non-data experts turn data landfills into useful info.

Michael Fitzgerald, Contributor

February 11, 2014

<em><b>Image courtesy of <a href="http://www.stlouiscountymn.gov/" target="blank">St. Louis County</a>.</b></em>

Companies of all sorts are now in the garbage business. Without even thinking about it, companies collect so much data that they have data garbage dumps, filled up with bad data.

The big difference between data dumps and real landfills is the smell; bad data doesn't have the same odor. That's probably why companies keep collecting data they don't need. It's also cheap to keep data, and it's gotten cheaper in the last few years. That just makes comparing data harder to do.

"There's so much data from different places and in different formats. It's very difficult to treat that data," says Jon Oltsik, an analyst at Enterprise Strategy Group in Milford, Mass.

[What does "real time" mean, anyway? Read Real-Time Analytics: Ready For Its Close-Up?]

The rise of post-relational database tools such as Hadoop, MongoDB, and Cassandra has lowered data storage costs, says Nenshad D. Bardoliwalla, cofounder and vice president of product at Paxata, a startup that uses machine learning and analytics to automate and accelerate the data preparation part of big data. No longer do companies need to think about what they're storing.

"Companies have flipped their mentality to just store it all, rather than just the data they really want," he says.

Bardoliwalla was at Hyperion in an earlier era of data warehousing, and others involved in founding Paxata were at SAP, Tibco, and Guidewire.

Paxata's founders think they've used analytics to help turn big data landfills into compost. They argue the problem companies face is in preparing data, which is time-consuming and costly. Bardoliwalla says that data preparation takes place either through arduous hand coding, with specialists using tools like Informatica and Trillium, or through attempts to scrub data in Excel.

Paxata applies analytic techniques to data sources to see whether Michael Fitzgerald, Mike Fitzgerald, and M Fitzgerald in different databases might all be the same person, for instance. Its software figures the answer out on its own, meaning a user does not have to look at it. For very large data sets, that promises huge time savings.
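To make the name-matching problem concrete, here is a minimal sketch of the kind of check involved. It is not Paxata's method, which the article does not describe. The function names, the tiny nickname table, and the 0.85 similarity threshold are all assumptions chosen for illustration, and the only library used is Python's standard `difflib`.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table; a real system would need a far larger one.
NICKNAMES = {"mike": "michael"}


def normalize(name):
    """Lowercase a name and split it into tokens, dropping periods and commas."""
    tokens = (t.strip(".,").lower() for t in name.split())
    return [t for t in tokens if t]


def first_names_compatible(a, b):
    """Guess whether two first-name tokens could refer to the same person."""
    a = NICKNAMES.get(a, a)
    b = NICKNAMES.get(b, b)
    if a == b:
        return True
    # A bare initial is treated as compatible with any name it starts.
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    # Otherwise fall back to string similarity (threshold is an assumption).
    return SequenceMatcher(None, a, b).ratio() >= 0.85


def likely_same_person(name_a, name_b):
    """Return True if two 'First Last' strings plausibly name one person."""
    ta, tb = normalize(name_a), normalize(name_b)
    if len(ta) < 2 or len(tb) < 2:
        return False
    if ta[-1] != tb[-1]:  # surnames must match exactly in this sketch
        return False
    return first_names_compatible(ta[0], tb[0])


records = ["Michael Fitzgerald", "Mike Fitzgerald", "M Fitzgerald"]
matches = [likely_same_person(records[0], other) for other in records[1:]]
```

Under these rules all three spellings from the example are flagged as one person. The sketch also shows why the problem is hard: "M Fitzgerald" would match "Mary Fitzgerald" just as readily, so a real tool would have to weigh other fields, such as email or address, before merging records.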

"The value there is exactly as they say," Oltsik says. He has no ties to Paxata and has not looked at its product.

Paxata's target user is someone like the company's vice president of marketing, an experienced user of Excel, but not a "super jock." She needs information from disparate sources, and needs to know things such as whether a sales lead is a duplicate and whether the information about it is correct. Providing that context to data sets is one of the things that costs analysts precious time.

The rule of thumb is that data preparation takes up 80% to 90% of the time people spend on data, leaving a small fraction of time for actual analysis. "People pour things into the data landfill. They don't even know it's there," Bardoliwalla says. "There's a huge discoverability problem that needs intelligent algorithmic techniques and visualization techniques to allow computers to do the heavy lifting."

Bardoliwalla wants to flip the ratio of time that analysts spend on data, so they can spend 80% of their time analyzing data sets. There is value in data, but getting to the value might be more expensive than the data is worth, like ore buried too deeply in a mine.

Paxata says it has about a dozen customers, including data storage firm Box; Dannon, the American unit of French yogurt maker Groupe Danone; and the big Swiss financial firm UBS. Paxata also is not alone in the market: just today I received an email for a pre-briefing on a similar product from another data company.

Perhaps some day soon companies will spend their time making hay from their data.


About the Author(s)

Michael Fitzgerald

Contributor

Michael Fitzgerald writes about the power of ideas and the people who bring them to bear on business, technology and culture.
