8/11/2014
10:19 AM

When Data Joins The Dark Side

A big data stockpile may contain dark data -- unstructured, unclassified information that you can't put to good use. Maybe it's time to find it.

Quick, how much of your big data is dark?

Sure, the word "dark" is open to interpretation, so let's clarify things a bit. Gartner's IT Glossary offers this definition of dark data: Information that an organization collects, processes, and stores in its day-to-day operations, but which it largely fails to use for other purposes, including analytics or business relationships.

"Similar to dark matter in physics, dark data often comprises most organizations' universe of information assets. Thus, organizations often retain dark data for compliance purposes only. Storing and securing data typically incurs more expense (and sometimes greater risk) than value," Gartner stated.

But even if you know what dark data is, managing it can be tricky, said Julie Colgan, director of information governance solutions for Nuix, an enterprise software company that helps organizations manage growing volumes of unidentified, unstructured data tucked away in archives, email and collaboration systems, hard drives, and other places.


Nuix's customers include government, law enforcement, and regulatory agencies. Organizations also use the company's software for e-discovery, to proactively govern their information, and to seek out potential legal threats and opportunities.

"Dark data is the data that an organization retains, often unknowingly, that lacks any substantive control or classification," Colgan told InformationWeek in a phone interview.

As a result, organizations often are unable to benefit from it.

"Data is dark when we don't know it exists, when we can't find it, when we can't interpret it, and when we can't share or interface with it," said Colgan.

(Source: NASA)

But how does data join the dark side?

"Sometimes data goes dark because we're simply too busy to deal with it, so we push it to the side and ignore it," Colgan said. "Maybe we don't have the right tools to address the scale or speed, or to shine a light on the data."

Alternatively, data can go dark when it's trapped in a repository -- a legacy archive, for instance -- that renders it difficult to access or analyze.

"We have a lot of customers interested in migrating off legacy archives," said Colgan. "They're doing so for a couple of reasons: One, a number of archives are at end of life, and (customers) want to go to a more modern platform; two, they want to migrate to the cloud."

As is often the case with big data implementations, companies may find themselves with information hoards that are needlessly large. Knowing which data to keep can prove challenging.

"They find they have more information than they need, and they want to ... make some good decisions about what to keep, how to keep it, and how to get rid of the stuff they don't need," said Colgan.

She offered this advice for companies dealing with dark data:

"Take a step back and think strategically about how information is an asset, and (how it) presents new and different kinds of risks to your organization," said Colgan. "Align that to what your risk tolerance is ... and then apply the right tools."

The goal should be to create an environment where data "isn't a constant tsunami that's drowning everyone," she added. "The old methods for managing information need to be examined and realigned."

Of course, this process includes making good decisions about "what data to keep, how to keep it, and how to get rid of the stuff you don't need," said Colgan.


Jeff Bertolucci is a technology journalist in Los Angeles who writes mostly for Kiplinger's Personal Finance, The Saturday Evening Post, and InformationWeek.

Comments
Laurianne
User Rank: Author
8/11/2014 | 11:21:22 AM
E-discovery
E-discovery (with an eye to legal protection) has been an issue for two decades. Are cloud storage services making it any easier to manage?
Alison_Diana
User Rank: Author
8/11/2014 | 11:52:14 AM
Ending the Silos
Many organizations are dark, as you describe it, because of the silos you mention. Now recognizing the value, costs, and legal protections consolidation creates, many organizations are slowly but surely pulling together their data repositories. It's challenging, but the payoffs -- as those who have accomplished the task can often attest -- are many and rich.

On the consumer side, I'm sure we can all recall instances where our data is housed multiple times within a business. Often, that results in multiple emails/calls/letters, sometimes using different information. Multiply that across millions of people and that saving alone adds up. On the legal front, not knowing what you have (and, therefore, being unable to correctly secure it at times) is a hazard for many industries.
David F. Carr
User Rank: Author
8/11/2014 | 5:15:20 PM
Information hoards that are "needlessly large"
Isn't the point of big data technology that it's possible to hoard data more greedily and tease useful information out of it? Maybe you want to root out duplication or reduce the amount of data that adds liability without any compliance-oriented justification for retaining it. But if there is some potential value left in the information, don't you want to be a hoarder these days?
MDMConsult14
User Rank: Moderator
8/12/2014 | 1:55:02 AM
Re: E-discovery
The challenges are obvious. Dark data does raise eDiscovery costs: if the organization is in litigation, reviewing the case can only increase them. It also consumes IT resources a great deal. This can be time consuming and stressful for IT personnel, given they may have to restore or identify files which are hard to locate.
mdelince
User Rank: Apprentice
8/12/2014 | 11:13:06 AM
what about common sense
If data is collected and not used, shouldn't the first reaction be to stop collecting it?

This seems to go against the Big Data "goal" of collecting everything and trying to find something (or possibly anything). But it may be a lot more costly to keep data for which there is little to no value, considering this dark data may get stolen (since it is dark data, would you even know the data was stolen?), resulting in potential legal fines and loss of trust (i.e. loss of customers, investors, partners).
pfretty
User Rank: Ninja
8/15/2014 | 11:04:21 AM
Silos and lack of strategy
As another commenter posted, silos are a serious contributing factor to organizations accumulating dark data. However, as a recent IDG SAS survey showed, the surprising lack of a data strategy plays a key role here as well. Without a serious understanding of what you want to get out of data, and an understanding of how to do it, data will fail to fully realize its potential.

Peter Fretty
eamonwalsh80
User Rank: Strategist
8/23/2014 | 2:25:08 AM
Re: Silos and lack of strategy
So true. Using data without a long-term vision for where it can be used, the metrics and analysis that can be derived from it, and a proper data strategy means just one thing: plenty of blind spots and chaos. The price of acquiring such data, which isn't even stale but dead/dark, is often paid by IT data strategy personnel (though the cost is billed to the company). You need a tool like HAVEN to make more sense of it (goo.gl/HFdxfV)