A big data stockpile may contain dark data -- unstructured, unclassified information that you can't put to good use. Maybe it's time to find it.
Quick, how much of your big data is dark?
Sure, the word "dark" is open to interpretation, so let's clarify things a bit. Gartner's IT Glossary defines dark data as "information assets that an organization collects, processes, and stores in its day-to-day operations, but which it largely fails to use for other purposes, including analytics or business relationships."
"Similar to dark matter in physics, dark data often comprises most organizations' universe of information assets. Thus, organizations often retain dark data for compliance purposes only. Storing and securing data typically incurs more expense (and sometimes greater risk) than value," Gartner stated.
But even if you know what dark data is, managing it can be tricky, said Julie Colgan, director of information governance solutions for Nuix, an enterprise software company that helps organizations manage growing volumes of unidentified, unstructured data tucked away in archives, email and collaboration systems, hard drives, and other places.
Nuix's customers include government, law enforcement, and regulatory agencies. Organizations also use the company's software for e-discovery, to proactively govern their information, and to seek out potential legal threats and opportunities.
"Dark data is the data that an organization retains, often unknowingly, that lacks any substantive control or classification," Colgan told InformationWeek in a phone interview.
As a result, organizations often cannot benefit from it.
"Data is dark when we don't know it exists, when we can't find it, when we can't interpret it, and when we can't share or interface with it," said Colgan.
But how does data join the dark side?
"Sometimes data goes dark because we're simply too busy to deal with it, so we push it to the side and ignore it," Colgan said. "Maybe we don't have the right tools to address the scale or speed, or to shine a light on the data."
Alternatively, data can go dark when it's trapped in a repository -- a legacy archive, for instance -- that makes it difficult to access or analyze.
"We have a lot of customers interested in migrating off legacy archives," said Colgan. "They're doing so for a couple of reasons: One, a number of archives are at end of life, and (customers) want to go to a more modern platform; two, they want to migrate to the cloud."
As is often the case with big data implementations, companies may find themselves with information hoards that are needlessly large. Knowing which data to keep can prove challenging.
"They find they have more information than they need, and they want to ... make some good decisions about what to keep, how to keep it, and how to get rid of the stuff they don't need," said Colgan.
She offered this advice for companies dealing with dark data:
"Take a step back and think strategically about how information is an asset, and (how it) presents new and different kinds of risks to your organization," said Colgan. "Align that to what your risk tolerance is ... and then apply the right tools."
The goal should be to create an environment where data "isn't a constant tsunami that's drowning everyone," she added. "The old methods for managing information need to be examined and realigned."
Of course, this process includes making good decisions about "what data to keep, how to keep it, and how to get rid of the stuff you don't need," said Colgan.
Jeff Bertolucci is a technology journalist in Los Angeles who writes mostly for Kiplinger's Personal Finance, The Saturday Evening Post, and InformationWeek.