As data collection becomes faster and easier, a growing number of enterprises are facing a new challenge: identifying and eliminating junk data.
Junk data is any kind of data that no longer serves a purpose. “This can include data that's outdated, incomplete, inaccurate, or duplicative,” explains Kathy Rudy, chief data and analytics officer for technology research and advisory firm ISG.
Perhaps the most common type of junk data is data sprawl -- files that are simply no longer useful. “This includes documents, spreadsheets, executable files, temporary files, images -- the data you would find on or in hard drives, data centers, and hand-held devices are the data assets created by corporations every day,” Rudy says.
Junk data accumulates when systems and people don't properly focus on treating data as an asset that aligns with and helps deliver business goals, says Tyler Warden, senior vice president of product and engineering at data and migration management services provider Syniti. “In other words, people have day jobs, and the data suffers,” he notes.
Outside of major events such as a data migration project or an enterprise merger, junk data tends to multiply gradually over time, Warden observes. “It's typically cheaper to tackle early, before it has a chance to accumulate,” he says.
Enterprises that fail to take the steps needed to curb the creation and retention of junk data risk finding themselves coping with overloaded, unmanageable, and unreliable data resources. “Every second we are connected to an electronic device, we are creating data,” Rudy notes. “Every email you write, presentation you prepare, entry you make into a corporate system, search on the Internet ... creates data, and not all of the data is useful or used.”
Without careful oversight and management, data can quickly overwhelm both storage resources and users. “Thank goodness we don't have data junkyards, or they would overrun the Earth!” Rudy quips.
Data can also be wrong from the outset, such as when mistakes are made during manual data entry. Data may also become junk if it takes too long to arrive at its destination, such as when a public health care authority that depends on up-to-the-minute statistics on disease incidence inadvertently accesses obsolete data. “If we can't make use of data or, even more seriously, if we shouldn't use it ... it's junk data,” says Andrea Malick, analyst and research director in the data and analytics practice at Info-Tech Research Group.
While it's widely acknowledged that junk data is a dead weight that should be eradicated as soon as it's discovered, actually jettisoning useless files can be both tricky and risky. “Even when we know there's junk data, there's real fear about removing something important,” Malick explains. Finding volunteers willing to send any type of enterprise data into oblivion can be difficult. “Everyone's watching the junk pile, everyone wants someone to do something, but they don't have a designated ‘someone’ or clear guidelines to act on it,” she says. As a result, the junk data remains and continues growing.
An Existential Threat
Despite the hurdles, junk data must be uncovered and eliminated, since it hinders business efficiency and productivity and can lead to false, potentially business-fatal assumptions and decisions. “It makes processes run longer, leads to poor decision making, costs the company real money, and can introduce unnecessary compliance risk to the business,” Warden says.
Keeping junk data around can also create serious compliance and legal headaches. Some privacy regulations, for instance, require organizations to limit the amount of data they collect, Malick says. Security is another serious concern. “Without understanding or controlling what's there, you may not be safeguarding potentially sensitive data against unauthorized access,” she warns. Another important reason for dumping useless files is to avoid excessive on-site and cloud storage costs.
Every enterprise needs a comprehensive, documented data retention policy covering all types and categories of stored data. Department managers also need periodic reminders to adhere to the policy's guidelines.
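Part of what makes a retention policy enforceable is writing it down in a form a program can check. The following is a minimal Python sketch under stated assumptions: the category names, retention periods, and the `Record` and `expired` names are all hypothetical and are not taken from any source quoted here. It flags records that have outlived their category's retention period and sets aside records whose category the policy doesn't cover.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention policy: data category -> maximum age in days.
# Categories and periods are illustrative assumptions, not recommendations.
RETENTION_DAYS = {
    "temp": 7,
    "logs": 90,
    "invoices": 7 * 365,
}

@dataclass
class Record:
    name: str
    category: str
    last_modified: date

def expired(records, today, policy=RETENTION_DAYS):
    """Split records into (past retention, not covered by the policy).

    Nothing is deleted here; flagged records are candidates for review.
    """
    flagged, unclassified = [], []
    for r in records:
        days = policy.get(r.category)
        if days is None:
            # The policy says nothing about this category, so route it
            # to a person instead of guessing.
            unclassified.append(r)
        elif today - r.last_modified > timedelta(days=days):
            flagged.append(r)
    return flagged, unclassified

records = [
    Record("scratch.tmp", "temp", date(2024, 1, 1)),
    Record("app.log", "logs", date(2024, 5, 1)),
    Record("q1.xlsx", "misc", date(2020, 1, 1)),
]
flagged, unclassified = expired(records, today=date(2024, 6, 1))
print([r.name for r in flagged])       # ['scratch.tmp']
print([r.name for r in unclassified])  # ['q1.xlsx']
```

The sketch returns candidates rather than deleting them, and it surfaces unclassified data explicitly. Those choices speak to Malick's point that fear of removing something important, and the lack of a designated owner, is what usually leaves junk data in place.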
An excellent way to get rid of obvious junk data in a hurry is to use an automated cleanup tool that regularly scans for and deletes temporary and redundant files without harming critical system files or relevant enterprise data. “This method is convenient because the automated tool looks for and roots out all the junk files without the user expending any significant effort,” says Eric McGee, senior network engineer at TRGDatacenters. “Moreover, such tools undertake regular scans, sending you alerts of junk files that need to be deleted.”
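McGee doesn't name a particular product, so the following is only a hypothetical Python sketch of the scanning step such a tool might perform. It assumes that "temporary" means a file name ending in `.tmp`, `.temp`, or `~` (an illustrative convention, not a standard), and it treats byte-identical files as "redundant." Files are first grouped by size, and only same-size files are hashed with SHA-256. The `scan` and `file_digest` names are invented for this example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumed markers of temporary files; adjust to local conventions.
TEMP_SUFFIXES = (".tmp", ".temp", "~")

def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def scan(root):
    """Report temporary files and groups of byte-identical files under root.

    Nothing is deleted: the report is meant to be reviewed (or sent as an
    alert) before any cleanup happens.
    """
    temp_files = []
    by_size = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_symlink() or not p.is_file():
            continue  # skip directories and links to avoid double counting
        if p.name.lower().endswith(TEMP_SUFFIXES):
            temp_files.append(p)
        by_size[p.stat().st_size].append(p)

    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have an exact copy
        by_hash = defaultdict(list)
        for p in paths:
            by_hash[file_digest(p)].append(p)
        duplicates.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(temp_files), duplicates
```

A scheduled job could call `scan("/some/share")` and turn the two lists into an alert, matching the report-then-delete behavior McGee describes. Deleting automatically would be a separate, deliberate step, ideally gated by the retention policy discussed above.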