Commentary
12/10/2013
10:01 AM
Joel Westphal

The War On Military Records

The era of big data has arrived on the battlefield and we need to find new ways to deal with it.

This same paradigm shift has also caused the vast expansion of records and material to preserve. Just 10 years ago, we were dealing with gigabytes of data. Now we are trying to manage terabytes and petabytes. In essence, the era of big data had arrived on the battlefield and we needed to find new ways to deal with it.

The volume of records that US Central Command successfully preserved from the second Gulf War amounted to more than 54 terabytes. That's more than 40 million files and documents! However, only a fraction of these records, between 10 and 15 terabytes, is actually deemed permanent by the National Archives.
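As a rough check on these figures, the short calculation below works out what share of the collection was deemed permanent and the average size per file that the totals imply. It is a sketch that assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and treats the article's "more than" figures as exact, so the results are approximations.

```python
# Rough arithmetic on the figures quoted above. Assumption: decimal units
# (1 TB = 10**12 bytes, 1 MB = 10**6 bytes); the inputs are approximate.
TOTAL_TB = 54              # records preserved by US Central Command
TOTAL_FILES = 40_000_000   # files and documents
PERMANENT_TB = (10, 15)    # range deemed permanent by the National Archives


def permanent_share(permanent_tb: float, total_tb: float = TOTAL_TB) -> float:
    """Fraction of the whole collection that is deemed permanent."""
    return permanent_tb / total_tb


def avg_file_size_mb(total_tb: float = TOTAL_TB, files: int = TOTAL_FILES) -> float:
    """Average file size in megabytes implied by the totals."""
    return total_tb * 10**12 / files / 10**6


if __name__ == "__main__":
    low, high = (permanent_share(tb) for tb in PERMANENT_TB)
    print(f"Permanent share: {low:.1%} to {high:.1%}")
    print(f"Average file size: {avg_file_size_mb():.2f} MB")
```

Run as a script, this prints a permanent share of about 18.5% to 27.8% and an average of roughly 1.35 MB per file: in other words, somewhere between one-fifth and one-quarter of the collection had to be found and kept.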

That raises another question: How does one separate the wheat from the chaff in a collection of records that, if printed on paper, would stack 270,000 feet high? And how does one do that with a staff of only three?

The answer can be found in technology available from firms such as Active Navigation Inc. I first saw the potential of Active Navigation's File Analysis capabilities at the ARMA 2010 trade show and was impressed by the way File Analysis could extract value from content we had no prior knowledge of, a form of blind discovery.

The tool allowed us to quickly grasp an overall view of our information estate, identify redundant, obsolete and trivial content (also known as ROT), and remove duplicate records. Reducing the total estate is crucial, and it is the way forward, because keeping everything is simply no longer a viable option given government IT budgets and the rising costs of long-term storage and back-up infrastructure.
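To illustrate the duplicate-removal step in general terms, here is a minimal sketch of one common technique: grouping files by a cryptographic hash of their contents, so that byte-for-byte copies land in the same group. This is not Active Navigation's method, which the article does not describe; the function names are hypothetical, and a real file-analysis tool would also catch near-duplicates and ROT, which exact hashing cannot.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's contents in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group every file under `root` by content hash; keep groups of 2+ files."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():  # skip directories
            groups[sha256_of(path)].append(path)
    return {h: sorted(paths) for h, paths in groups.items() if len(paths) > 1}


if __name__ == "__main__":
    # Tiny demo: two identical files and one unique file in a scratch directory.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "memo.txt").write_text("same text")
        (root / "copy_of_memo.txt").write_text("same text")
        (root / "report.txt").write_text("different text")
        for digest, paths in find_duplicates(root).items():
            print(digest[:12], [p.name for p in paths])
```

On tens of millions of files, a cheaper first pass (for example, grouping by file size and hashing only files whose sizes collide) would avoid reading every byte; the sketch omits that for clarity.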

For years, we in the records business have heard such tools called game changers. To be honest, I was not a believer. I have since changed my mind. I'm now convinced that, if appropriately combined with other records tools, these products can help government agencies, businesses, universities, and private companies save millions of dollars in unneeded storage, back-up infrastructure and eDiscovery costs.

These tools are valuable in other ways, too. Active Navigation, for instance, can identify privacy data (such as Social Security numbers) and augment the content with relevant metadata extracted from the document itself using natural language processing, making document identification more accurate and complete before the records are handed over to the National Archives and Records Administration.
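As a simplified illustration of privacy-data detection, the sketch below flags strings shaped like US Social Security numbers with a regular expression. It is an assumption-laden stand-in, not how Active Navigation or any particular product works: the pattern only handles the hyphenated AAA-GG-SSSS form, and `SSN_PATTERN`, `find_ssns` and `redact_ssns` are hypothetical names. It rejects area numbers 000, 666 and 900 to 999, group 00 and serial 0000, which the Social Security Administration does not issue, but a match still does not prove a number is real.

```python
import re

# Hypothetical SSN matcher for the hyphenated AAA-GG-SSSS form only.
# Negative lookaheads reject values the SSA never issues:
# area 000, 666 or 9xx; group 00; serial 0000.
SSN_PATTERN = re.compile(r"\b(?!000|666|9\d{2})\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")


def find_ssns(text: str) -> list[str]:
    """Return candidate SSNs found in `text`, in order of appearance."""
    return SSN_PATTERN.findall(text)


def redact_ssns(text: str) -> str:
    """Replace every candidate SSN with a fixed mask."""
    return SSN_PATTERN.sub("XXX-XX-XXXX", text)


if __name__ == "__main__":
    sample = "IDs: 123-45-6789, 000-12-3456, 987-65-4321, 123-45-67890"
    print(find_ssns(sample))
    print(redact_ssns(sample))
```

Only `123-45-6789` is flagged in the sample: the second and third values use area numbers the SSA does not issue, and the last has one digit too many. Unhyphenated or space-separated numbers would need additional patterns and, in practice, surrounding context to keep false positives down.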

The modern "War on Records" has just begun. Big data has presented a monumental challenge for today's Records and Information Managers. Meeting it often requires innovative systems. When I was presented with the challenges of the Operation Iraqi Freedom (OIF) collection, I did not look for twentieth-century solutions to a twenty-first-century problem. I truly believe that if technology is the problem, better technology is often the solution.

Today's records and information management technology has improved to such a point that even organizations with minimal staffing can deal with the persistent big data problems of both the private and public sectors. Our nation's historical record depends on it!

Joel Westphal is the Agency Records Officer for the Office of Personnel Management in Washington, D.C. He also serves on the National Archives Federal Records Council. He previously served as Chief of the Records Management Section at US Central Command.


Comments
Joel5171
User Rank: Apprentice
12/10/2013 | 3:14:51 PM
Re: Terabyte Limit Doesn't Compute
I cannot speak for rules and regulations from the National Archives. There are permanent records and temporary records. NARA usually does not want temporary records.

However, I think that for the next conflict (and let's hope that day never comes), there should be some discussion about keeping it all. But understand that this opens up some serious issues for FOIA and eDiscovery.

Also, the 54 terabytes do not include the records of the Army, Navy and Air Force, which collect their own service records. Nor do they include other classified records.
Laurianne
User Rank: Author
12/10/2013 | 2:55:11 PM
Re: Terabyte Limit Doesn't Compute
Doug makes a good point. Also, how did plentiful cloud storage options factor into the decision process?
D. Henschen
User Rank: Author
12/10/2013 | 2:30:05 PM
Terabyte Limit Doesn't Compute
In the context of big data, 54 terabytes is no big deal. I don't get why the National Archives has cut off what it will save at 14 terabytes. Why not save it? You never know when some future revelation or future form of analysis will render that information relevant and useful. Geez, you can find 2 TB and 3 TB external hard drives for as little as $100. Why do we have to make any hard choices here? Save everything! History won't forgive you for repeating the same mistakes you described after the first Gulf War.
Joel5171
User Rank: Apprentice
12/10/2013 | 12:15:20 PM
Re: Preserving Public Records
This is a great question! Because the article had to be a certain length, a lot of what I wanted to say could not be included. This was one issue I could not fit in.

A File Analysis tool is really just one answer to a question that requires multiple answers. For example, you still need an Electronic Records Management Application (ERMA) to ensure records are kept for as long as their disposition requires.

These tools help you identify, quickly, what information you have and where it is, so you can then apply retention to it. They help the archivist or records manager quickly clean up the cluttered shared drives and SharePoint repositories that are out there.

I still hope to publish a more expansive article in either the American Archivist or Archivaria which really details how the entire operation to recover, preserve and then identify the records took place.
WKash
User Rank: Author
12/10/2013 | 10:37:04 AM
Preserving Public Records
Thanks for sharing your experiences, and the dilemma facing government records archivists. Given the immense volume of records generated daily, it is impossible to comprehend how individuals could sort through and manage all these records anymore. My question is: are these content classification tools potentially breeding a false sense of security that they are identifying all the records that, by statute, must be preserved?

 