The forensic data retrieval method reconstructs 99% of images stored on hard drives or media cards, and can be extended to text.

Mathew J. Schwartz, Contributor

May 13, 2010

Pity the digital forensic investigator: the quantity of data stored on PCs continues to increase at a mind-boggling pace, with the gigabytes of files, images and videos growing every month. As a result, investigations -- into everything from intellectual property theft and fraud to child pornography and espionage -- can become lengthy undertakings.

New "SmartCarving" techniques, however, are helping speed up the process and retrieve more data than previous methods. For example, SmartCarving can reconstruct approximately 99% of digital images stored on a hard drive or media card. That's an improvement over the next-best technique, called file-carving, which retrieves about 85% to 90% of non-overwritten images.

The forensic issue has long been fragmentation. According to a study of 350 hard drives conducted by security researcher Simson Garfinkel, 6% of data on the average hard drive is fragmented, meaning it's intact but stored non-contiguously. Furthermore, for files with forensic importance, actual fragmentation rates were much higher: 58% for PST (Outlook) files, 17% for Word documents, and 16% for JPEGs.

Regular file-carving techniques, which look for a file header and footer and then grab everything in between, can only retrieve non-fragmented data. So for security researchers -- and especially forensic investigators -- the question has been, how do you recover the 16% of JPEGs you're missing?
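To make the limitation concrete, here is a minimal Python sketch of header-and-footer carving for JPEGs. It is an illustration only, not the code of any particular forensic tool: the function name `carve_jpegs` and the toy "disk" are hypothetical, and the only real-world details assumed are the standard JPEG start-of-image (`FF D8`) and end-of-image (`FF D9`) markers.

```python
JPEG_HEADER = b"\xff\xd8\xff"  # start-of-image marker plus first byte of the next marker
JPEG_FOOTER = b"\xff\xd9"      # end-of-image marker


def carve_jpegs(raw: bytes) -> list[bytes]:
    """Grab every byte run from a JPEG header to the next footer."""
    carved = []
    pos = 0
    while True:
        start = raw.find(JPEG_HEADER, pos)
        if start == -1:
            break  # no more headers
        end = raw.find(JPEG_FOOTER, start + len(JPEG_HEADER))
        if end == -1:
            break  # header without a footer: nothing to carve
        end += len(JPEG_FOOTER)
        carved.append(raw[start:end])
        pos = end
    return carved


# Toy disk: image A is contiguous, image B is split around another file's data.
image_a = JPEG_HEADER + b"AAAA" + JPEG_FOOTER
b_first, b_second = JPEG_HEADER + b"BB", b"BB" + JPEG_FOOTER
foreign = b"\x00other-file\x00"
disk = b"\x00" * 4 + image_a + b_first + foreign + b_second

results = carve_jpegs(disk)
print(results[0] == image_a)             # contiguous image comes back intact
print(results[1] == b_first + b_second)  # fragmented image does not
```

In the toy run, the second carved result still contains the foreign bytes that sat between the two fragments, which is why a fragmented file comes back corrupted under this approach.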

"The problem was first posed to me by someone in the Department of Defense, at a dinner," said Nasir Memon, a professor of computer science at New York University. "He said, 'We know all the [image] pieces are there, but it's very difficult to find them.' Sometimes if they knew it was there, they'd try to put it back together manually, and it would take days."

Two of Memon's students solved the challenge by treating it as what's known as the shortest path problem. For non-mathematicians, think of it as what your car's GPS does when you tell it you want to drive from New York to San Francisco; it builds the best route. For images, Memon likens the reconstruction challenge to having tens of thousands of mixed-up jigsaw puzzles. By studying each image's gradients -- or smoothness -- as well as any available structural cues such as file headers, you can rebuild the image.
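The idea of finding the cheapest route through a set of fragments can be sketched in a few lines of Python. This is a toy under stated assumptions, not the SmartCarving algorithm itself: fragments are modeled as small blocks of grayscale rows, the "distance" between two fragments is simply how abruptly pixel values change across their boundary, and the search is brute force, which only works for a handful of fragments. The names `boundary_cost` and `reassemble` are hypothetical.

```python
from itertools import permutations

Fragment = list[list[int]]  # rows of grayscale pixel values


def boundary_cost(upper: Fragment, lower: Fragment) -> int:
    """Smoothness penalty: pixel differences across the seam between two fragments."""
    return sum(abs(a - b) for a, b in zip(upper[-1], lower[0]))


def reassemble(header: Fragment, others: list[Fragment]) -> list[Fragment]:
    """Try every ordering after the known header and keep the smoothest chain."""
    best_chain, best_cost = None, None
    for order in permutations(others):
        chain = [header, *order]
        cost = sum(boundary_cost(a, b) for a, b in zip(chain, chain[1:]))
        if best_cost is None or cost < best_cost:
            best_chain, best_cost = chain, cost
    return best_chain


# Toy image: a smooth top-to-bottom gradient, 8 rows by 4 pixels.
rows = [[r * 10] * 4 for r in range(8)]
fragments = [rows[i:i + 2] for i in range(0, 8, 2)]

# The header fragment is identifiable; the rest are found out of order.
header, rest = fragments[0], fragments[1:]
scrambled = [rest[2], rest[0], rest[1]]

rebuilt = reassemble(header, scrambled)
restored = [row for frag in rebuilt for row in frag]
print(restored == rows)
```

Because the number of orderings grows factorially, a real tool would need a far more efficient search and would have to handle fragments from many files at once; the sketch only shows why smooth boundaries make the correct order the cheapest one.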

Now, said Memon, the technique can also be extended to text files, including Word documents and Excel spreadsheets, with about a 90% success rate. As that suggests, reconstructing text is more challenging than images, because while images have two-dimensional boundaries, text only has one. Predictive modeling, however, can help reduce false positives.

Together with the two students who codified SmartCarving, Memon co-founded a company called Digital Assembly to commercialize the resulting software. In December 2009, it released Adroit Photo Forensics, which uses SmartCarving to retrieve images.

In the future, if there's demand, the company could release a product that applies the same techniques to text. "We have research prototypes that work, it's just a matter of taking a couple of months to build a solid product," said Memon.

Calling digital forensic investigators: Do you need better text file, Word document and Excel spreadsheet retrieval tools?

About the Author(s)

Mathew J. Schwartz

Contributor

Mathew Schwartz served as the InformationWeek information security reporter from 2010 until mid-2014.
