A Strategy to Protect Unstructured Data

You've got data everywhere. We've got a plan to help you find and control it.

Adam Ely, COO, Bluebox

September 17, 2010


IT organizations are well aware that sensitive information resides in corporate databases, but unstructured data--e-mail, Office documents, and other content types--can be just as valuable and needs protection too. The challenge for IT is that unstructured data is growing at a breakneck pace--a compound annual growth rate of 61%, according to IDC, almost three times the growth rate of structured data. It's also scattered throughout the enterprise: in folders on file servers, on laptops, and tucked inside USB drives. You need a strategy for securing it.

Start by understanding the types of content in your company and the value each type has to the business. If your company handles credit cards, the Payment Card Industry Data Security Standard (PCI DSS) is probably the first thing you think of. Your nightmare is credit card numbers sitting on a file server for anyone to find. If you're in the medical field, HIPAA and patient records are a top concern. Other important data types include customer and employee personal information, intellectual property, and operational data.

These groupings are broad but give you enough to build on. The main idea is to understand the types of data and how you will respond once each type is discovered. Once you compile a basic list, work with representatives from IT, legal, compliance, HR, finance, and business development. They will identify data you've forgotten or didn't know about.

Next, map your data types to a classification and handling policy that outlines how each group of data should be managed. The most common mistake we see when IT groups write these policies is specifying exactly how data must be protected. That approach is inefficient and causes more work for you later. Instead, list a range of acceptable measures rather than a single mandate. For example, if your company prefers that data in transit be encrypted with TLS 1.3 but will also accept TLS 1.2, put both options in your policy. This makes the policy much more flexible for those implementing the protection. That's critical, because if they can't work with you, they'll work around you.
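For transport encryption, a "range of acceptable measures" maps naturally onto a minimum and maximum protocol version. The Python sketch below assumes a hypothetical policy that prefers TLS 1.3 and accepts TLS 1.2; SSL 2.0 and 3.0 are obsolete and belong on no acceptable list. The function name is illustrative; `ssl.create_default_context`, `minimum_version`, and `maximum_version` are standard-library APIs.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Client TLS context that accepts a range of protocol versions, not just one."""
    # Certificate and hostname verification are enabled by the default context.
    ctx = ssl.create_default_context()
    # Hypothetical policy: TLS 1.3 preferred, TLS 1.2 acceptable, anything older refused.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.maximum_version = ssl.TLSVersion.TLSv1_3
    return ctx
```

During the handshake, the client and server settle on the highest version both support within that range, so TLS 1.3 is used wherever it is available without breaking peers that only speak TLS 1.2.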

One last note on data classification policies: They often fail because all documents are tagged as confidential, devaluing the policy. Your classification system should differentiate between valuable information that carries a high level of risk and other information that may be sensitive but carries less risk if exposed or lost.
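A classification and handling policy can also be kept in a machine-readable form, so that discovery tools and reviewers apply the same rules. The sketch below is illustrative only: the level names, data-type keys, and control names are hypothetical. It shows two ideas from this section. Each rule lists a set of acceptable controls, any one of which satisfies it, and a quick check reports what share of data types sits at or above a given level. If that share is close to 1.0, the levels aren't differentiating anything.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical classification levels, ordered by risk if exposed or lost."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class HandlingRule:
    level: Level
    # Any one of these controls satisfies the rule: a range, not a mandate.
    acceptable_controls: frozenset[str]


ENCRYPTED = frozenset({"file_encryption", "full_disk_encryption"})

# Illustrative mapping of data types to handling rules.
POLICY: dict[str, HandlingRule] = {
    "payment_card_data": HandlingRule(Level.RESTRICTED, ENCRYPTED),
    "patient_records": HandlingRule(Level.RESTRICTED, ENCRYPTED),
    "customer_personal_info": HandlingRule(Level.CONFIDENTIAL, ENCRYPTED | {"access_controls"}),
    "employee_personal_info": HandlingRule(Level.CONFIDENTIAL, ENCRYPTED | {"access_controls"}),
    "intellectual_property": HandlingRule(
        Level.CONFIDENTIAL, frozenset({"access_controls", "password_protection", "file_encryption"})
    ),
    "operational_data": HandlingRule(Level.INTERNAL, frozenset({"access_controls"})),
    "holiday_schedule": HandlingRule(Level.PUBLIC, frozenset()),
}


def is_compliant(data_type: str, applied_controls: set[str]) -> bool:
    """True if at least one acceptable control is in place, or none is required."""
    rule = POLICY[data_type]
    return not rule.acceptable_controls or bool(rule.acceptable_controls & applied_controls)


def share_at_or_above(level: Level) -> float:
    """Fraction of data types classified at or above `level`."""
    return sum(rule.level >= level for rule in POLICY.values()) / len(POLICY)
```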

Searching For Unstructured Data

The next step is finding the data. This can be tricky. You know where it should be stored, but because information is so portable, it has a habit of turning up in unexpected places.

Using your list of data types as a reference, begin searching file shares, laptops, connected storage devices--anywhere you can. You should also involve users. Ask them where they store data, and have them review documents they own to identify sensitive data that needs to be protected or organized. This step can ease some of the burden on the IT department. The only sticking point is getting people to actually do it. This process must be reinforced through user awareness of what constitutes sensitive and risky data, what to do with it, and whom to ask when in doubt.
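At its simplest, the search means walking a share or local disk and matching file contents against a pattern for each data type on your list. The sketch below is minimal and makes several assumptions: the scanning account can read the tree, only plain-text files are examined, and the `PATTERNS`, `TEXT_SUFFIXES`, and `scan_tree` names are hypothetical. The regular expressions are deliberately crude and will produce false positives. Commercial and open source DLP tools use much richer detection than this.

```python
import os
import re
from pathlib import Path

# Illustrative patterns only: a 13-16 digit run (optionally spaced or dashed)
# and a US SSN-shaped string.
PATTERNS = {
    "possible_card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "possible_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Assumed scope: plain-text formats only. Office files and PDFs need a text extractor.
TEXT_SUFFIXES = {".txt", ".csv", ".log"}


def scan_tree(root: str) -> dict[str, dict[str, int]]:
    """Map each matching file path to a count of hits per pattern."""
    findings: dict[str, dict[str, int]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath, name)
            if path.suffix.lower() not in TEXT_SUFFIXES:
                continue
            try:
                text = path.read_text(errors="ignore")
            except OSError:
                continue  # unreadable file: skip it rather than abort the whole scan
            hits = {label: len(p.findall(text)) for label, p in PATTERNS.items()}
            hits = {label: count for label, count in hits.items() if count}
            if hits:
                findings[str(path)] = hits
    return findings
```

The per-file counts give data owners a concrete list to review, which is where the user involvement described above comes in.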

If your company has the budget, investigate data loss prevention (DLP) products, which search for sensitive data and can help prevent the data from leaving the enterprise. If you're financially constrained, there's a relatively new open source offering, appropriately named OpenDLP.

By the way, a data classification and discovery initiative is a great time to consolidate storage locations, archive or purge old documents, and generally tidy up. The fewer documents and storage locations, the easier it will be to apply and maintain controls. You may also save money on storage if you uncover--and delete--caches of duplicate data. It's also an appropriate time to revisit the company's retention policy to determine if it's too stringent.
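One common way to find caches of duplicate data is to group files by size and then hash only the files whose sizes collide. The sketch below uses SHA-256 from the standard library. The function names are hypothetical. It treats matching digests as identical contents, which is a safe assumption for SHA-256 in practice.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: str) -> list[list[str]]:
    """Return groups of paths under root whose contents have the same SHA-256."""
    # Pass 1: group by size. Files of different sizes can't be identical,
    # so only size collisions need the more expensive hashing step.
    by_size: dict[int, list[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files that share a size with at least one other file.
    by_hash: dict[str, list[str]] = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:
            for path in paths:
                by_hash[file_digest(path)].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Review each group before deleting anything. Two identical copies may be owned by different people or fall under different retention rules.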

Apply Appropriate Controls

Rather than search piles of unstructured data for sensitive content, you might be tempted to simply apply strong security controls to all enterprise data. One common, albeit draconian, method is to slap strict access controls on all data stores and ban the use of USB drives and other portable media.

Good for security? Sort of. Good for business? No. Overly broad controls complicate the lives of the people who need to access and share data--that is, pretty much every employee. They also complicate your own life because you'll end up applying (and managing) controls around a good deal of unimportant information, such as an employee's MP3 files and last year's corporate holiday schedule. Instead, take a measured approach. Start with highly valuable or sensitive data and revisit the rest after you've dealt with your critical information.

You have a variety of security controls at your disposal, such as access controls, passwords, and encryption. For instance, if you find sensitive data on a file server, apply access controls at the root of the directory that holds it, so that everything beneath it is covered. Archives or spreadsheets stored in areas that can't be secured, such as a user's desktop or a network drive where they've been staged for a presentation, should be password-protected.
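On a Linux or other POSIX file server, locking down the root of a directory tree can be sketched with permission bits. This is a minimal sketch under stated assumptions. The owner/group split (owner full access, group read-only, others nothing) is a hypothetical choice, and the `restrict_tree` name is illustrative. Windows shares use inherited NTFS ACLs instead, which this code does not touch.

```python
import os
import stat

DIR_MODE = stat.S_IRWXU | stat.S_IRGRP | stat.S_IXGRP   # 0o750: owner full, group read/traverse
FILE_MODE = stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP  # 0o640: owner read/write, group read


def restrict_tree(root: str) -> None:
    """Remove all 'other' access from a directory tree using POSIX permission bits."""
    # Lock the root first: without execute permission on it, other users
    # cannot reach anything underneath, whatever the children's modes are.
    os.chmod(root, DIR_MODE)
    # Then tighten the children too, so a later mistake on the root alone
    # doesn't re-expose files that were created world-readable.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # chmod follows symlinks; don't change targets outside the tree
            os.chmod(path, DIR_MODE if os.path.isdir(path) else FILE_MODE)
```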

When possible, encrypt highly sensitive data. Products such as PGP and its open source alternative GPG, both built on the OpenPGP standard, provide a common approach to file-level encryption. WinZip, an inexpensive product, supports AES-256 encryption. Consider volume or full-disk encryption for laptops and other mobile devices, especially if users store many highly sensitive documents on their systems.
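Password-based encryption tools generally don't use a password directly as an AES-256 key. Instead, they stretch it into a 256-bit key with a key-derivation function. The sketch below shows only that derivation step, using PBKDF2-HMAC-SHA256 from Python's standard library. The iteration count is an assumption, so check current guidance before relying on it, and the `derive_key` name is hypothetical. The encryption itself should come from a vetted tool such as GPG or a well-reviewed library, not hand-rolled code.

```python
import hashlib
import os

# Assumed work factor; check current guidance (e.g. NIST SP 800-132) before relying on it.
DEFAULT_ITERATIONS = 600_000


def derive_key(passphrase: str, salt: bytes, iterations: int = DEFAULT_ITERATIONS) -> bytes:
    """Stretch a passphrase into a 32-byte (256-bit) key using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32)


# A fresh random salt per file; it is stored next to the ciphertext and need not be secret.
salt = os.urandom(16)
key = derive_key("correct horse battery staple", salt)
```

The salt ensures that two files protected with the same password still get different keys. The iteration count makes each password guess proportionally more expensive for an attacker.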

However, guard against encryption overkill. Most employees aren't walking around with thousands of customer credit card numbers on their laptops, so encrypting entire drives just because you can isn't worth the investment.

DLP is also a control option. In addition to searching for sensitive data, DLP products monitor network traffic for improper or unauthorized transmissions. DLP systems can also be implemented in passive mode to understand how data moves in your company, so that you can create your own rules or modify the canned policies that come with the product.
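The difference between passive and active DLP can be sketched as one rule set evaluated in two modes. In monitor mode, matches are only logged, so you can learn how data moves before enforcing anything. In block mode, the same matches stop the transmission. This is a hypothetical model for illustration: the class names, rule format, and logging are assumptions, not any product's API.

```python
import re
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    MONITOR = "monitor"  # passive: record matches, let the transmission through
    BLOCK = "block"      # active: stop transmissions that match a rule


@dataclass
class EgressRule:
    name: str
    pattern: re.Pattern


@dataclass
class EgressFilter:
    rules: list[EgressRule]
    mode: Mode = Mode.MONITOR
    log: list[tuple[str, str]] = field(default_factory=list)

    def allow(self, destination: str, payload: str) -> bool:
        """Record every rule that matches; refuse only when in BLOCK mode."""
        matched = [rule.name for rule in self.rules if rule.pattern.search(payload)]
        for name in matched:
            self.log.append((destination, name))
        return not matched or self.mode is Mode.MONITOR
```

Running in monitor mode first produces a log of real traffic. You can use that log to tune rules before switching `mode` to `Mode.BLOCK`.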

Note that DLP isn't a panacea. DLP products handle credit card and Social Security numbers out of the box, but more granular tuning of these systems--to reduce the number of false positives, for instance--can take time. We know of a recently installed DLP system that sent an alert each time a user logged into Facebook, because the session ID was similar to a credit card number. DLP products can also be expensive.
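Much of that tuning comes down to rejecting digit strings that merely look like card numbers. One standard filter is the Luhn checksum, which valid payment card numbers satisfy. The sketch below applies it after a loose pattern match. The function names are hypothetical. Luhn is not sufficient by itself, because about one random digit string in ten passes it. Real products typically also check issuer prefixes and nearby keywords.

```python
import re

# 13-19 digits, optionally separated by single spaces or dashes, ending on a digit.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def card_candidates(text: str) -> list[str]:
    """Return digit strings that look like card numbers and pass the Luhn check."""
    hits = []
    for m in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

For example, `card_candidates("card 4111 1111 1111 1111, sid=1234567812345678")` keeps the well-known test number 4111111111111111 and discards the session-ID-like string, which fails the checksum.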

Data Protection: Rinse And Repeat

When implementing controls, you're bound to run into problems. Unstructured content is very different from data stored in databases. It doesn't have a single home you can protect and audit. It travels outside the company. It's copied and modified. It grows rapidly. The answer is to ensure that processes and applications can scale. For instance, scan data stores for the highest-value data first, and then rescan for lower-value data.
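Scanning for the highest-value data first can be as simple as running one pass per value tier, in order. High-value findings then arrive early, and a time-limited scan can stop after the first tier. The sketch below assumes documents have already been read into memory as text. The tier names and patterns are hypothetical stand-ins for your own classification list.

```python
import re
from typing import Callable

# Hypothetical tiers, ordered from highest to lowest value.
TIERS = [
    ("tier1_payment_cards", re.compile(r"\b(?:\d[ -]?){12,18}\d\b")),
    ("tier2_ssn", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("tier3_internal_marker", re.compile(r"(?i)\binternal use only\b")),
]


def tiered_scan(documents: dict[str, str], report: Callable[[str, str], None]) -> None:
    """Run one full pass per tier, highest value first, reporting (tier, document)."""
    for label, pattern in TIERS:
        for doc_id, text in documents.items():
            if pattern.search(text):
                report(label, doc_id)
```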

Remember to regularly review data types, storage locations, and the risks associated with known data. As business processes and goals evolve, some data types become more valuable, some less valuable. Storage locations will also change over time, and your processes must account for those changes.

Protecting unstructured data is hard. To succeed, place controls close to the data and work outward, but be mindful of the impact of those controls on data owners and users. Communicate to end users what is and isn't acceptable; education is vital when implementing controls that move or alter data or stop actions, such as copying or e-mailing files.

Finally, make sure that data owners understand that no control is 100% effective, and efforts to secure unstructured data are just one facet of a larger layered security approach, which requires their help and support.

Adam Ely is director of security for TiVo and an InformationWeek Analytics contributor.

About the Author


Adam Ely is the founder and COO of Bluebox. Prior to this role, Adam was the CISO of the Heroku business unit at Salesforce where he was responsible for application security, security operations, compliance, and external security relations. Prior to Salesforce, Adam led security and compliance at TiVo and held various security leadership roles within The Walt Disney Company where he was responsible for security operations and application security of Walt Disney web properties including ABC.com, ESPN.com, and Disney.com.
