Big data systems are invading enterprise data centers at a rapid rate, but they often lack the controlled access, data encryption, and other protections inherent in relational systems, according to a SANS Institute survey of 206 companies. Of the respondents, 43% were from organizations with 10,000 or more employees and 53% held a title related to IT security.
Big data systems increasingly serve as the repository for personal-identification information and corporate intellectual property. For example, the SANS survey found 73% of respondents with big data applications "use them to store personal data on customers and 72% store important business data," such as employee records (64%), intellectual property (59%), and payment card information (53%).
The result is an exposure that companies may not have counted on as they initiated their pilot big data projects, according to the survey report, "Enabling Big Data By Removing Security and Compliance Barriers." Cloudera, the supplier of Hadoop system Cloudera Enterprise, sponsored the SANS survey.
Many times, those projects demonstrate the utility of bringing together diverse data that was previously hard to assemble given the radically different data types. Big data systems gain utility as more data is brought in. The result is a slow brew of gathering risk without sufficient safeguards, the study warns.
The SANS Institute is a private company that provides training and certification in cyber-security skills. Its name springs from its initial target group of IT professionals: system administrators and audit, networking, and security managers. The results of the survey were reported by SANS analyst Barbara Filkins, with John Pescatore, SANS director of emerging technologies, acting as an adviser.
Cloudera claims a marketplace lead with its built-in security measures, according to Alex Gutow, a Cloudera product marketing manager. For example, Cloudera is PCI compliant in handling credit card information. Other Hadoop systems have yet to achieve the rating, she said in an interview at the Hadoop Summit in San Jose, Calif., Wednesday.
MasterCard, a Cloudera partner and customer, has been using a PCI-certified enterprise data hub since 2014, said Sam Heywood, director of Cloudera's Security Center of Excellence in Austin, Texas.
But other Hadoop-based systems are bent on catching up. Hortonworks, in an announcement before the summit, said it has added encryption for data at rest and data in transit to release 2.3 of the Hortonworks Data Platform. Most big data system suppliers are likely to look at the SANS survey and redouble their efforts to protect data in their systems.
Among the survey respondents, 27.4% said they were running a big data production system; 10.4% were running a pilot system; 17.4% were engaged in a proof of concept; 28.4% had plans for a system but had not implemented it, usually due to resource issues; and 4.5% had no plans for a big data system. The remaining respondents did not know whether such a system was in the works.
At the same time, 83% of the SANS survey respondents who had a running system said their systems "must comply with one or more regulatory standards." In 40% of these cases, compliance must be established by external audit.
The stakes are high for all of the system suppliers. The market for big data products, including offerings such as Hortonworks, Riak, Couchbase, MongoDB, and Cloudera, is expected to grow from $16.55 billion in 2014 to $41.52 billion by 2018, according to market researcher IDC.
Security was one of the topics addressed by a panel of big data users Thursday, the last day of the Hadoop Summit. Anil Varma, VP of data and analytics for Schlumberger, said imposing user access controls, based on identity and roles, is one way to improve big data security. For role restrictions to work, companies will have to practice good data governance. Data must be tagged and segmented as it's gathered, with personal-identification information subject to much tighter role restrictions than anonymous click-stream data.
"The next two to three years will be really important on that (data governance)," he said. Due to worries over security, "a lot of this data still hasn't been brought in," he noted.
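The tag-then-restrict idea Varma describes can be illustrated with a minimal Python sketch. Everything in it is a hypothetical assumption made for illustration: the sensitivity levels, the role names, the list of personal-data field names, and the `tag_record` and `can_read` helpers do not come from Schlumberger, the SANS report, or any Hadoop distribution. It only shows the general pattern of tagging data at ingestion and checking a role's clearance against that tag.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical sensitivity tags, ordered from least to most restricted."""
    ANONYMOUS = 1   # e.g. click-stream data with no identifiers
    BUSINESS = 2    # e.g. internal business records
    PERSONAL = 3    # e.g. personal-identification information


# Hypothetical mapping of each role to the highest tag it may read.
ROLE_CLEARANCE = {
    "analyst": Sensitivity.ANONYMOUS,
    "finance": Sensitivity.BUSINESS,
    "privacy_officer": Sensitivity.PERSONAL,
}

# Hypothetical field names treated as personal data for tagging purposes.
PII_FIELDS = {"name", "email", "ssn", "card_number"}


@dataclass(frozen=True)
class Record:
    source: str
    tag: Sensitivity
    payload: dict


def tag_record(source: str, payload: dict) -> Record:
    """Tag a record as it is gathered, based on its source and fields."""
    if PII_FIELDS.intersection(payload):
        tag = Sensitivity.PERSONAL
    elif source == "clickstream":
        tag = Sensitivity.ANONYMOUS
    else:
        tag = Sensitivity.BUSINESS
    return Record(source, tag, payload)


def can_read(role: str, record: Record) -> bool:
    """Allow access only if the role's clearance covers the record's tag."""
    clearance = ROLE_CLEARANCE.get(role)
    # Unknown roles get no access at all.
    return clearance is not None and clearance >= record.tag


clicks = tag_record("clickstream", {"page": "/home"})
customer = tag_record("crm", {"email": "a@example.com"})
```

In this sketch an `analyst` could read the click-stream record but not the customer record, while a `privacy_officer` could read both. A production system would enforce such rules in the platform's own authorization layer rather than in application code.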
David Lin, Symantec cloud platform engineer, said his firm needed to protect its data before it could extend services that help customers protect theirs. He urged companies to build up their big data lakes, initially in a restricted fashion, and then figure out how to grant more access to them.
"There's a lot of uncertainty around security. Kill the fear. Haters to the left. Get started and go. Smart people will figure it out," he said.
Sam Gentsch, manager of IT at Home Depot, said his firm is imposing a user-access-control framework with fine-grained controls on its Hadoop big data system.