NSA's Big Data Platform Faces Enterprise Test
Startup Sqrrl preps Accumulo data storage software to go commercial, teams with Hadoop provider Hortonworks to combine technologies.
It's the latest in a series of developments for NSA's big-data platform over the past 13 months. The NSA submitted Accumulo to the Apache Foundation in September 2011 for development as open source. In July 2012, a number of NSA employees joined with former White House cybersecurity strategy director Ely Kahn to form Sqrrl. Two months ago, Sqrrl announced it had raised $2 million in venture funding.
Based on Google's BigTable data model, Accumulo is a distributed key/value store for structured and unstructured data. When NSA began developing Accumulo in 2008, many of the big data management platforms--HBase, MongoDB, Cassandra, and others--either had not yet been released or were new, unproven, and unlikely to meet the agency's rigorous security requirements. So NSA decided to develop the data management software it needed internally.
"By bringing data sets together, it's allowed us to see things in the data that we didn't necessarily see from looking at the data from one point or another," said Dave Hurry, head of NSA's computer science research section, in an interview with InformationWeek. Accumulo gives NSA the ability "to take data and to stretch it in new ways so that you can find out how to associate it with another piece of data and find those threats, those nuggets you were looking for," he said.
NSA sought security, scalability, and speed from its database. Accumulo's cell-level security makes it possible to set access control for individual pieces of data using "visibility tags." Without that capability, valuable information might remain out of reach for the analysts who need it, or time might have to be spent creating a sanitized data set. Sqrrl CEO Oren Falkowitz said he met with a large financial institution that lets only three people access a cluster of sensitive data. With Accumulo, that kind of exclusive arrangement shouldn't be necessary.
That could prove to be a significant advantage for Accumulo. "There's nothing else out there that remotely pretends to be an alternative for a secure BigTable database," said Benson Margulies, who both worked on Accumulo as CTO for Basis Technology, an NSA contractor, and helped get the Apache Accumulo project going.
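To make the idea of cell-level security concrete, here is a minimal Python sketch of how per-cell visibility tags might filter what different users see. It is a toy in-memory model, not Accumulo's API: the `Cell` class, the `is_visible` and `scan` helpers, the tag syntax (`&` for "all of", `|` for "any of", no parentheses), and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One stored value plus the tag expression that guards it (hypothetical model)."""
    row: str
    column: str
    visibility: str  # e.g. "fraud&investigator" or "auditor|admin"; "" means unrestricted
    value: str


def is_visible(expression, authorizations):
    """Simplified check: '|' separates alternatives, '&' joins required tags.

    Parentheses and nesting are deliberately not supported in this sketch.
    """
    if not expression:
        return True
    return any(
        all(tag in authorizations for tag in alternative.split("&"))
        for alternative in expression.split("|")
    )


def scan(cells, authorizations):
    """Return only the cells the caller's authorizations allow them to read."""
    return [cell for cell in cells if is_visible(cell.visibility, authorizations)]


# Hypothetical sample data: one account row with differently tagged cells.
cells = [
    Cell("acct-001", "owner", "", "J. Doe"),
    Cell("acct-001", "balance", "finance", "1200.00"),
    Cell("acct-001", "flag", "fraud&investigator", "under review"),
    Cell("acct-001", "audit_note", "auditor|admin", "checked 2012-09"),
]

analyst_view = scan(cells, {"finance"})
investigator_view = scan(cells, {"finance", "fraud", "investigator"})

print([cell.column for cell in analyst_view])
print([cell.column for cell in investigator_view])
```

In this model both users query the same table, and each sees only the cells their tags permit, so there is no need to build a separate sanitized copy or to wall off the whole cluster.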
Through a feature called iterators, Accumulo supports in-database processing, which means it's able to aggregate and summarize data even as new data is added. "The way the system is architected, you're able to do a lot of 'compute on the fly,'" said Antonio Rodriguez, general partner with Matrix Partners, one of the VC firms that recently invested in Sqrrl. Atlas Venture is the other.
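The "compute on the fly" idea can be illustrated with a small Python sketch in which a summary is produced as part of reading the table, so it always reflects the latest writes. This is a toy stand-in rather than Accumulo's iterator interface: the `SummingTable` class, its `write` and `scan` methods, and the sample counters are hypothetical names chosen for the example.

```python
from collections import defaultdict


class SummingTable:
    """Toy table that stores every written entry and combines them when scanned."""

    def __init__(self):
        # (row, column) -> list of raw values written under that key
        self._entries = defaultdict(list)

    def write(self, row, column, value):
        """Append a new raw value; nothing is pre-aggregated at write time."""
        self._entries[(row, column)].append(value)

    def scan(self):
        """Yield one summed value per key, in sorted key order.

        The summing happens inside the scan, which stands in for the
        in-database processing step described in the article.
        """
        for key in sorted(self._entries):
            yield key, sum(self._entries[key])


table = SummingTable()
table.write("host-a", "failed_logins", 3)
table.write("host-b", "failed_logins", 1)
table.write("host-a", "failed_logins", 2)
first = dict(table.scan())

# New data arrives; the next scan reflects it without a separate batch job.
table.write("host-a", "failed_logins", 5)
second = dict(table.scan())

print(first)
print(second)
```

The point of the design is that the client never pulls raw entries out to total them elsewhere; the aggregate is computed where the data lives and stays current as entries are added.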
NoSQL databases like Accumulo let users add new data types even if they're not part of the original data model, and data attributes can be defined in a more granular way and with greater flexibility than is possible using conventional relational databases. "It's an architecture that allows people to solve the problem they want to solve as it presents itself, and not have the architecture put bounds on what they can do," Hurry said. "One of the powerful side effects is that we no longer have to spend a lot of time trying to figure out how to normalize the data."
Accumulo's other big selling point is its scalability. NSA doesn't talk about the size of its databases, but the agency's aggregated data is measured in the petabytes. NSA's largest cluster, Rodriguez said, is "much larger than anyone has ever run in HBase." Sqrrl CEO Falkowitz said the software is "ready for quadrillions of records and thousands of nodes."
Accumulo is central to NSA's big data strategy, and other U.S. defense and intelligence agencies, including the CIA, have begun to experiment with it. In March, the Apache Foundation designated Accumulo as one of its top-level projects. The Accumulo development community continues to grow, and has been holding hackathons and meetups.
Sqrrl's commercial version of Accumulo is called Acorn. In addition to the database itself, the startup provides consulting on enterprise deployment and training for developers. It's also extending Accumulo, both through its own development work and through partnerships with other vendors. In September, Sqrrl joined with Hortonworks to bring commercial support to the combination of Accumulo and Hortonworks' Apache Hadoop distribution.
Sqrrl's target markets include financial services, health care, energy, Internet companies, and government. Falkowitz said potential customers include financial institutions and an oil company.
Accumulo is a relatively new platform, but its jumpstart in federal intelligence should work in its favor. "It's a very mature technology in terms of building a secure data store for one of the most secure customers in the world: NSA," said Chris Lynch, a partner with Atlas Venture.
At the NSA, Accumulo has moved beyond the pilot stage to become a core element of its big data strategy. The agency is using Accumulo to create a "data cloud" that makes it easier to manage, analyze, and share information. Given Accumulo's roots, the development community behind the platform is largely composed of federal agencies, including the Department of Defense. Increased diversity and commercial support will be important to the open source project's success.
Lawmakers are keeping an eye on that. The Senate's 2013 Defense authorization bill requires that Accumulo mature into "a successful open source database with adequate industry support and diversification" before being adopted within DoD (with the exception of NSA). The bill would also require that the NSA help ensure that HBase and Cassandra developers get the technical assistance needed to facilitate adoption of Accumulo's security features.