The operating system on which any database runs should be hardened and locked down. Most NoSQL technologies run on Linux, so there are a variety of hardening options to choose from. When hardening an OS, focus on four areas: users, permissions, services, and logging. Mechanisms such as Bastille Linux or SELinux can help automate Linux hardening, but we recommend you follow a more structured approach, such as the guidelines from the Center for Internet Security or the Defense Information Systems Agency's Security Technical Implementation Guide for Linux. These guidelines have been reviewed and tested by thousands of people and are unlikely to cause problems such as incompatibilities.
With Hadoop and MongoDB, properly configuring file system permissions is vital. The Hadoop Distributed File System can be securely configured to give only appropriate permissions to users running various jobs. For example, we recommend placing MapReduce users and HDFS users in two separate groups, so that you have separation of access. HDFS users need to run NameNode, DataNode, and Secondary NameNode, but MapReduce users need to run only the JobTracker and TaskTracker applications. Creating Hadoop groups allows you to set up permissions, a critical part of any system-hardening process. Without the proper permissions, a user could potentially copy the entire Hadoop or MongoDB instance, load it on a new server, and bypass all of your authentication controls; this is also an argument in favor of encryption, as we discuss in our full report.
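As a rough illustration of the permissions point, here is a minimal Python sketch of an audit that walks a database data directory and flags anything accessible to users outside the owning user and group, or owned by an unexpected account. The function name find_loose_entries and the expected_uid parameter are our own assumptions for illustration; nothing here comes from Hadoop or MongoDB tooling, and a real audit would follow whatever hardening guideline you have adopted.

```python
import os
import stat


def find_loose_entries(root, expected_uid=None):
    """Return (path, reason) pairs for entries under root that look too open.

    root         -- data directory to audit (hypothetical; supply your own)
    expected_uid -- numeric uid of the dedicated database account, if known
    """
    # Audit the top-level directory itself, then everything beneath it.
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.append(os.path.join(dirpath, name))

    findings = []
    for path in paths:
        info = os.lstat(path)  # lstat: inspect the entry, do not follow symlinks
        if stat.S_ISLNK(info.st_mode):
            continue  # symlink permission bits are not meaningful on Linux
        if info.st_mode & stat.S_IRWXO:
            # Any read, write, or execute bit set for "other" users.
            findings.append((path, "accessible to users outside owner and group"))
        if expected_uid is not None and info.st_uid != expected_uid:
            findings.append((path, "unexpected owner uid %d" % info.st_uid))
    return findings
```

You might point this at whatever directory holds your data files and treat any finding as a prompt to review ownership and mode bits. It reports only; it deliberately changes nothing.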
Finally, don't run these databases as root. We have seen too many instances of this. Create a separate user, and lock down that user so the database has access to only those directories and executables it needs.
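One way to enforce this is a pre-flight check in whatever wrapper launches the database. The following Python sketch is an assumption about how such a guard might look, not part of any database's own startup scripts; the names running_as_root and preflight are hypothetical.

```python
import os
import sys


def running_as_root(euid=None):
    """Return True if the effective uid is 0 (root).

    euid can be passed explicitly for testing; by default the
    current process's effective uid is used.
    """
    if euid is None:
        euid = os.geteuid()
    return euid == 0


def preflight(euid=None):
    """Return a shell-style exit status: 1 if running as root, else 0."""
    if running_as_root(euid):
        sys.stderr.write(
            "Refusing to start the database as root; "
            "use a dedicated, locked-down account.\n"
        )
        return 1
    return 0
```

A launcher could call preflight() and exit with its return value before starting the database process. This only catches the root case; restricting the dedicated account to the directories and executables it needs is still done with ordinary ownership and mode settings.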
Right now, open source NoSQL technologies just aren't ready for the enterprise when it comes to security. Can you make them ready? Sure, but it comes down to resources--do you have people with the right skills? If so, and if you're willing to work closely with developers and analyze your organization's risk, you can implement NoSQL technologies securely. Otherwise, there are commercial NoSQL databases such as Vertica and eXist-db that have security controls built in. Just because some well-known Web 2.0 company uses an open source database doesn't mean you should. Its risks, data, and expertise are likely very different from yours.
We're not trying to paint the future of big data and NoSQL as a security wasteland. There's precedent for a free-for-all market getting serious under pressure. We saw this happen in the public cloud, as enterprises forced providers to start caring about security controls and privacy. But the fact is that NoSQL technology is built by developers, for developers. Unless companies make data protection a priority--and vote with their budget dollars--we don't foresee the NoSQL community suddenly getting security religion.