Plan carefully and involve business stakeholders early to avoid burning time and money by locking up big data in expensive, impenetrable data vaults.
There are other risks. Because most of today's big data tools are based on open-source software, provided by nonprofit organizations that often are not much more than teams of volunteers, versions of important tools often fall behind. Conversely, updates can arrive almost too frequently. "Fixing as you go" is part of the open-source ethos. Buggy open-source software is commonplace. There are companies -- for instance, Cloudera and DataStax -- attempting to fill this gap, but their support services can be expensive.
Another issue is the difficulty of finding and hiring big data administrators, developers, and architects with hands-on experience. And even when qualified candidates can be located, hiring and retaining them can be a challenge. Google, Amazon, and other top technology companies offer compensation packages that other organizations find difficult to match.
Here's The Plan
Organizations eager to adopt big data tools should begin by asking questions. A good one to start with is: "What are my business requirements?" That means identifying all information stakeholders. The way to do that is to ask, "What are our current -- and future -- information needs?" All of the people your business must turn to in order to get an answer to that question are your stakeholders.
After that, an organization will be ready to consider its options. It could:
-- Stay with a relational database management system (RDBMS) unless there are good operational reasons not to.
-- Design a big data platform database with data retrieval in mind for the future, especially if retrieval needs are minor.
-- Move to a big data platform and build an overlay, or bridge, that will continually extract the required data to a populated system that will support the organization's analytics and its downstream consumers of the data.
Big data tools might make sense in situations where current relational database management software does not perform well; where data volumes are very large; where the data needed is unstructured; or where some consistency, availability or partition tolerance can be sacrificed to gain performance and handle large volumes.
But organizations should not invest in big data platforms when the current RDBMS is performing well, or when the company uses only relatively small volumes of data and that data is highly structured.
Finally, big data is not at present a viable solution when the organization's data environment demands high levels of consistency; for example, in finance, the military, and healthcare.
The Hybrid Solution
Hybrid solutions can strike a reasonable compromise. These solutions involve adopting a big data platform while keeping core RDBMS systems running. In this scenario, organizations can implement a bridge component that continually extracts data from the big data platform and feeds it to the RDBMS. The big data platform typically stores the detailed data while the traditional RDBMS systems aggregate and transform it for consumption.
For example, one major communications provider is pulling data from Cassandra, loading it into Hadoop, and then sending it to an Oracle data warehouse. The company's business intelligence tools operate with the relational data and systems. This hybrid gives the company two options: Hadoop for very large, non-aggregated data sets and data-analysis flexibility, and Oracle for targeted and previously identified needs. It also limits the risk of depending on open-source software, and reduces the organization's dependency on those hard-to-find, hard-to-keep and expensive workers with sophisticated big data skills.
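The bridge pattern described in this section can be sketched in a few lines. What follows is only a minimal illustration, not any vendor's implementation: an in-memory list stands in for the detailed records held on the big data platform, Python's built-in sqlite3 module stands in for the RDBMS, and the table name (usage_summary), field names and function names are all hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical detail records, standing in for rows read from the big data platform.
DETAIL_EVENTS = [
    {"event_id": 1, "customer": "acme", "bytes_used": 120},
    {"event_id": 2, "customer": "acme", "bytes_used": 80},
    {"event_id": 3, "customer": "globex", "bytes_used": 50},
]

def extract_since(events, last_id):
    """Return only the detail records newer than the last one processed."""
    return [e for e in events if e["event_id"] > last_id]

def aggregate(events):
    """Roll the detail records up to one total per customer."""
    totals = defaultdict(int)
    for e in events:
        totals[e["customer"]] += e["bytes_used"]
    return dict(totals)

def load(conn, totals):
    """Add the new totals to the relational summary table."""
    for customer, total in totals.items():
        # Make sure a row exists, then increment it.
        conn.execute(
            "INSERT OR IGNORE INTO usage_summary (customer, total_bytes) VALUES (?, 0)",
            (customer,),
        )
        conn.execute(
            "UPDATE usage_summary SET total_bytes = total_bytes + ? WHERE customer = ?",
            (total, customer),
        )
    conn.commit()

def run_bridge_once(conn, events, last_id):
    """One bridge pass: extract new detail, aggregate it, load it. Returns the new high-water mark."""
    batch = extract_since(events, last_id)
    if not batch:
        return last_id
    load(conn, aggregate(batch))
    return max(e["event_id"] for e in batch)

# sqlite3 is an assumption here; it stands in for the organization's RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usage_summary (customer TEXT PRIMARY KEY, total_bytes INTEGER NOT NULL)"
)
last_id = run_bridge_once(conn, DETAIL_EVENTS, 0)
rows = conn.execute(
    "SELECT customer, total_bytes FROM usage_summary ORDER BY customer"
).fetchall()
print(rows)
```

In a real deployment the bridge would run on a schedule and store its high-water mark durably, so that each pass picks up only the detail added since the last one. The business intelligence tools would then query the summary table rather than the detailed store.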
Mere novelty is never a good reason to implement any business technology, and jumping on the bandwagon without proper preparation can be disastrous. Although switching from an RDBMS to a big data platform might sound like a technical decision best made by technologists, it's actually a decision best made by the entire company.
The costs and risks of doing otherwise are simply too big to ignore.
Todd Homa is a Data Architect at CapTech Consulting with over 17 years of experience helping clients design and implement complex data solutions.
Harlan Bennett is a Senior Consultant at CapTech Consulting with over 10 years of experience in business systems analysis, enterprise architecture, and strategy.