Commentary
7/9/2013
12:48 PM
Todd Homa

When Big Data Equals Big Money Waste

Plan carefully and involve business stakeholders early to avoid burning time and money by locking up big data in expensive, impenetrable data vaults.

No one would ever accuse Netflix of being a tech laggard. Netflix prides itself on seeing the future before it arrives and getting a jump on the competition. So Netflix recently moved to Apache Cassandra, a NoSQL database, and Hadoop, a classic big data play.

But because Cassandra could not easily be married to Netflix's existing analytics and reporting platforms, Netflix discovered it needed to develop an offline process to extract all that big data. Otherwise, its shiny new database would become a data vault.

That process did not scale, however. "We soon discovered that while [the offline process] may be feasible for one or two clusters, maintaining the number of moving parts required to deploy this solution to all of our production clusters was going to quickly become unmaintainable," Charles Smith, Netflix's senior software engineer for big data, and Jeff Magnusson, the company's manager of data science platform architecture, wrote on the Netflix Tech Blog.

To solve the problem, Netflix engineers created an application to reduce the number of moving parts and increase the speed with which the data could be analyzed. It was not a trivial undertaking. It took time; it cost money. Now Netflix can scale in the cloud as the size of its data warehouse grows.

This is just one example of how a big data project can deliver unpleasant surprises downstream, and how difficult and expensive these challenges can be for organizations to overcome.

Look Before You Leap

Many companies are feeling competitive pressure to cope with the fast-growing variety, volume and velocity of their data. They're making substantial investments to put that flood of data to use. But unless these investments are carefully planned, and the organizational impact of the computing changes is considered, the business results likely will be disappointing.

As Netflix and other organizations have already discovered, moving to NoSQL platforms can result in vast amounts of information that ends up locked in data vaults, formatted in ways that cannot easily be queried or analyzed.

Fortunately, this problem can be avoided through close communication among organizational functions and by setting clear, widely shared business and technical requirements. Most important, every big data initiative should begin by reaching out to all downstream information users.

Big Data, Big Risks

When a major communications provider switched from an older Oracle RDBMS to Apache Cassandra, it neglected to speak with downstream stakeholders who would be using the information collected. As a result, after the system was built and implemented, the company discovered that key information could not be queried. Again, a company had to build a highly customized solution, which required additional time and funding.

Todd Homa is a Data Architect at CapTech Consulting with over 17 years of experience helping clients design and implement complex data solutions.

Harlan Bennett is a Senior Consultant at CapTech Consulting with over 10 years of experience in business systems analysis, enterprise architecture, and strategy.
