Big Data Debate: End Near For ETL?
Threaded  |  Newest First  |  Oldest First
User Rank: Apprentice
1/22/2014 | 7:26:59 AM

Hadoop is not a Data Integration Solution," I will describe the gaps between Hadoop and a proper Data Integration. To be sure, there are many, many gaps in Hadoop when compared to a traditional data integration solution. But, what is it about the Hadoop infrastructure that is attracting such interest despite these significant gaps? There is a reason Sears has made the decisions it has. There is a reason why many more organizations are aggressively pushing forward to integrate data in Hadoop despite Hadoop's functional gaps.


In the era of Big Data, Hadoop's architecture is fundamentally superior for supporting many of the most commonly deployed data integration functions. First and foremost, it can deliver the scale and compute capabilities required to support the information the business demands at a cost that is sustainable. For this reason, organizations are flocking to Hadoop even if key functional capabilities must be written by hand today. Hadoop makes it easy to scale computing power horizontally with low cost components. This architectural benefit is absolutely core to successfully performing the large-scale ETL required for processing Big Data. Hadoop's ability to persist data äč lots of it in any format – is a new architectural component long missing from traditional data integration platforms. More importantly, this architecture looks like it will also support a broader range of data integration functions.


The compute and analysis capabilities of the Hadoop architecture support the requirements of data profiling and data quality. In many ways, data profiling and quality are Big Data problems, particularly with today's growing data sets. This is being tested in our ETL Solution at NSFAS, why profile a sample when I have the entire dataset? The ability to support metadata seems obvious and while HCatalog is immature, it is evolving. Witness the introduction of Navigator 1.0 in Cloudera's 4.2 release, which provides basic data governance capabilities. Not only does the core architecture support advanced data integration functionality, but it also offers a superior framework to do so, enabling vendors to deliver these features at a rapid pace.


The main problem Big Data creates is an architectural one, not a functional one. Perhaps it is fair to say that today, Hadoop is not a Data Integration solution

User Rank: Apprentice
11/4/2014 | 6:52:37 PM
Excellent point,icokruger
User Rank: Apprentice
7/11/2014 | 12:10:52 PM
My easier choice ...
Further to this thread and between/before Hadoop and traditional DW is IRI CoSort (, which doesn't rely on more machines or big ETL package costs. It combines transforms in the file system on huge file and relational sources, and addresses semi- and un-structured data. Using an Eclipse front-end, it also addresses the issues Informatica's CTO correctly identifies. ELT isn't the way to go either, since big transforms tax the DB (thus query response), or requires a costly appliance (which like Hadoop, throws hardware at a software problem). The benefits of a paradigm shift to new IT fabric are not always worth the risk.

Register for InformationWeek Newsletters
White Papers
Current Issue
Top IT Trends to Watch in Financial Services
IT pros at banks, investment houses, insurance companies, and other financial services organizations are focused on a range of issues, from peer-to-peer lending to cybersecurity to performance, agility, and compliance. It all matters.
Twitter Feed
InformationWeek Radio
Archived InformationWeek Radio
Join us for a roundup of the top stories on for the week of October 9, 2016. We'll be talking with the editors and correspondents who brought you the top stories of the week to get the "story behind the story."
Sponsored Live Streaming Video
Everything You've Been Told About Mobility Is Wrong
Attend this video symposium with Sean Wisdom, Global Director of Mobility Solutions, and learn about how you can harness powerful new products to mobilize your business potential.
Flash Poll