Big Data Debate: End Near For ETL? - InformationWeek
User Rank: Apprentice
11/4/2014 | 6:52:37 PM
Excellent point, icokruger
User Rank: Apprentice
7/11/2014 | 12:10:52 PM
My easier choice ...
Further to this thread: between (or before) Hadoop and a traditional DW sits IRI CoSort, which doesn't rely on more machines or big ETL package costs. It combines transforms in the file system on huge file and relational sources, and addresses semi-structured and unstructured data. Using an Eclipse front end, it also addresses the issues Informatica's CTO correctly identifies. ELT isn't the way to go either, since big transforms tax the DB (and thus query response) or require a costly appliance (which, like Hadoop, throws hardware at a software problem). The benefits of a paradigm shift to a new IT fabric are not always worth the risk.
User Rank: Apprentice
1/22/2014 | 7:26:59 AM

"Hadoop is not a Data Integration Solution," I will describe the gaps between Hadoop and a proper data integration solution. To be sure, there are many, many gaps in Hadoop when compared to a traditional data integration solution. But what is it about the Hadoop infrastructure that is attracting such interest despite these significant gaps? There is a reason Sears has made the decisions it has. There is a reason why many more organizations are aggressively pushing forward to integrate data in Hadoop despite Hadoop's functional gaps.


In the era of Big Data, Hadoop's architecture is fundamentally superior for supporting many of the most commonly deployed data integration functions. First and foremost, it can deliver the scale and compute capabilities required to support the information the business demands at a cost that is sustainable. For this reason, organizations are flocking to Hadoop even if key functional capabilities must be written by hand today. Hadoop makes it easy to scale computing power horizontally with low-cost components. This architectural benefit is absolutely core to successfully performing the large-scale ETL required for processing Big Data. Hadoop's ability to persist data (lots of it, in any format) is a new architectural component long missing from traditional data integration platforms. More importantly, this architecture looks like it will also support a broader range of data integration functions.


The compute and analysis capabilities of the Hadoop architecture support the requirements of data profiling and data quality. In many ways, data profiling and quality are Big Data problems, particularly with today's growing data sets. We are testing this in our ETL solution at NSFAS: why profile a sample when I have the entire dataset? The ability to support metadata seems obvious, and while HCatalog is immature, it is evolving. Witness the introduction of Navigator 1.0 in Cloudera's 4.2 release, which provides basic data governance capabilities. Not only does the core architecture support advanced data integration functionality, but it also offers a superior framework to do so, enabling vendors to deliver these features at a rapid pace.
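
To make the "profile the entire dataset" point concrete, here is a rough Python sketch of why profiling fits a scale-out architecture: each partition is summarized independently (the map step), and the small summaries are then merged (the reduce step). This is plain Python, not Hadoop code; the names ColumnProfile, profile_partition, merge and profile_dataset are made up for illustration, and the in-memory lists stand in for file blocks.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, Optional


@dataclass
class ColumnProfile:
    """Mergeable summary of one numeric column (hypothetical structure)."""
    count: int = 0
    nulls: int = 0
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def profile_partition(values: Iterable[Optional[float]]) -> ColumnProfile:
    """Map step: profile one partition without looking at any other."""
    p = ColumnProfile()
    for v in values:
        p.count += 1
        if v is None:
            p.nulls += 1
            continue
        p.minimum = v if p.minimum is None else min(p.minimum, v)
        p.maximum = v if p.maximum is None else max(p.maximum, v)
    return p


def _pick(fn, x: Optional[float], y: Optional[float]) -> Optional[float]:
    """Apply fn (min or max) to two values, ignoring a missing side."""
    if x is None:
        return y
    if y is None:
        return x
    return fn(x, y)


def merge(a: ColumnProfile, b: ColumnProfile) -> ColumnProfile:
    """Reduce step: combine two partial profiles into one."""
    return ColumnProfile(
        count=a.count + b.count,
        nulls=a.nulls + b.nulls,
        minimum=_pick(min, a.minimum, b.minimum),
        maximum=_pick(max, a.maximum, b.maximum),
    )


def profile_dataset(partitions: Iterable[Iterable[Optional[float]]]) -> ColumnProfile:
    """Profile every partition, then fold the partial results together."""
    partials = (profile_partition(part) for part in partitions)
    return reduce(merge, partials, ColumnProfile())


if __name__ == "__main__":
    # Three stand-in partitions, one of them empty.
    data = [[1.0, None, 5.0], [], [-2.0, 3.0]]
    print(profile_dataset(data))
```

Because merge only needs the small per-partition summaries, adding more machines means adding more partitions profiled in parallel, and no sampling is required. Statistics that do not merge this simply (exact medians, distinct counts) would need a different approach.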


The main problem Big Data creates is an architectural one, not a functional one. Perhaps it is fair to say that today, Hadoop is not a Data Integration solution
