Hadoop is not a Data Integration Solution," I will describe the gaps between Hadoop and a proper data integration solution. To be sure, Hadoop has many, many gaps when compared with a traditional data integration solution. But what is it about the Hadoop infrastructure that attracts such interest despite these significant gaps? There is a reason Sears has made the decisions it has. There is a reason why many more organizations are aggressively pushing forward to integrate data in Hadoop despite its functional gaps.
In the era of Big Data, Hadoop's architecture is fundamentally superior for supporting many of the most commonly deployed data integration functions. First and foremost, it can deliver the scale and compute capabilities required to support the information the business demands, at a cost that is sustainable. For this reason, organizations are flocking to Hadoop even though key functional capabilities must still be written by hand today. Hadoop makes it easy to scale computing power horizontally with low-cost components. This architectural benefit is absolutely core to successfully performing the large-scale ETL required for processing Big Data. Hadoop's ability to persist data (lots of it, in any format) is a new architectural component long missing from traditional data integration platforms. More importantly, this architecture looks like it will also support a broader range of data integration functions.
The compute and analysis capabilities of the Hadoop architecture support the requirements of data profiling and data quality. In many ways, data profiling and quality are Big Data problems, particularly with today's growing data sets. We are testing this in our ETL solution at NSFAS: why profile a sample when I have the entire dataset? Support for metadata seems an obvious fit, and while HCatalog is immature, it is evolving. Witness the introduction of Navigator 1.0 in Cloudera's 4.2 release, which provides basic data governance capabilities. Not only does the core architecture support advanced data integration functionality, it also offers a superior framework for building it, enabling vendors to deliver these features at a rapid pace.
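To make the "why profile a sample when I have the entire dataset?" point concrete, here is a minimal sketch in plain Python of how a column profile can be computed over a full dataset in a map/reduce style. It is an illustration only, not the NSFAS solution or any Hadoop API: the names `ColumnProfile`, `profile_partition`, `merge_profiles`, and `profile_dataset` are hypothetical, and the partitions are ordinary in-memory lists standing in for input splits. The idea is that each partition is profiled independently and the partial results are merged, so the work can be spread across as many nodes as the data requires.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, Optional


@dataclass
class ColumnProfile:
    """Partial statistics for one numeric column (hypothetical structure)."""
    rows: int = 0
    nulls: int = 0
    total: float = 0.0
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    @property
    def mean(self) -> Optional[float]:
        non_null = self.rows - self.nulls
        return self.total / non_null if non_null else None


def profile_partition(values: Iterable[Optional[float]]) -> ColumnProfile:
    """Map step: profile a single partition without looking at any other."""
    profile = ColumnProfile()
    for value in values:
        profile.rows += 1
        if value is None:
            profile.nulls += 1
            continue
        profile.total += value
        profile.minimum = value if profile.minimum is None else min(profile.minimum, value)
        profile.maximum = value if profile.maximum is None else max(profile.maximum, value)
    return profile


def merge_profiles(a: ColumnProfile, b: ColumnProfile) -> ColumnProfile:
    """Reduce step: combine two partial profiles into one."""
    minimums = [m for m in (a.minimum, b.minimum) if m is not None]
    maximums = [m for m in (a.maximum, b.maximum) if m is not None]
    return ColumnProfile(
        rows=a.rows + b.rows,
        nulls=a.nulls + b.nulls,
        total=a.total + b.total,
        minimum=min(minimums) if minimums else None,
        maximum=max(maximums) if maximums else None,
    )


def profile_dataset(partitions: Iterable[Iterable[Optional[float]]]) -> ColumnProfile:
    """Profile every partition, then fold the partial results together."""
    return reduce(merge_profiles, map(profile_partition, partitions), ColumnProfile())
```

Because `merge_profiles` only adds counts and compares extremes, the order in which partial profiles are combined does not change the counts, minimum, or maximum (floating-point sums may differ only by rounding). That property is what lets this kind of profiling scale horizontally instead of falling back to a sample.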
The main problem Big Data creates is an architectural one, not a functional one. Perhaps it is fair to say that today, Hadoop is not a Data Integration solution