Yet realizing this potential has been challenging for many mainstream enterprises. Many started experimenting with some of the 13 functional modules that make up Apache Hadoop, a set of technologies that early adopters such as eBay, Facebook and Yahoo needed large teams and several years to master.
The first wave of Hadoop technology, the 1.x generation, was neither easy to deploy nor easy to manage. The many moving parts that make up a Hadoop cluster were difficult for new users to configure. Seemingly minor details (patch versioning, for instance) mattered a lot. As a result, services failed more often than expected, and many problems only showed up under severe load. Skills were, and still are, in short supply, although there is no shortage of good training available from leading vendors such as Hortonworks and Cloudera.
Fortunately, the second generation of Hadoop, which Hortonworks calls HDP 2.0 and which was announced at Hadoop Summit 2013, fills in many of the gaps. Manageability is a key expectation, particularly for the more business-critical use cases that service providers encounter. Hadoop has made great advances here with Ambari, an intuitive Web-based management tool that makes it much easier to provision, manage and monitor Hadoop clusters. Ambari supports automated initial installation, rolling upgrades without service disruption, high availability and disaster recovery, all of which are critical to efficient IT operations.
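Beyond its Web interface, Ambari exposes cluster management through a REST API, which is what makes scripted monitoring possible. As a minimal sketch, the snippet below builds (but does not send) an authenticated status query for a service; the hostname, cluster name, and the stock admin/admin credentials are illustrative placeholders, not values from any particular deployment.

```python
# Sketch: building an authenticated request against Ambari's REST API
# to read a service's state. Host, cluster name, and credentials are
# placeholders for illustration only.
import base64
import urllib.request

AMBARI_HOST = "http://ambari.example.com:8080"  # placeholder Ambari server
CLUSTER = "mycluster"                           # placeholder cluster name


def service_status_request(service):
    """Return a GET request for the given service's state (not yet sent)."""
    url = (f"{AMBARI_HOST}/api/v1/clusters/{CLUSTER}"
           f"/services/{service}?fields=ServiceInfo/state")
    req = urllib.request.Request(url)
    # Ambari uses HTTP Basic auth; 'admin'/'admin' is the out-of-box default.
    token = base64.b64encode(b"admin:admin").decode("ascii")
    req.add_header("Authorization", f"Basic {token}")
    return req


req = service_status_request("HDFS")
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` against a live Ambari server would return a JSON document whose `ServiceInfo/state` field reports values such as `STARTED` or `INSTALLED`.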
Moreover, the independent software vendor ecosystem that supports Hadoop distributions is broadening and deepening. This is important for two reasons. First, in our experience, much of a buying decision boils down to how well Hadoop fits with existing technology assets, which in most cases means traditional business intelligence and data warehouse vendors. Second, a richer ecosystem alleviates concerns over the skills shortage, since teams can keep working with tools they already know.