Today's headline on Hadoop provides yet more evidence that big data is this year's hot topic in the world of information management, business intelligence and analytics. If you're involved in data warehousing, you surely know what Hadoop is by now. In case you've been under a rock, it's a Java-based software framework for distributed processing of data-intensive transformations and analyses. The software automatically distributes processing of up to petabytes of data across thousands of nodes on low-cost, commodity hardware. Offering cost and speed advantages, Hadoop is a fast-growing choice for handling large-scale data (as in petabytes) as well as complex data (as in XML, e-mail, images and more) and mixed data (as in structured data mashed with semi-structured data types).
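For readers who have not seen the programming model behind that distribution, here is a minimal sketch of the split, map, shuffle and reduce pattern that Hadoop's MapReduce engine applies across a cluster. It is plain Python running on one machine, not Hadoop's actual Java API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the thread pool standing in for cluster nodes are assumptions made purely for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Emit (word, 1) pairs for one chunk of lines, as one worker would."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Group all emitted values by key, regardless of which worker produced them."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines, workers=3):
    """Count words by splitting the input across hypothetical workers."""
    # Split the input into one chunk per worker (threads stand in for nodes).
    chunks = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    sample = ["big data", "Big cluster", "data data"]
    print(word_count(sample))
```

On a real cluster the chunks would be blocks of files stored across many machines, and the framework, rather than the programmer, would handle scheduling, data movement and recovery from node failures.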
I was quite impressed by the big crowd and mainstream-enterprise types attending last year's Hadoop World NYC conference. But as Hadoop-co-founder-turned-Cloudera-executive Doug Cutting and many others acknowledged at the event, much work had yet to be done to turn Hadoop into a familiar and easy-to-use environment. Today's story illustrates progress being made by Talend and Quest (in terms of technology) as well as by IBM (more symbolically).
Of course, Hadoop is old news at Yahoo (where Cutting used to work). Check out Yahoo's Hadoop implementation in the image gallery at right.
Hadoop is not a place for conventional BI, reporting and analysis. For that, Hadoop users are bringing winnowed-down result sets into more familiar data warehousing environments. There, too, times have changed, and new deployments are likely to be on one of the data warehousing appliances or configurations you'll see in the image gallery at left.
Enjoy the pictures (and the long weekend), and by all means share your insights and experiences with Hadoop or data warehouse appliance environments through the commenting function below.