IBM Picks Hadoop To Analyze Large Data Volumes - InformationWeek

IBM Picks Hadoop To Analyze Large Data Volumes

Big Blue unveiled a package of services and analytics called BigInsights based on Apache's open source Hadoop.

With Big Blue behind Hadoop, companies with Big Data problems may find the open source technology is available in more manageable forms.

IBM, the originator of the SQL data access language, has recognized the NoSQL movement has a point. Some data management problems don't lend themselves to being solved by IBM's DB2 or other relational database systems.

That's why it has started offering consulting services on managing large volumes of data based on Apache's open source Hadoop. It has a package of services and Hadoop-based analytics that it calls BigInsights Core to enable companies to take the plunge into Internet-scale data volumes. It's also offering its own large-volume data management software, IBM BigSheets, which uses a large-scale spreadsheet paradigm.

"Hadoop opens up a broader technology domain -- Big Data," said Bernie Spang, IBM director of information management product strategy, referring to the common appellation for masses of website, customer, RSS feed or Twitter message data, all of value to the business.

Hadoop makes no pretence of running transactions or functioning like a transaction-processing database system, with its stringent requirements for a two-phase commit. Rather, it specializes in filtering, sorting and managing either structured or unstructured data on a very large scale. After Hadoop has done its work, it's possible for data warehouses, business analytics systems and relational databases to work with a more manageable result set.

IBM made the announcement of Hadoop services at its Information on Demand conference in Rome May 19. These are not services from consultants at IBM Global Information Services but advisors from IBM labs and engineering. IBM is exploring how to help customers get a handle on information flows that can be measured in petabytes, as opposed to mere megabytes, gigabytes and terabytes.

Hadoop is a combination of two distributed systems meant to filter and manage data on a large server cluster. One part is Map/Reduce, a system that knows where data is stored on disks throughout the cluster and where the nearest processor to it is. When it comes time to sort or filter the data, it can give the orders to call up the data from disk in large chunks of 64 or 128 megabytes and move it to nearby processors. The second part is HDFS, the Hadoop Distributed File System, which knows how to distribute the data across a cluster in the first place.
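The division of labor described above can be illustrated with a small, single-process sketch in Python. It is not Hadoop's API: the function names (split_into_blocks, map_block, shuffle, reduce_counts, word_count) are hypothetical, the "blocks" are short lists of text lines rather than 64 or 128 megabyte chunks on disk, and everything runs on one machine. It only shows the shape of the idea: the data is cut into blocks, each block is processed independently, and the partial results are grouped and combined.

```python
from collections import defaultdict


def split_into_blocks(records, block_size):
    """Stand-in for the file system: cut the records into fixed-size blocks."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def map_block(block):
    """Map step: emit a (word, 1) pair for every word in one block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key so each key can be reduced on its own."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records, block_size=2):
    """Run map over every block, then shuffle and reduce the combined output."""
    blocks = split_into_blocks(records, block_size)
    # On a real cluster each block would be mapped on a node near its data;
    # here the blocks are simply processed one after another.
    pairs = [pair for block in blocks for pair in map_block(block)]
    return reduce_counts(shuffle(pairs))


if __name__ == "__main__":
    lines = [
        "big data big clusters",
        "hadoop sorts big data",
        "hadoop filters data",
    ]
    for word, total in sorted(word_count(lines).items()):
        print(word, total)
```

The output is a small table of totals per word, which is the kind of reduced result a downstream database or analytics system could then work with.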

Likewise, IBM said BigSheets, a browser-based data extraction, annotation and visualization system, was available May 19 in technology preview form. There is no date yet for when it will be a generally available, finished product, Spang said in an interview.

BigSheets was first announced Feb. 25. It is based on several open source components, including Hadoop; Nutch, a Web search and search indexing engine; and Pig, a high-level language under development at Yahoo! for composing work that will be executed by Hadoop.

A key Hadoop brain trust, the start-up firm Cloudera, says IBM's entry into the Big Data field will help get Hadoop adopted in the enterprise. "At Cloudera, we've seen incredible Hadoop uptake in mainstream enterprises… I see no end to the number of applications of this new technology. IBM's entry means more open source contributors will help expand the horizons for Hadoop," said Doug Cutting, Cloudera software architect and the original author of Hadoop while at Yahoo!, in an email message.

"We're confident the time is right for Hadoop to move into established IT infrastructure. IBM's contributions should accelerate this movement," added Mike Olson, CEO of Cloudera, in a message.
