The answer is a cluster-based analysis system, sometimes referred to loosely as a cloud database system. At the Cloud Computing Conference and Expo Nov. 3 in Santa Clara, Calif., representatives of Yahoo explained how they use Hadoop open source software, from the Apache Software Foundation, to analyze the Web.
Hadoop builds Yahoo's indexes of the Web that power the Yahoo search engine. Its Web mapping system "runs in 73 hours, taking as input, data from all the Web pages in the world," he said. Yahoo's digest of Web pages consists of 300 terabytes of data. Hadoop analysis tells Yahoo's ad system what ads to serve to visitors, based on their profile from searches they've conducted on the site.
It's use of Hadoop keeps it running on a total of 25,000 servers at the company, he said. Yahoo distributes its tested, production version of Hadoop for free, Baldeschweiler said.
Another speaker at the conference was Christophe Brisciglia, a former Google engineer and now part of the founding team at Cloudera, a firm that is producing a supported enterprise distribution of Hadoop. "Cloudera is to Hadoop as Red Hat is to Linux," he said.
Brisciglia described Hadoop as "a batch data processing system" for use on clusters of commodity hardware. Unlike relational database, "in Hadoop there is no structure (to the data). You can dump incredibly large amounts of data into a Hadoop cluster and figure out what to do with it later."
Page 2:
![]()
1
|
2
Next Page »
Stay connected and informed by visiting the CA Solutions Center Community!

Become a member today for instant access to free InformationWeek research, expert advice, peer perspectives, and more on the following topics:
- Application Performance Management (APM)
- Security Management
- Mainframe 2.0
- IT Automation
- Service Assurance
Also, visit our Government and Financial Services groups to see how these technologies apply specifically to those industries.
NOTE: Offer valid for U.S., U.S. possessions, & Canada only.