Yahoo Claims Record With Petabyte Database - InformationWeek

Yahoo Claims Record With Petabyte Database

Yahoo claims it has the largest SQL database in a production environment and that it will grow larger.

The result is a database made possible by both hardware and software innovations. For example, SQL databases are organized as tables, which consist of rows and columns. They are traditionally arranged as rows of data, but Yahoo chose to store its data as distributed columns.

"What we chose to do is organize it as columns," said Hasan. "What that enables, especially with deep analytics queries, is that you can go to only the data that interests you, which makes it very, very effective in terms of reducing the amount of data you have to move through for a particular query."
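The difference between the two layouts can be shown with a small Python sketch. It is only an illustration of the general idea of column storage, not Yahoo's implementation; the records and the field names (`user`, `page`, `ms`) are made up for the example.

```python
# Hypothetical event records; the field names are illustrative, not Yahoo's schema.
rows = [
    {"user": "a", "page": "home", "ms": 120},
    {"user": "b", "page": "mail", "ms": 340},
    {"user": "c", "page": "home", "ms": 95},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_row_store(records, field):
    """Row layout: every whole record is visited, including fields the query ignores."""
    return sum(record[field] for record in records)


def total_column_store(columns, field):
    """Column layout: only the single requested column is read."""
    return sum(columns[field])


columns = to_columns(rows)
```

Both functions return the same total for `ms`, but the column version never touches `user` or `page`. On a table with many columns and billions of rows, that is the saving Hasan describes.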

Yahoo is also using advanced techniques for data compression and parallel vector query processing, a method for using parallel processing more efficiently.

Google's BigTable database also uses commodity hardware clusters, but Hasan said that Yahoo's approach differs in that it is designed for an SQL interface. "What that enables is that you can write your programs very, very cheaply," said Hasan. "Typically with BigTable, you'd be writing a C++ or a Java program. Whereas what we can do is get the same job done with SQL, which is much more productive from a programming perspective."
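The productivity argument can be illustrated by answering one question two ways: with hand-written grouping code and with a single SQL statement. This is a sketch using Python's built-in `sqlite3` module as a stand-in SQL engine; it has nothing to do with Yahoo's system or BigTable, and the `events` table and its columns are assumptions made for the example.

```python
import sqlite3

# Hypothetical (page, milliseconds) events; not real Yahoo data.
events = [("home", 120), ("mail", 340), ("home", 95)]


def count_by_page_loop(records):
    """Procedural version: the grouping logic is written by hand."""
    counts = {}
    for page, _ms in records:
        counts[page] = counts.get(page, 0) + 1
    return counts


def count_by_page_sql(records):
    """Declarative version: the same question as one SQL statement."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (page TEXT, ms INTEGER)")
        conn.executemany("INSERT INTO events VALUES (?, ?)", records)
        cursor = conn.execute("SELECT page, COUNT(*) FROM events GROUP BY page")
        return dict(cursor.fetchall())
    finally:
        conn.close()
```

The two functions return the same counts. The SQL version states what is wanted and leaves the how to the engine, which is the kind of saving in programmer effort Hasan is pointing to.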

Yahoo developed its own database because commercial database providers could not meet its needs. Hasan said that the commercial vendors did pretty well up to about 25 terabytes, and could even manage up to 100 terabytes. "Our needs are about 100 times higher than that," he said. "The other part we ran into was if you look at the cost, even at 100 terabytes, our engine is roughly 10 to 20 times more cost-effective. That's because we were able to build in specializations for our needs."

Yahoo's data needs are substantial. According to Hasan, the travel industry's Sabre system handles 50 million events per day, credit card company Visa handles 120 million events a day, and the New York Stock Exchange has handled over 225 million events in a day. Yahoo, he said, handles 24 billion events a day, fully two orders of magnitude more than any of those non-Internet companies.
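The "two orders of magnitude" comparison can be checked with a few lines of Python. The figures are the ones Hasan quoted; the variable names are just for this example, and the New York Stock Exchange figure is treated as exactly 225 million even though the article says "over" that number.

```python
import math

# Daily event counts as quoted in the article.
events_per_day = {
    "Sabre": 50_000_000,
    "Visa": 120_000_000,
    "NYSE": 225_000_000,  # the article says "over 225 million"; used here as a floor
}
yahoo_events_per_day = 24_000_000_000

# How many times larger Yahoo's daily volume is than each system's.
ratios = {name: yahoo_events_per_day / count for name, count in events_per_day.items()}

# The same ratios expressed as orders of magnitude (base-10 logarithm).
orders_of_magnitude = {name: math.log10(ratio) for name, ratio in ratios.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.0f}x, about {orders_of_magnitude[name]:.1f} orders of magnitude")
```

Yahoo's volume works out to 480 times Sabre's, 200 times Visa's, and roughly 107 times the stock exchange's peak, so each gap is at least a factor of 100.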
