Google Inc. owes its success in part to the unique computing architecture it invented to suit its business--serving up searches of the Web to millions of users with lightning-fast speed.
Attendees at the EclipseCon conference in Burlingame, Calif., Wednesday were treated to a glimpse of how that technology was developed in Google's early days and how that technology operates today, courtesy of a speech by Urs Hoelzle, Google's VP of engineering.
EclipseCon is the second annual meeting of users of the Eclipse open-source programmer's workbench, a platform that helps unrelated software-development tools work together.
To invent Google's technology, developers had to throw out assumptions previously used in large data centers and implant new ones, Hoelzle told attendees. And because exactly what will be searched for on any given day is never predictable, keeping the 10 billion pages of the Web close at hand is a daunting challenge.
Hoelzle, a man with a short black beard and quick wit, told audience members that a Google search on "Eclipse" produced more results for solar events and the Mitsubishi car than their favorite development environment. "So everyone here, get back to work," he admonished his listeners, who chuckled in response.
Hoelzle showed pictures of the early Google hardware data center, which consisted of two desktop machines "that no one else was using" in a cluttered setting at Stanford University in 1997. By 1999, it was a large set of thin, rack-mounted Intel servers with a maze of cables coming out the back. By 2000, it was a much cleaner set of 1,000 dual-processor servers in racks that incorporated switching to eliminate the cables.
"The underlying hardware is pretty darn cheap, but achieving scalability has many different aspects," Hoelzle said.
Ensuring reliability was another concern. With so many commodity hardware servers, "expect to lose one a day," he said. Google decided to "try to deal with that in an automated way. Otherwise, you will have lots of people running around trying to restart servers."
Hoelzle then flashed a picture on the screen of six fire trucks at a Google data center. "I can't tell you what happened, but it's not about one machine going down," he said. He didn't disclose when the incident occurred. "No users were harmed in this picture," he added.
To cope with outages of a variable nature, Google built the Google File System, which was closely geared to Google's search computing tasks and had a high tolerance for server failures.
Google operations are built around large files that are broken down into 64-Mbyte chunks and scattered across multiple "chunk servers." A description of each file, its number of chunks, and chunk locations are kept on a master server. Each 64-Mbyte chunk is also replicated on two other servers, so a total of three copies are kept with the path to each retained by the master server.
By scattering its files across many Red Hat Linux servers, Google gains reliability at a low cost. The master server regularly polls chunk servers with a heartbeat message, asking if they're alive. If it fails to get an answer, or if a quick check of the contents on a server indicates that its data has been corrupted, the master server sets about creating a new 64-Mbyte chunk on another server, "usually in a matter of minutes," Hoelzle explained.
Google suffers the loss of a file only if all three copies of the chunk, each on a different server, are lost simultaneously, Hoelzle said. Such a loss would require a lengthy rebuilding process by collecting replacement data off the Web.
Google has indexed the Web through its many Web crawlers that send back summaries of the Web sites they find. Building an index of the Web is a large task that "takes several days on hundreds of machines," he said. The index is renewed constantly.
To search the index quickly, Google breaks it "into pieces called shards," scattered across servers so they may be searched in parallel, each server coming up with part of the answer to a question and feeding it back for aggregated results.
Google's file system, indexing technology, and grid of commodity servers allow it to achieve search times of a quarter of a second on a typical query. The replication and constant heartbeat messaging built into the file system gives it high reliability and availability, he noted.
In addition, as Google servers parse queries, they break them down into smaller tasks and make one trip to the database for a result that may satisfy many users. The process is called "map reduction." Hoelzle said Google once "lost 1,800 of 2,000 map-reduction machines in a large-scale maintenance incident." Because of the load balancing built into the system, Google still completed all queries by steering uncompleted tasks to the machines that showed they had processing power.
"You want to split that huge task into many small tasks, spread across many machines," Hoelzle said. This architecture "makes failure recovery easy. If worker W dies, re-execute the tasks done by that worker elsewhere."