A Glimpse Of Google - InformationWeek

InformationWeek is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them.Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Business & Finance
05:52 PM
Connect Directly

A Glimpse Of Google

Google engineering exec gives EclipseCon attendees a peek into the search-technology company's workings.

Google Inc. owes its success in part to the unique computing architecture it invented to suit its business--serving up searches of the Web to millions of users with lightning-fast speed.

Attendees at the EclipseCon conference in Burlingame, Calif., Wednesday were treated to a glimpse of how that technology was developed in Google's early days and how that technology operates today, courtesy of a speech by Urs Hoelzle, Google's VP of engineering.

EclipseCon is the second annual meeting of users of the Eclipse open-source programmer's workbench, a platform that helps unrelated software-development tools work together.

To invent Google's technology, developers had to throw out assumptions previously used in large data centers and implant new ones, Hoelzle told attendees. And because exactly what will be searched for on any given day is never predictable, keeping the 10 billion pages of the Web close at hand is a daunting challenge.

Hoelzle, a man with a short black beard and quick wit, told audience members that a Google search on "Eclipse" produced more results for solar events and the Mitsubishi car than their favorite development environment. "So everyone here, get back to work," he admonished his listeners, who chuckled in response.

Hoelzle showed pictures of the early Google hardware data center, which consisted of two desktop machines "that no one else was using" in a cluttered setting at Stanford University in 1997. By 1999, it was a large set of thin, rack-mounted Intel servers with a maze of cables coming out the back. By 2000, it was a much cleaner set of 1,000 dual-processor servers in racks that incorporated switching to eliminate the cables.

"The underlying hardware is pretty darn cheap, but achieving scalability has many different aspects," Hoelzle said.

Ensuring reliability was another concern. With so many commodity hardware servers, "expect to lose one a day," he said. Google decided to "try to deal with that in an automated way. Otherwise, you will have lots of people running around trying to restart servers."

Hoelzle then flashed a picture on the screen of six fire trucks at a Google data center. "I can't tell you what happened, but it's not about one machine going down," he said. He didn't disclose when the incident occurred. "No users were harmed in this picture," he added.

To cope with outages of a variable nature, Google built the Google File System, which was closely geared to Google's search computing tasks and had a high tolerance for server failures.

Google operations are built around large files that are broken down into 64-Mbyte chunks and scattered across multiple "chunk servers." A description of each file, its number of chunks, and chunk locations are kept on a master server. Each 64-Mbyte chunk is also replicated on two other servers, so a total of three copies are kept with the path to each retained by the master server.

By scattering its files across many Red Hat Linux servers, Google gains reliability at a low cost. The master server regularly polls chunk servers with a heartbeat message, asking if they're alive. If it fails to get an answer, or if a quick check of the contents on a server indicates that its data has been corrupted, the master server sets about creating a new 64-Mbyte chunk on another server, "usually in a matter of minutes," Hoelzle explained.

Google suffers the loss of a file only if all three copies of the chunk, each on a different server, are lost simultaneously, Hoelzle said. Such a loss would require a lengthy rebuilding process by collecting replacement data off the Web.

Google has indexed the Web through its many Web crawlers that send back summaries of the Web sites they find. Building an index of the Web is a large task that "takes several days on hundreds of machines," he said. The index is renewed constantly.

To search the index quickly, Google breaks it "into pieces called shards," scattered across servers so they may be searched in parallel, each server coming up with part of the answer to a question and feeding it back for aggregated results.

Google's file system, indexing technology, and grid of commodity servers allow it to achieve search times of a quarter of a second on a typical query. The replication and constant heartbeat messaging built into the file system gives it high reliability and availability, he noted.

In addition, as Google servers parse queries, they break them down into smaller tasks and make one trip to the database for a result that may satisfy many users. The process is called "map reduction." Hoelzle said Google once "lost 1,800 of 2,000 map-reduction machines in a large-scale maintenance incident." Because of the load balancing built into the system, Google still completed all queries by steering uncompleted tasks to the machines that showed they had processing power.

"You want to split that huge task into many small tasks, spread across many machines," Hoelzle said. This architecture "makes failure recovery easy. If worker W dies, re-execute the tasks done by that worker elsewhere."

We welcome your comments on this topic on our social media channels, or [contact us directly] with questions about the site.
Comment  | 
Print  | 
More Insights
2021 State of ITOps and SecOps Report
2021 State of ITOps and SecOps Report
This new report from InformationWeek explores what we've learned over the past year, critical trends around ITOps and SecOps, and where leaders are focusing their time and efforts to support a growing digital economy. Download it today!
InformationWeek Is Getting an Upgrade!

Find out more about our plans to improve the look, functionality, and performance of the InformationWeek site in the coming months.

IT Leadership: 10 Ways to Unleash Enterprise Innovation
Lisa Morgan, Freelance Writer,  6/8/2021
Preparing for the Upcoming Quantum Computing Revolution
John Edwards, Technology Journalist & Author,  6/3/2021
How SolarWinds Changed Cybersecurity Leadership's Priorities
Jessica Davis, Senior Editor, Enterprise Apps,  5/26/2021
Register for InformationWeek Newsletters
Current Issue
Planning Your Digital Transformation Roadmap
Download this report to learn about the latest technologies and best practices or ensuring a successful transition from outdated business transformation tactics.
White Papers
Twitter Feed
Sponsored Live Streaming Video
Everything You've Been Told About Mobility Is Wrong
Attend this video symposium with Sean Wisdom, Global Director of Mobility Solutions, and learn about how you can harness powerful new products to mobilize your business potential.
Sponsored Video
Flash Poll