
The Self-Organizing Web


Research scientist Gary Flake discovered that the Internet is organized by the same principle as the earth's biosphere: a power law distribution.



Research scientist Gary Flake of the NEC Research Institute in Princeton was trying to track down Michael Jordan using the Internet. No, not that Michael Jordan--and that problem became the mother of his invention: a search algorithm with great promise for improving how businesses and individuals use the Internet.

The Michael Jordan he was looking for is a computer scientist; when Flake tried to track him down, he was hard to find. Eventually, by typing in parts of Jordan's bio--his title, published papers--he found Jordan's home page. It got Flake thinking: "Wouldn't it be nice to bias my search to science, not the [entire] Web?"

Flake reasoned that the goal shouldn't be too difficult to accomplish once he solved the problem of homonyms--words that are spelled or pronounced alike but have different meanings, such as Pole and pole--or Michael Jordan and, well, Michael Jordan. He soon discovered that key words weren't as accurate as seeking relationships among sites by following their inbound and outbound links.

One evening he came up with an algorithm, did the analysis, and discovered that to a large extent, the Web is self-organizing. Even more fascinating, it follows the same principle that organizes the earth's biosphere, called a power law distribution. In nature, it works like this: There are relatively few large creatures (such as whales); the smaller the creatures get (e.g., felines, then spiders, then bacteria), the greater their diversity and number.

On the Internet, there are relatively few powerhouses, such as Yahoo and America Online. As sites get smaller, their number goes up. To wit: there's a relative handful of large entities such as CNN and CBS; a fair number of corporate sites; and still more sites such as celebrity and fan pages. The smallest sites, and the greatest in number, are individual home pages.
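To make the "few giants, many small sites" pattern concrete, here is a minimal Python sketch of a power law, in which the number of sites of a given size falls off as a power of that size. The exponent, the scale, and the function names are illustrative assumptions, not figures from Flake's research.

```python
import math


def power_law_counts(sizes, exponent=2.0, scale=1_000_000):
    """Expected number of sites at each size: count = scale * size ** -exponent.

    The exponent and scale are made-up values chosen for illustration.
    """
    return {size: scale * size ** (-exponent) for size in sizes}


def estimate_exponent(size_a, count_a, size_b, count_b):
    """Recover the exponent from two points; it is the negated log-log slope."""
    slope = (math.log(count_b) - math.log(count_a)) / (
        math.log(size_b) - math.log(size_a)
    )
    return -slope


sizes = [1, 10, 100, 1000]  # hypothetical site sizes, in arbitrary units
counts = power_law_counts(sizes)
for size in sizes:
    print(f"size {size:>5}: about {counts[size]:,.0f} sites")
```

Under these assumed numbers, every tenfold increase in size cuts the number of sites a hundredfold, which is why a power law shows up as a straight line when both axes are plotted on a logarithmic scale.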

Using this new algorithm, searches can be improved by ignoring page content and studying links alone. Business applications are on the horizon, Flake says. Among the first, NEC hopes to build a better porn filter. The best of such filters, Flake says, "are abysmal," because they rely on key words instead of the relationships among sites. A link-based filter would effectively take care of the problem faced by carpenters who look up the term "screw" or women with cancer who look up "breast."
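The article does not describe Flake's algorithm in detail, so the following is not his method. It is a deliberately simple Python sketch of the general idea of judging a page by its links rather than its words: a page is scored by the share of its inbound and outbound link neighbors that are already known to belong to a topic. The page names, the link list, and the seed set are all hypothetical.

```python
from collections import defaultdict


def build_neighbors(links):
    """Map each page to the pages it links to or is linked from."""
    neighbors = defaultdict(set)
    for source, target in links:
        neighbors[source].add(target)  # outbound link
        neighbors[target].add(source)  # inbound link
    return neighbors


def topic_score(page, neighbors, seeds):
    """Fraction of a page's link neighbors that are known on-topic seeds.

    Page text is never consulted; only the link structure is used.
    """
    linked = neighbors.get(page, set())
    if not linked:
        return 0.0
    return len(linked & seeds) / len(linked)


# Hypothetical link graph: (source page, target page)
links = [
    ("jordan-cs-home", "ml-lab"),
    ("ml-lab", "stats-dept"),
    ("conference-papers", "jordan-cs-home"),
    ("jordan-nba-fan", "sports-news"),
    ("sports-news", "team-roster"),
]
seeds = {"ml-lab", "conference-papers", "stats-dept"}  # assumed known science pages

neighbors = build_neighbors(links)
for page in ("jordan-cs-home", "jordan-nba-fan"):
    print(page, topic_score(page, neighbors, seeds))
```

In this toy graph the two "Jordan" pages could contain identical key words, yet the computer scientist's page scores high because everything it touches is a science page, while the fan page scores zero. A filter built the same way would judge a page about screws or breast cancer by the company it keeps, not by its vocabulary.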

Will this technology replace current search engines? Absolutely not, says Flake, whose favorite search engine remains Google. "It will always have its place."

