But what works for finding things on the Web--the keyword search engines offered by America Online, Google, Microsoft, Yahoo, and others--often fails inside companies' networks, which contain not only familiar Web pages, Office documents, and Adobe files, but more obscure data that lives in specialized mainframe databases or CAD systems, some of which date back decades.
"We have very good systems for counting how often keywords appear on pages and using that to rank documents," says Christopher Manning, assistant professor of computer science and linguistics at Stanford University. But search algorithms still can't glean the same context that people can with just a glance. "Human beings can get an enormous amount of information by looking at a three-line snippet of a page that somehow computers aren't getting," he says. New research aims to help computers understand what they're missing.
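To make the keyword-counting approach Manning describes concrete, here is a minimal sketch in Python. It is an illustration only, not any vendor's ranking code: the function names and the three sample documents are invented for the example, and real engines weight terms far more carefully than a raw count.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_by_keyword_count(query, documents):
    """Order document names by how often the query's words appear in each.

    `documents` maps a (hypothetical) document name to its text.
    Ties are broken alphabetically so the result is deterministic.
    """
    terms = tokenize(query)
    scores = {}
    for name, text in documents.items():
        counts = Counter(tokenize(text))
        scores[name] = sum(counts[term] for term in terms)
    return sorted(scores, key=lambda name: (-scores[name], name))


# Invented sample data, standing in for files on a company network.
docs = {
    "memo": "Pump valve recall notice. Valve inspection is required.",
    "spec": "Drawing notes for the pump housing.",
    "menu": "Cafeteria menu for the week.",
}

print(rank_by_keyword_count("pump valve", docs))
```

The count tells the program that the memo mentions "valve" twice, but nothing about whether it is a recall notice or a sales brochure. That is the kind of context a person picks up from a three-line snippet.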
The first place most PC users still look for search results is Google or Yahoo. Their keyword search model works well on the Internet because it treats the links people build between Web pages as votes for relevance. The same model can be applied to corporate networks, but business documents aren't linked together the way Web pages are, so it has fewer votes to count. The problem is that, having become conditioned to Google's simplicity and speed, most people expect the most relevant information right away, with minimal effort. Inside business networks, that equation doesn't always compute.
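The "links as votes" idea can be sketched with the simplified textbook form of link analysis, shown below in Python. This is an assumption-laden toy, not Google's production system: the function name, the damping value of 0.85, and the three-page link graph are all chosen for illustration.

```python
def link_votes(links, damping=0.85, iterations=50):
    """Score pages by repeatedly passing each page's score along its links.

    `links` maps a page name to the list of pages it links to.
    A page with no outgoing links spreads its score evenly over all pages.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Invented graph: pages "a" and "b" both link to "c"; "c" links back to "a".
scores = link_votes({"a": ["c"], "b": ["c"], "c": ["a"]})
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

Page "c" comes out on top because two pages vote for it, and "b" comes last because nothing links to it. A folder of business documents with no links between them would give every file the same score, which is why link structure alone says little inside a corporate network.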
Google's answer so far is a special server "appliance" for companies that can index their data and expose it via Google's familiar user interface. In response to competitors who say the company's bread-and-butter PageRank algorithm doesn't work as well for data as for Web documents, Google Enterprise general manager Dave Girouard says PageRank relies on more than 100 variables to decide what's relevant, and only one of those measures link structure. For businesses that buy the search appliance, other variables are given more weight. That means Google can serve both the mass market of PC users and enterprise customers who buy its data-searching servers. "A lot of people dismiss it entirely," Girouard says of PageRank, "but it is certainly of value."
Google is working on algorithms that can analyze audio files and video clips. It's also refining software that can sort data from different IT systems into easy-to-understand categories, a technique used on its Google News site.
There's certainly no shortage of data for businesses to reckon with. According to a 2003 study by the University of California, Berkeley's computer-science school, the volume of data on the Web tripled between 2000 and 2003, from less than 50 terabytes to 167 terabytes. In 2002, print, film, magnetic, and optical storage media yielded about 5 quintillion (that's 5 times 10 to the 18th power) bytes of new data, 37,000 times the amount of information in the Library of Congress. The trend--30% annual growth in the volume of information produced--shows no sign of slowing.
"Google or other search engines give you a list of pages of links--hopefully, you find your answer in the first 10 because you're not likely to find it in the other hundred pages," says Brian Lent, president and chief technology officer of mobile search startup Medio Systems Inc. and a consultant to Silicon Valley venture-capital firm Mohr, Davidow Ventures. Typing simple searches can prove tricky when users are looking in their companies' computers rather than those on the Web. And structured information--the kind that sits in customer-relationship-management systems, supply-chain planning software, and financial databases--makes up only about a fifth of the data companies have on hand, search experts say. The rest is unstructured, residing in E-mail messages, Word documents, and PDFs.

Google's PageRank algorithm relies on more than 100 variables to decide what's relevant, Girouard says.