
May 8, 2000


Now Comes The Hard Part: Analyzing The Data

By Rick Whiting and John Foley

Illustration by Robert Nuebecker

Collecting clickstream data and storing it is, in some respects, the easy part of Web database management. What's much harder is analyzing that data and acting on it in the real-time environment of online commerce.

Consider the problem faced by Engage Technologies Inc., an online marketing company. Engage tracks Web users as they move within and among sites, and serves up advertisements based on anonymous profiles it has established. Its Engage Knowledge database has 52 million profiles, with 800 interest categories, each based on a sliding scale. When the data is summarized for fast response, it's organized in hundreds of millions of "rows," one of the basic building blocks of a relational database. When stored in its more basic form for analysis--one row per attribute--the result is billions of rows, a monumental data-management headache.
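To make the difference between the two layouts concrete, here is a minimal Python sketch. It is not Engage's schema: the profile IDs, category names, and scores are invented for illustration. It flattens a summarized, one-row-per-profile layout into one row per attribute, then works out an upper bound on the row count from the figures quoted above, assuming (the article does not say this) that every profile carries a score in every category.

```python
# Hypothetical sample data: IDs, categories, and scores are invented.
# Summarized layout: one entry per profile, holding all of its interest scores.
profiles = {
    "profile-1": {"sports": 0.8, "travel": 0.3},
    "profile-2": {"sports": 0.1, "finance": 0.9, "travel": 0.5},
}


def to_attribute_rows(profiles):
    """Flatten summarized profiles into one (profile, category, score) row per attribute."""
    return [
        (profile_id, category, score)
        for profile_id, interests in profiles.items()
        for category, score in interests.items()
    ]


rows = to_attribute_rows(profiles)
print(len(profiles), "summarized rows ->", len(rows), "attribute rows")

# Upper bound from the article's figures, ASSUMING every one of the
# 52 million profiles has a score in all 800 categories.
max_attribute_rows = 52_000_000 * 800
print(f"{max_attribute_rows:,} attribute rows at most")
```

Under that assumption the ceiling is 41.6 billion rows. Real profiles are presumably much sparser, which is consistent with the article's looser "billions of rows."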

"One of the hardest problems we deal with is predicting how many ads are going to appear on each Web page, or section of a page, on thousands of Web sites," says Daniel Jaye, chief technology officer at the Andover, Mass., company. "You may be optimizing ads on an individual-by-individual basis for millions of people." On top of that, there are restrictions on how many times specific ads should be displayed on the same browser within a given time frame.
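The display restriction Jaye describes is commonly called a frequency cap. The sketch below shows one simple way such a rule could be enforced, using a sliding time window per browser and ad. The class name, parameters, and in-memory storage are assumptions made for illustration; the article does not describe how Engage implements its limits.

```python
from collections import defaultdict, deque


class FrequencyCap:
    """Hypothetical sliding-window cap: at most max_views per ad, per browser, per window."""

    def __init__(self, max_views, window_seconds):
        self.max_views = max_views
        self.window_seconds = window_seconds
        # (browser_id, ad_id) -> timestamps of recent displays, oldest first
        self._views = defaultdict(deque)

    def try_show(self, browser_id, ad_id, now):
        """Record a display and return True if the cap allows it, else return False."""
        views = self._views[(browser_id, ad_id)]
        # Forget displays that have aged out of the window.
        while views and now - views[0] >= self.window_seconds:
            views.popleft()
        if len(views) >= self.max_views:
            return False
        views.append(now)
        return True


# Allow an ad twice per hour on the same browser (timestamps in seconds).
cap = FrequencyCap(max_views=2, window_seconds=3600)
results = [cap.try_show("browser-1", "ad-1", t) for t in (0, 10, 20, 3600)]
print(results)  # [True, True, False, True]
```

The third request is refused because two displays already fall inside the hour. The fourth is allowed because the first display has aged out by then. Keeping this state for millions of browsers across thousands of sites is where the scale problem comes from.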

To handle that optimization, Engage has hired "some of the best minds in data mining and statistics," Jaye says. It's also using a variety of database products, including software from Hyperion, Informix, and Oracle, and looking at others that Jaye declines to reveal.

As companies take on the performance and scalability issues involved in managing huge amounts of data and responding quickly to user activity, Jaye says, special-purpose database tools can be a competitive advantage. Travelocity.com LP and MatchLogic Inc. both cite NCR's TeraMiner middleware as a factor in their recent decisions to implement NCR's Teradata database-management systems for new data warehouses, while using Oracle databases for operational chores. TeraMiner speeds up queries by processing data inside the database before exporting it to analysis tools.

Later this year, NCR will release TeraMiner Analytics, technology that accelerates queries even more by performing analytical calculations directly within the database. Mainstream database vendors are on the same path. IBM, Microsoft, and Oracle are in various stages of building data-mining capabilities directly into their database software to help process analytical queries faster. Oracle and IBM's DB2 already include functionality--called "materialized views" by Oracle and "automatic summary tables" by IBM--that provides users with summarized views of data in huge databases.
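The idea behind a summarized view can be shown with a small Python sketch that uses the standard-library sqlite3 module. SQLite has neither Oracle's materialized views nor IBM's automatic summary tables, so the example builds a summary table by hand as a stand-in. The table names, column names, and sample rows are all invented. Unlike the vendor features, this table does not refresh itself when the detail data changes.

```python
import sqlite3

# In-memory database with a hypothetical clickstream detail table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (site TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?)",
    [
        ("a.example", "sports"),
        ("a.example", "sports"),
        ("a.example", "travel"),
        ("b.example", "sports"),
    ],
)

# Aggregate inside the database and store the result, so later queries
# read a few summary rows instead of scanning every detail row.
conn.execute(
    "CREATE TABLE click_summary AS "
    "SELECT site, category, COUNT(*) AS click_count "
    "FROM clicks GROUP BY site, category"
)

summary = conn.execute(
    "SELECT site, category, click_count FROM click_summary "
    "ORDER BY site, category"
).fetchall()
print(summary)
```

Only the three summary rows leave the database here, not the four detail rows. That is the same motivation, at toy scale, as doing the analytical work in the database before handing results to an analysis tool.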

But the established database vendors aren't the only ones providing technology for speeding up queries and data processing. QueryObject Systems Corp.'s namesake software takes huge volumes of transaction data and compresses it into a compact data mart for easier analysis. Dynamic Information Systems Corp. offers its Omnidex database search engine and indexing technology for speeding up ad hoc queries and multidimensional analysis.

Return to main story, "Web Data Piles Up."

