May 8, 2000
Now Comes The Hard Part: Analyzing The Data
By Rick Whiting and John Foley
Collecting clickstream data and storing it is, in some respects, the easy part of Web database management. What's much harder is analyzing that data and acting on it in the real-time environment of online commerce.
Consider the problem faced by Engage Technologies Inc., an online marketing company. Engage tracks Web users as they move within and among sites, and serves up advertisements based on anonymous profiles it has established. Its Engage Knowledge database has 52 million profiles, with 800 interest categories, each based on a sliding scale. When the data is summarized for fast response, it's organized in hundreds of millions of "rows," one of the basic building blocks of a relational database. When stored in its more basic form for analysis--one row per attribute--the result is billions of rows, a monumental data-management headache.
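The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration: the article does not describe Engage's actual schema, and the profile IDs, category names, and scores below are invented.

```python
# Hypothetical data: one summarized row per profile, one column per interest category.
profiles_wide = [
    {"profile_id": 1, "sports": 0.8, "travel": 0.1, "autos": 0.0},
    {"profile_id": 2, "sports": 0.0, "travel": 0.6, "autos": 0.3},
]


def to_attribute_rows(wide_rows):
    """Expand each summarized row into one row per (profile, category) pair."""
    long_rows = []
    for row in wide_rows:
        for category, score in row.items():
            if category == "profile_id":
                continue  # the key column is not an attribute
            long_rows.append((row["profile_id"], category, score))
    return long_rows


attribute_rows = to_attribute_rows(profiles_wide)
print(len(profiles_wide), "summary rows ->", len(attribute_rows), "attribute rows")
```

The row count grows by a factor equal to the number of categories. Assuming, purely for scale, that every one of the 52 million profiles carried a row for each of the 800 categories, the upper bound would be 52 million x 800 = 41.6 billion rows.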
"One of the hardest problems we deal with is predicting how many ads are going to appear on each Web page, or section of a page, on thousands of Web sites," says Daniel Jaye, chief technology officer at the Andover, Mass., company. "You may be optimizing ads on an individual-by-individual basis for millions of people." On top of that, there are restrictions on how many times specific ads should be displayed on the same browser within a given time frame.
To do it, Engage has hired "some of the best minds in data mining and statistics," Jaye says. It's also using a variety of database products, including software from Hyperion, Informix, and Oracle, and looking at others that Jaye declines to reveal.
As companies take on the performance and scalability issues involved in managing huge amounts of data and responding quickly to user activity, Jaye says, special-purpose database tools can be a competitive advantage. Travelocity.com LP and MatchLogic Inc. both cite NCR's TeraMiner middleware as a factor in their recent decisions to implement NCR's Teradata database-management systems for new data warehouses, while using Oracle databases for operational chores. TeraMiner speeds up queries by processing data inside the database before exporting it to analysis tools.
Later this year, NCR will release TeraMiner Analytics, technology that accelerates queries even more by performing analytical calculations directly within the database. Mainstream database vendors are on the same path. IBM, Microsoft, and Oracle are in various stages of building data-mining capabilities directly into their database software to help process analytical queries faster. Oracle and IBM's DB2 already include functionality--called "materialized views" by Oracle and "automatic summary tables" by IBM--that provides users with summarized views of data in huge databases.
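The idea behind such summary tables can be illustrated with Python's built-in sqlite3 module. SQLite has no materialized views, so this sketch stands in with a plain table built from an aggregate query. It is not Oracle's or IBM's feature, and the table names and data are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical detail table: one row per batch of clicks.
conn.execute("CREATE TABLE clicks (site TEXT, category TEXT, hits INTEGER)")
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?, ?)",
    [
        ("a.example", "sports", 3),
        ("a.example", "travel", 1),
        ("b.example", "sports", 2),
        ("b.example", "sports", 4),
    ],
)

# Precompute the aggregate once, inside the database, and store the result.
conn.execute(
    "CREATE TABLE clicks_by_category AS "
    "SELECT category, SUM(hits) AS total_hits FROM clicks GROUP BY category"
)

# Later queries read the small summary table instead of rescanning the detail rows.
summary = dict(conn.execute("SELECT category, total_hits FROM clicks_by_category"))
print(summary)
```

Unlike a true materialized view, this stand-in table does not refresh itself when the detail rows change. It would have to be rebuilt or updated explicitly.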
But the established database vendors aren't the only ones providing technology for speeding up queries and data processing. QueryObject Systems Corp.'s namesake software takes huge volumes of transaction data and compresses it into a compact data mart for easier analysis. Dynamic Information Systems Corp. offers its Omnidex database search engine and indexing technology for speeding up ad hoc queries and multidimensional analysis.
Return to main story, "Web Data Piles Up."
Illustration by Robert Neubecker