InformationWeek: The Business Value of Technology

May 8, 2000

Web Data Piles Up

continued...page 3 of 3

Illustration by Robert Nuebecker
Related links:

  • sidebar: Now Comes The Hard Part: Analyzing The Data

  • Databases Get Boost From Internet And E-Commerce (2/14/00)
And from our sister publications:

  • Computer Reseller News Tackles Integration Of Databases, Internet (4/24/00)

  • InternetWeek Legato Back-up Software Now Recovers Databases (2/14/00)

  • TechWeb ASP Begins Hosting Low-End Databases (2/4/00)

    Backing up data, once a fairly straightforward chore, also becomes problematic with Web databases. Consultant Winter says that VLDBs require data storage systems with automated backup capabilities. Backup is a high priority for DoubleClick's Linsky, given that the collected data is essentially the company's principal product. DoubleClick conducts incremental backups on an ongoing basis, Linsky says, plus full weekly backups. The company uses tools from Legato Systems Inc. to back up the log files from its Windows NT-based advertisement servers and Oracle's replication technology to back up the database itself.

    Web database administrators are also wrestling with the question of how data should be archived--and how much. Determining which clickstream and transaction data needs to be kept online for internal analysts and external customers, and which can be stored in slower archival systems, is more an art than a science. Oracle's Howard says many of the company's customers use what he calls a "rolling window operation" approach, always keeping a window of "live" data--such as 13 months' worth for seasonal purposes--and then archiving older data.
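
    The rolling-window idea Howard describes can be illustrated with a short sketch. This is only an illustration under stated assumptions, not Oracle's implementation: the function names `months_back` and `split_rolling_window` are hypothetical, records are assumed to be simple (date, payload) pairs, and the window is assumed to be measured in whole calendar months.

```python
from datetime import date


def months_back(day: date, months: int) -> date:
    """Return the first day of the month `months` calendar months before `day`."""
    total = day.year * 12 + (day.month - 1) - months
    return date(total // 12, total % 12 + 1, 1)


def split_rolling_window(records, today, window_months=13):
    """Split (record_date, payload) pairs into live and archive lists.

    Assumption: anything dated on or after the cutoff month stays "live";
    everything older is a candidate for slower archival storage.
    """
    cutoff = months_back(today, window_months)
    live = [r for r in records if r[0] >= cutoff]
    archive = [r for r in records if r[0] < cutoff]
    return live, archive


# Hypothetical clickstream records, for illustration only.
records = [
    (date(1999, 3, 31), "old click"),
    (date(1999, 4, 1), "seasonal comparison click"),
    (date(2000, 5, 1), "recent click"),
]
live, archive = split_rolling_window(records, today=date(2000, 5, 8))
```

    A 13-month window keeps the same month of the previous year online, which is what makes year-over-year seasonal comparisons possible without touching the archive.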

    But usage, rather than age, should be the gauge for judging when data should be archived, says Claudia Imhoff, a senior VP at Braun Consulting and an expert on databases and data warehouses. She advocates using tape or optical disk systems as an alternative to expensive disks for archiving data. IT managers "need to be analyzing what data is being used and how it's being used," she says. That means using a data usage-monitoring tool, such as those sold by Pine Cone Systems Inc. and Teleran Technologies LP.
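
    Imhoff's usage-based rule can be sketched in the same spirit. The example below is an assumption-laden illustration and does not represent how the Pine Cone or Teleran products work: the function `archive_candidates`, the partition names, the access-log format, and the 90-day lookback period are all hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def archive_candidates(partitions, access_log, today, lookback_days=90, min_hits=1):
    """Return partitions queried fewer than `min_hits` times in the lookback period.

    `access_log` is assumed to be an iterable of (partition_name, access_date) pairs.
    """
    since = today - timedelta(days=lookback_days)
    hits = Counter(name for name, day in access_log if day >= since)
    return sorted(p for p in partitions if hits[p] < min_hits)


# Hypothetical partitions and query log, for illustration only.
partitions = ["clicks_1998_12", "clicks_1999_12", "clicks_2000_04"]
access_log = [
    ("clicks_1998_12", date(1999, 6, 1)),   # last touched almost a year ago
    ("clicks_1999_12", date(2000, 4, 20)),  # old data, but still being queried
    ("clicks_2000_04", date(2000, 5, 1)),
]
candidates = archive_candidates(partitions, access_log, today=date(2000, 5, 8))
```

    In this sketch the December 1999 partition stays online despite its age because analysts are still querying it, while the unused December 1998 partition is flagged for archiving.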

    For many companies, however, developing data-archiving plans remains a relatively low priority. "We'll get to an archiving strategy one day," says DoubleClick's Linsky, citing the end of the year as a goal.

    Adding to the management complexities is the changing nature of how Web databases are being used. In the past, data warehouses were used primarily for offline analysis, and were far removed from operational IT systems. If they went down for an hour--or even a day or two--the world wouldn't end. With the rise of online sales and marketing, data warehouses are tied directly into E-commerce systems because companies want near-instantaneous feedback on operations, and because they can support personalization and cross-selling applications. "The data warehouse is no longer used primarily for deep analysis," Oracle's Howard says. "The data warehouse has become a production system unto itself."

    Even for the few companies that are experienced at managing large Web databases, the task doesn't seem to get easier. Web-site infrastructure "is incredibly important and becoming more so," Ron Sege, executive VP with Lycos Inc., said in a recent public presentation. Lycos, a Web portal, has profiles on 42 million registered users, who generate searches on up to 30 terabytes of information each day. "It's not trivial," he said dryly.

    Travelocity and MatchLogic both recently added NCR's Teradata platform to their IT operations when their Oracle databases began to bog down under the load. Travelocity has been using Oracle databases for both its transactional systems and its data warehouse, with data moved from the former to the latter on a daily basis. The data warehouse was built with "indexes," lookup structures defined in advance around expected queries to boost system performance.

    But that approach wasn't working because Travelocity's marketing analysts bombarded the data warehouse with a wide range of ad hoc queries that rendered the indexing scheme nearly useless. The ability to ask those previously undefined questions is key. "The people who are getting the real value out of data warehouses are asking ad hoc questions," NCR's Farrell says.
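
    Why ad hoc questions defeat an indexing scheme can be seen in a small sketch. It uses Python's built-in SQLite module purely as a stand-in, not Oracle or Teradata, and the `bookings` table, its columns, and the index name are hypothetical. The query planner is asked how it would run an anticipated query and an unanticipated one.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the plan detail strings SQLite reports for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the detail text is the last column


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookings ("
    "id INTEGER PRIMARY KEY, destination TEXT, fare REAL, booked_on TEXT)"
)
# The index anticipates lookups by destination only.
conn.execute("CREATE INDEX idx_destination ON bookings (destination)")

# A query the index was designed for.
anticipated = query_plan(
    conn, "SELECT * FROM bookings WHERE destination = 'DFW'"
)
# An ad hoc question on columns nobody indexed.
ad_hoc = query_plan(
    conn,
    "SELECT * FROM bookings WHERE fare > 500 AND booked_on LIKE '2000-04%'",
)
```

    The plan for the anticipated query should mention `idx_destination`, while the ad hoc query falls back to scanning the whole table. On a large warehouse, every unanticipated question pays that full-scan cost unless another index is added in advance.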

    Teradata's parallel-processing architecture can rapidly process queries without the need for indexes. "That allows you to ask the right questions and beat the competition," Jones says. Another benefit was that Travelocity's data warehouse actually shrank when converted to Teradata because of the reduced use of summary tables. Travelocity runs its operational systems on servers from SGI Inc. and Sun Microsystems, while the Teradata data warehouse will run on NCR's Worldmark server.

    Like Travelocity, MatchLogic's problem was how to maintain the performance of its data warehouses under an increasing number of simultaneous queries from clients and employees. MatchLogic has been using Oracle databases for both its operational and data-warehouse systems. Data is extracted from the Oracle database and prepared for analysis using tools from SAS Institute Inc. That approach works fine when, say, five or fewer people are trying to query the database simultaneously, Garzella says, but the system runs out of steam as the number of simultaneous queries hits 10 or 15. MatchLogic needed to boost that capacity to between 50 and 100 simultaneous users. "We're finding that for some clickstream data warehouses, we're running out of juice," Garzella says.

    Using the Teradata system, MatchLogic will consolidate its data warehouses from 15 to seven or eight. Garzella expects to get a performance boost from the parallel-processing capabilities of the data warehouse software's core engine. But equally important is the system's TeraMiner technology, which prepares data for queries by pre-processing the data while it's still within the data warehouse. "We're looking at improvements of 15 to 30 times in our clickstream query speeds," Garzella says. The Oracle databases run on Sun 4500 and 6500 servers, while the Teradata system will run on NCR's 5200 Series servers.

    One of the early front-runners in using a Web database of customer information for strategic advantage was Peapod Inc., the online grocery-delivery service. The company built a proprietary database engine capable of recognizing repeat customers and offering discounts and promotions based on a customer's profile. Peapod had ambitious plans to exploit its database, but it struggled to turn a profit, and as cash reserves ran dangerously low in March, the company's CEO resigned and its stock plunged. Since then, the Dutch supermarket group Koninklijke Ahold NV has agreed to buy 51% of Peapod for $73 million.

    As that example shows, Web databases aren't enough to protect an E-business from all that might go wrong. What's more, they're expensive and complex. But Web databases keep growing in size, and as they do, they're taking on greater importance at the companies building them. "They're the heart of Engage's value," says CTO Jaye. That's a sentiment sure to be repeated at more and more companies.

    --with additional reporting by John Foley

    Photo of Linsky by Edward Santalone