InformationWeek: The Business Value of Technology


May 8, 2000


Web Data Piles Up

continued...page 2 of 3

Illustration by Robert Nuebecker

    "We certainly haven't found a one-size-fits-all database," CTO Jaye says. Engage is looking at special-purpose database technologies that get around some of the limitations of relational databases. Main-memory databases, for example, process data in a computer system's main memory rather than on disk, while multidimensional databases organize data into interrelated hierarchies. Both are fast at problem solving. "We're at the point now of seeing where we can apply some of these novel solutions to certain high-value business problems."

    Managing the rate of database growth is a challenge, especially at companies such as DoubleClick Inc. that are constantly expanding their services. DoubleClick, which sells and manages online advertisements for thousands of Web sites, collects 250 Gbytes of clickstream data from its 500-plus advertisement servers and 125 media servers every day. That's up from about 30 Gbytes per day just a year ago. "We figure that by the end of this year we'll be pulling in about a terabyte of data every day," says Bob Linsky, VP of MIS and operations.
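
To put those figures in perspective, the following is a rough sketch of the compound growth rate they imply. It uses only the numbers quoted above and assumes that growth is smooth from month to month and that 1 terabyte equals 1,000 Gbytes; the function names are illustrative.

```python
import math

# Figures quoted in the article (Gbytes of clickstream data per day).
DAILY_GB_YEAR_AGO = 30.0
DAILY_GB_NOW = 250.0
# Assumption: 1 terabyte is treated as 1,000 Gbytes.
DAILY_GB_TARGET = 1000.0


def monthly_growth_rate(start: float, end: float, months: float) -> float:
    """Compound monthly growth rate that takes `start` to `end` in `months`."""
    return (end / start) ** (1.0 / months) - 1.0


def months_to_reach(current: float, target: float, monthly_rate: float) -> float:
    """Months until `current` reaches `target` at a constant compound rate."""
    return math.log(target / current) / math.log(1.0 + monthly_rate)


rate = monthly_growth_rate(DAILY_GB_YEAR_AGO, DAILY_GB_NOW, months=12)
months_left = months_to_reach(DAILY_GB_NOW, DAILY_GB_TARGET, rate)

print(f"Growth over the past year: {DAILY_GB_NOW / DAILY_GB_YEAR_AGO:.1f}x")
print(f"Implied monthly growth: {rate:.1%}")
print(f"Months to 1 terabyte per day at that rate: {months_left:.1f}")
```

Under these assumptions the daily volume has been growing by roughly 19 percent a month, and it would reach a terabyte a day in a little under eight months, which is in line with Linsky's year-end estimate.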

    "The real challenge, as the applications grow quickly, is being sure that the database's table structures are maintained and don't grow exponentially," Linsky says. That means being vigilant when it comes to adding new naming conventions and element definitions to the database. For example, as DoubleClick adds services, the tendency among the company's 175 application developers is to create new definitions of "customer" for those applications, rather than use a definition already built into the system. That can result in multiple definitions of "customer" in the database, which causes it to grow exponentially. "Reuse rather than re-create" is Linsky's motto.

    How to add new information to a database without compromising the integrity of the existing data is a problem all database administrators face. But it's particularly thorny with VLDBs, given their complexity. WinWin.com's Motto says the trick is to design the database with as flexible an architecture as possible. "What we tried to do was anticipate that expansion strategy to the extremes," he says. The database has one main table; a database with lots of small tables is easier to design, but it also increases complexity and slows performance. In addition, stored procedures and triggers were designed to handle a greater range of variables, rather than specific queries, making the database better able to handle different queries.

    As databases expand, E-businesses are also wrestling with the question of how much detailed, granular data they need to keep and how much can be stored in a summarized format. Unfortunately for database managers, demands are increasing for keeping every last mouse-click of clickstream and transaction data.

    Travelocity frequently taps a database of information on its 19 million customers to develop special travel offers that are E-mailed to segments of the company's customer base. "Our goal, of course, is to convert lookers into bookers," Jones says. For example, Trans World Airlines recently began flights from Los Angeles to San Juan, Puerto Rico, Jones says, so Travelocity analysts searched the company's data warehouse to identify users who had looked at San Juan-related content on Travelocity's Web site and sent them a promotional message.

    Keeping detailed data can also be useful for resolving problems such as one Travelocity encountered last year. The company began receiving complaints from customers that airlines, cruise lines, and hotels never received their online reservations. After studying the data from individual customers, Travelocity analysts concluded that after filling out online reservation forms, some customers were closing the Web page rather than clicking the final button that sends the forms on their way. Making that discovery would have been impossible using summarized data. The company corrected the problem by making changes in the design of its Web pages.
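
The difference between the two views of the data can be sketched in a few lines. The session IDs and event names below are hypothetical and are not taken from Travelocity's systems; the point is only that per-session detail can identify who abandoned a form, while event totals cannot.

```python
from collections import Counter

# Hypothetical clickstream rows: (session_id, event). The event names are
# illustrative and are not taken from Travelocity's systems.
events = [
    ("s1", "open_form"), ("s1", "fill_form"), ("s1", "submit_form"),
    ("s2", "open_form"), ("s2", "fill_form"), ("s2", "close_page"),
    ("s3", "open_form"), ("s3", "fill_form"), ("s3", "close_page"),
    ("s4", "open_form"), ("s4", "close_page"),
]


def summarize(rows):
    """Summarized view: total count per event, with sessions discarded."""
    return Counter(event for _, event in rows)


def abandoned_after_filling(rows):
    """Detailed view: sessions that filled the form but never submitted it."""
    seen = {}
    for session_id, event in rows:
        seen.setdefault(session_id, set()).add(event)
    return sorted(
        sid for sid, evs in seen.items()
        if "fill_form" in evs and "submit_form" not in evs
    )


summary = summarize(events)
abandoned = abandoned_after_filling(events)
print(summary)
print(abandoned)
```

The summarized counts show how often each event occurred overall, but only the detailed rows can be matched back to the individual customers who complained.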

    For one-to-one marketing applications, detailed granular data is a necessity. "When you're doing personalization, detail is extremely important," Oracle's Howard says. Other data warehouse applications such as fraud analysis also require detailed data.

    In the past, summarized data in data warehouses was sufficient for discerning broad market trends. But that's changing as data-mining tools, which require detailed data to be effective, become more popular. "You really can't do data mining with summarized data," says Vicki Farrell, assistant VP of marketing for NCR's Teradata software.

    MatchLogic Inc., which provides Web advertising and online marketing services for dot-com startups and big-name companies such as AT&T, General Motors, and Procter & Gamble, gives its clients both granular and summarized data, depending on their needs, says Jack Garzella, senior director of E-business systems. Detailed data may be needed, for example, when clients' customers ask how they got onto an E-mail marketing list or want to "opt out." That requires looking back at the data to determine how and when an individual got on the list, perhaps by registering at a Web site or making a purchase. Managing the frequency with which individuals are targeted for marketing campaigns to ensure potential customers aren't bombarded also requires detailed data.
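
A frequency check of the kind described above depends on a per-recipient history of contacts. The sketch below is a minimal illustration under assumed rules: the contact log, the two-message cap, and the 30-day window are all hypothetical and do not describe MatchLogic's actual policy.

```python
from datetime import date, timedelta

# Hypothetical per-recipient contact log: address -> dates a campaign was sent.
contact_log = {
    "a@example.com": [date(2000, 4, 3), date(2000, 4, 17), date(2000, 5, 1)],
    "b@example.com": [date(2000, 1, 10)],
}


def may_contact(address, today, log, max_contacts=2, window_days=30):
    """Return True if `address` has received fewer than `max_contacts`
    messages in the `window_days` days up to and including `today`."""
    cutoff = today - timedelta(days=window_days)
    recent = [d for d in log.get(address, []) if cutoff < d <= today]
    return len(recent) < max_contacts


today = date(2000, 5, 8)
print(may_contact("a@example.com", today, contact_log))
print(may_contact("b@example.com", today, contact_log))
```

A summary such as "messages sent per campaign" could not answer this question, because the cap applies to each individual rather than to the campaign as a whole.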

    DoubleClick is able to avoid that effort because granular data isn't needed for the level of reporting the company provides its customers. DoubleClick's database is a manageable 300 Gbytes to 400 Gbytes, Linsky says. But he expects it to grow to about 1 terabyte by year's end. Linsky's operation also maintains a number of smaller databases--subsets of the primary database--that are used for special services DoubleClick offers to publishers and advertising agencies.

    There's no easy answer to the question of how long companies need to keep data. "There's no set rule," Garzella says. The MatchLogic executive says the length of time that companies keep detailed data may hinge on the specific industry and its sales cycles. An auto dealer might keep detailed data going back three to five years, for example, while a year's worth of data might be enough for a company that provides day-trading services.
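
One way to picture an industry-specific retention rule is as a lookup table of retention windows. The sketch below is illustrative only: the window lengths loosely follow Garzella's examples, and the industry keys, records, and function name are hypothetical.

```python
from datetime import date

# Hypothetical retention windows in days, loosely following the examples
# Garzella gives (several years for an auto dealer, one year for day trading).
RETENTION_DAYS = {
    "auto_dealer": 5 * 365,
    "day_trading": 365,
}


def split_by_retention(records, industry, today):
    """Split (record_date, payload) pairs into those still inside the
    retention window and those old enough to summarize or archive."""
    limit = RETENTION_DAYS[industry]
    keep, expire = [], []
    for record_date, payload in records:
        age_days = (today - record_date).days
        (keep if age_days <= limit else expire).append((record_date, payload))
    return keep, expire


records = [
    (date(1994, 6, 1), "visit A"),
    (date(1999, 3, 15), "visit B"),
    (date(2000, 4, 30), "visit C"),
]
today = date(2000, 5, 8)
keep_auto, expire_auto = split_by_retention(records, "auto_dealer", today)
keep_trading, expire_trading = split_by_retention(records, "day_trading", today)
print(len(keep_auto), len(expire_auto))
print(len(keep_trading), len(expire_trading))
```

The same three records are treated differently depending on the window: the auto dealer keeps two of them in detail, while the day-trading service keeps only the most recent one.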

    MatchLogic keeps every last bit of data it collects because it never knows for sure what kind of information its clients might need. MatchLogic collects clickstream data from the 1 billion-plus transactions it processes every day. Its IT systems are bulging with about 25 terabytes of data spread across 15 data warehouses. The largest warehouse is 1.7 terabytes.

    Garzella says he and his staff have begun studying the question of expiration dates for data. "Is clickstream data worth anything if it's three years old?" he wonders. "I'd like to know, because it's not cheap to store all this data."


    Photo of Jaye by Shelly R. Harrison
