The Trouble With Terabytes - InformationWeek


News
1/17/2006
03:16 PM
The Trouble With Terabytes

The trouble with terabytes is too much money spent on hardware, software and administration.

Joshua Greenbaum

If you ever want to see what's wrong with the growing world of corporate data, check out Winter Corp.'s "2005 Top Ten Program." Winter has been publishing a survey of the biggest and baddest databases for several years, and the results are as much a triumph of database technology as they are an indictment of the more-data-is-better mentality that seems to pervade IT departments and executive suites alike.

Just because we can build huge databases, and put every bit of data on line, doesn't mean we should. One simple reason we shouldn't is that big is not necessarily better — logic that should resonate with business and IT managers.

Big often means too much money spent on hardware, software and administration, and too much time sorting through what is often a landfill's worth of data. Big can mean lousy throughput and even worse analysis. And big also can indicate a lack of understanding of the real goal of hanging on to historical data, and how sampling and other Statistics 101 techniques make it possible to analyze only the data you need, instead of all the data you have.
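The sampling point deserves a concrete illustration. As a minimal, hypothetical sketch (the order values below are synthetic stand-ins for a multi-terabyte table), a 1% random sample can estimate an aggregate to within a quantifiable margin of error, without touching the other 99% of the data:

```python
import random
import statistics

# Hypothetical example: estimate average order value from a 1% sample
# rather than scanning every row of a huge transaction table.
random.seed(42)
orders = [random.lognormvariate(4, 0.5) for _ in range(1_000_000)]  # stand-in for the full table

full_mean = statistics.fmean(orders)      # the "scan everything" answer
sample = random.sample(orders, 10_000)    # 1% random sample
sample_mean = statistics.fmean(sample)

# The standard error bounds how far the sample estimate is likely to be off.
stderr = statistics.stdev(sample) / len(sample) ** 0.5

print(f"full mean:   {full_mean:.2f}")
print(f"sample mean: {sample_mean:.2f} +/- {2 * stderr:.2f} (approx. 95% CI)")
```

The same Statistics 101 logic applies to sums, proportions and most other aggregates: the error shrinks with the square root of the sample size, not the size of the table.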

So every time I see a multi-terabyte database, I begin to wonder if the company that has built such a monstrosity really understands the content or value of its data. Or is it building a "just-in-case" database, a kind of cover-your-analysis solution that would have to be an order of magnitude smaller to be as useful as possible? Unfortunately, "just in case" and "CYA" seem to be the order of the day.

The Big Ones

Against this backdrop of too big, too slow and too clueless, the Winter data is amazing. Once you're done being impressed by the sheer bulk of the Top Ten, you really should think about whether your company would want to find itself in the winner's circle. In other words: Do you really want to aspire to running a 20-plus terabyte transaction database and a 100-TB data warehouse?

To be fair, there's some justification for the size of some of these monsters: Yahoo's 100-TB data warehouse may have a meaningful raison d'être. And maybe Amazon's two mega-data warehouses, coming in at 24 TB and 19 TB respectively, make some sense too. There may even be a smart business case for the U.S. Patent and Trademark Office's 16-TB transaction database. But for the rest of us, there has to be a better way.

One of the better ways is to archive the data, though most of these solutions slow access to a crawl. Running an historical report means locating a tape, mounting it, indexing it, loading an operational data store and running the report against what you hope is the right data set. That usually takes many minutes. Of course, most archiving solutions don't really change the amount of data you're trying to store, just whether it's on line or not. Archiving can improve the throughput of your on-line data significantly, but at the cost of gumming up the analysis of your off-line data.

One archiving vendor, SAND Technology, can create what it calls a "near-line" archive that can be queried without a complex restore process. SAND's compression technology also reduces the overall data footprint by an order of magnitude. This means that SAND can solve the cost, throughput and data storage problems, thus giving archiving a much-needed image upgrade.
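SAND's technology is proprietary, but the underlying reason archives compress so dramatically is not mysterious: historical transaction data is highly repetitive. As a rough, hypothetical illustration (synthetic rows, plain zlib, not SAND's algorithm), even a general-purpose compressor can shrink this kind of data many times over:

```python
import zlib

# Hypothetical archive fragment: fixed-width, highly repetitive
# transaction rows, the kind of data that dominates historical archives.
rows = [
    f"2005-12-{d % 28 + 1:02d}|STORE-{d % 500:04d}|SKU-{d % 2000:04d}"
    f"|QTY:{d % 9 + 1:02d}|USD {d % 100:02d}.{d % 100:02d}\n".encode()
    for d in range(100_000)
]
archive = b"".join(rows)  # a few megabytes of raw row data

compressed = zlib.compress(archive, level=9)
ratio = len(archive) / len(compressed)

print(f"raw: {len(archive):,} bytes, compressed: {len(compressed):,} bytes")
print(f"compression ratio: {ratio:.1f}x")
```

Purpose-built archive formats exploit the same redundancy far more aggressively, which is how order-of-magnitude footprint reductions become plausible.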

It's about time because these mega-terabyte databases desperately need to be put on a diet. They're too big, too costly and too inefficient — despite the prevalence of cheaper hardware and faster software. And, fundamentally, these mega-terabyte databases are evidence of a lack of strategic thinking about the most strategic asset in the company: Data.

That's the biggest problem of all. Technology tends to reward sloppy thinking and sloppy actions with trouble, and trouble with the corporate database is trouble at the heart of a company. You may not qualify for the Winter Top Ten, but if you're getting close, you may want to rethink your database strategy — before it's too late.

Joshua Greenbaum is a principal at Enterprise Applications Consulting. Write to him at [email protected].
