Data warehouses aren't just exploding in size; they're also supporting more users and increasingly complex queries, all in shorter time frames. Here's how to make sure yours is ready to scale.
LGR Telecommunications has a 310-TB Oracle data warehouse that's used daily by 2,500 people at one of its telecom carrier clients. The warehouse powers an LGR service, called CDRlive, that gives its carrier customers access to call data records. It's updated round the clock, in near-real time, and is available for query 24 hours a day, 365 days a year.
"There are no batch jobs," says Hannes van Rooyen, chief architect at LGR, which supplies data warehouse software and services to the telecom industry. "Instead, as many as 13 billion records a day are added, and an equal number are dropped in an online update process that runs concurrently with user queries."
The data warehouse keeps more than a petabyte of disks spinning and has grown by a factor of 10 during the last four years. It's expected to at least double in the coming year.
Most companies still don't hold hundreds of terabytes of data, but they're up against the same data warehouse problems that face LGR: soaring data volume, more users, more complicated queries, and fast-changing information. Throw in a growing number of vendor options, and it's time for companies to re-evaluate their data warehouse strategies.
The new generation of data warehouses looks a lot like LGR's: growing at an extraordinary pace, in multiple dimensions, and supporting critical business processes that must react quickly to events around the company. Whether your company has 250 GB or 250 TB of data, you're likely facing the same questions: Do we have the right architecture? Is it on the right platform? Is the warehouse about to run out of headroom? What will it take to service new users? How do we move from batch loading to continuous update? And with technology changing so rapidly, how do we know we're on the right system?
All the answers loop back to managing scalability. Getting control of scalability might mean embracing the highly parallel processing and scale-out architectures long offered by Teradata and IBM, elements of which are now emerging in new products from Oracle and Microsoft (see story, "Microsoft And Oracle Are Scaling Out"). Or it might just require more effective management of existing data warehouse practices, including quantifying requirements, measuring alternative solutions, and acting earlier on potential problems.