8 Big Data Deployments In Detail
What does it take to move into the era of large-scale data analytics? Data warehousing experts from Barnes & Noble, BNP Paribas, Cabela's, McAfee and more share insights on innovative big-data deployments.
Success in the Big Data era is about more than size. It's about getting insight from these huge data sets more quickly. As explored in our recent cover story, experienced practitioners are taking advantage of in-database analytics processing, breakthrough techniques such as MapReduce, and innovative new environments such as Hadoop to handle big data volumes and new data types with speed and ease.
What does it take to move into the era of large-scale data analytics? It's almost a given that deployments headed north of 10 terabytes will feature massively parallel processing, column-store architectures, or both. But the story doesn't end there. Innovations including in-database analytics, MapReduce, Hadoop, and in-memory analysis are redefining what's possible.

Adknowledge is using both Hadoop and Greenplum to analyze e-mail and digital advertising campaigns. Barnes & Noble has consolidated multiple warehouses onto the Aster Data platform, and it's using MapReduce techniques to better understand cross-channel buying patterns. BNP Paribas has deployed Oracle Exadata and its flash-memory edge to stay on top of trading-floor application performance and compliance. Catalina Marketing has a massive Netezza deployment including what it bills as the largest loyalty database in the world. Cabela's has mastered in-database analytics on the Teradata platform so it can make the best use of expensive statistical expertise. Hutchison 3G is delving deeper into mobile-phone-contract historical analysis while also optimizing network performance on IBM's Smart Analytic System. McAfee is pioneering sparse-data analysis on Hadoop, using Datameer tools to spot correlations among spam, malware, firewall-hack, and botnet computer security threats. Provisio is using ParAccel to query millions of medical records and quickly spot potential drug-trial participants in close proximity to pharmaceutical research facilities. Read on for details on the science of what's possible in the new big-data era.
Challenge: Adknowledge needed to target performance-based ad, e-mail, and keyword-search campaigns for more than 50,000 advertisers.
Old System: Netezza
New System: Greenplum DBMS on Dell C2100 servers
Capacity: 100 terabytes
Deployed: February 2010
Notes: Also runs Hadoop both on-premises and on Amazon EC2
Adknowledge turned to Hadoop when its first-gen Netezza system reached its scalability limit. The firm is now moving most analyses into Greenplum. "We'll still use Hadoop for certain things, but it presents an extra level of complexity that may require engineers to write code to process the data," says Hoggatt.
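The complexity gap Hoggatt describes comes down to who writes the logic. Below is a minimal, hypothetical sketch, simulated in plain Python rather than on a real cluster, of the map-and-reduce code an engineer would hand-write for a simple per-campaign click count; the equivalent one-line SQL a parallel database such as Greenplum would run is noted at the end. The record layout, sample data, and names are invented for illustration.

```python
# A minimal, self-contained sketch of the MapReduce pattern engineers hand-code
# on Hadoop, simulated locally. Record layout, field names, and the sample
# data are hypothetical.
from collections import defaultdict

clicks = [                      # pretend these are lines from a log file in HDFS
    "cmp-001\tuser-9\t2010-02-01",
    "cmp-002\tuser-3\t2010-02-01",
    "cmp-001\tuser-7\t2010-02-02",
]

def mapper(line):
    """Map phase: emit (campaign_id, 1) for every click record."""
    campaign_id = line.split("\t")[0]
    yield campaign_id, 1

def reducer(key, values):
    """Reduce phase: sum the counts for one campaign."""
    return key, sum(values)

# Shuffle phase: group mapper output by key, as the framework would.
grouped = defaultdict(list)
for line in clicks:
    for key, value in mapper(line):
        grouped[key].append(value)

for key in sorted(grouped):
    print(reducer(key, grouped[key]))   # ('cmp-001', 2), ('cmp-002', 1)

# In a parallel database such as Greenplum, the same question is one SQL statement:
#   SELECT campaign_id, COUNT(*) FROM clicks GROUP BY campaign_id;
```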
Challenge: Barnes & Noble needed to develop a cross-channel understanding of changing sales across its stores, the BN.com Web site, and Nook and smart-phone e-readers.
Old System: Oracle (nine separate warehouses)
New System: Aster Data nCluster
Capacity: "Dozens of terabytes"
Deployed: June 2010
Notes: Uses MapReduce analysis techniques supported in Aster Data nCluster
"Now that all our data is in one place, we can understand customer interactions across our entire [retail/ online/e-reader] ecosystem," says Parrish. MapReduce techniques supported in Aster Data nCluster help researchers "see trends more quickly than possible in systems only using massively parallel processing."
Challenge: French financial services giant BNP Paribas needed to optimize application performance and ensure compliance in a fast-moving trading-floor environment.
Old System: Four-node Oracle 10g RAC
New System: Oracle Exadata
Capacity: Not specified (half-rack configuration), but it includes 5 terabytes of flash cache
Deployed: July 2010
Notes: Exadata compression cut 23-terabyte database down to less than 10 terabytes
BNP Paribas had to tune its Oracle 10g RAC system "six ways to Sunday" to maintain performance amid ever-changing workloads, Duffy says. With Exadata compression and performance gains, "more than half the time we spent tuning vanishes, and the time saved can be dedicated to development work," he says.
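The compression Duffy credits is Oracle's Hybrid Columnar Compression, which is applied per table with a DDL clause. As a hedged illustration, the sketch below issues that DDL and reports the resulting segment size through the cx_Oracle driver; the connection details and the trade_history table are hypothetical.

```python
# Hedged sketch of applying and measuring an Exadata-style compression pass.
# The connection string, schema, and table name are hypothetical; COMPRESS FOR
# QUERY HIGH is Oracle's Hybrid Columnar Compression clause.
import cx_Oracle

conn = cx_Oracle.connect("dw_admin", "secret", "exadata-scan/TRADEDW")  # hypothetical DSN
cur = conn.cursor()

# Rebuild a large fact table with Hybrid Columnar Compression enabled.
cur.execute("ALTER TABLE trade_history MOVE COMPRESS FOR QUERY HIGH")

# Report the compressed segment size from the data dictionary.
cur.execute("""
    SELECT segment_name, ROUND(SUM(bytes) / 1024 / 1024 / 1024, 1) AS gb
    FROM user_segments
    WHERE segment_name = 'TRADE_HISTORY'
    GROUP BY segment_name
""")
for name, gb in cur:
    print(f"{name}: {gb} GB after compression")

cur.close()
conn.close()
```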
Challenge: SAS analyses sometimes required days of data prep and processing; Cabela's found ways to move analyses inside the Teradata database for faster processing.
Old System: DB2
New System: Teradata
Capacity: 20 terabytes
Deployed: Four nodes in 2005, upgraded to seven nodes in 2010
Notes: Cabela's converted SAS to SQL before Teradata supported SAS procedures within its database
With its move to in-database processing, Cabela's had to "retrain statisticians to think outside of standard SAS and be more SQL-based, but they were flexible and made the transition," says Wynkoop. Direct-mail analyses that used to require 7.5 statistician full-time-equivalents now require just 1.5 FTEs.
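In practice, "converting SAS to SQL" means pushing the model's scoring arithmetic into the database so the data never leaves it. The sketch below builds such a scoring statement from a hypothetical set of logistic-regression coefficients; the table, columns, and coefficients are invented for illustration.

```python
# Illustrative sketch: pushing a response-model score into the warehouse as SQL
# instead of extracting data to SAS. Coefficients, table, and column names are
# invented for the example.

coefficients = {            # hypothetical logistic-regression coefficients
    "intercept": -2.1,
    "orders_last_12m": 0.35,
    "days_since_last_order": -0.01,
    "catalog_requests": 0.22,
}

# Build the linear predictor as a SQL expression, then apply the logistic link
# in-database so only customer_id and the score ever leave the warehouse.
linear = " + ".join(
    f"{coef} * {col}" for col, coef in coefficients.items() if col != "intercept"
)
scoring_sql = f"""
CREATE TABLE mail_scores AS
SELECT customer_id,
       1.0 / (1.0 + EXP(-({coefficients['intercept']} + {linear}))) AS response_score
FROM customer_features;
"""
print(scoring_sql)   # hand this statement to the MPP database to run in parallel
```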
Challenge: SAS procedures sometimes took days to run at Catalina Marketing, which says its petabyte-scale customer loyalty database is the largest in the world.
Old System: Home-grown parallel processing platform
New System: Netezza
Capacity: 2.5 petabytes in multiple databases
Deployed: 2003 with multiple upgrades since first deployment
Notes: Adopted SAS Scoring Accelerator for Netezza in early 2010
In-database processing has eliminated time-consuming steps and increased Catalina's same-staff modeling capacity tenfold from 60 models per year up to 600, says Williams. He encourages would-be customers to "look at all new technologies and vendors... to make sure you have the best product for the price point."
Challenge: Network performance optimization and customer behavior analysis were pressing priorities at British mobile network operator Hutchison 3G.
Old System: Multiple Oracle databases
New System: IBM Smart Analytic System
Capacity: 60 terabytes
Deployed: November 2009
Notes: Hutchison is planning in-database analysis using IBM SPSS analytics
By consolidating multiple silos and expanding historical data, Hutchison is gaining "better detail on subscriber segments and a clearer understanding of the life cycle of [18-month and two-year] mobile phone contracts," says Silvester. The firm is also optimizing network performance to reduce dropped calls.
Challenge: Security watchdog McAfee needed to consolidate siloed analyses of spam, malware and other attacks to understand correlations among threats.
Old System: Conventional relational databases
New System: Hadoop
Capacity: 20 terabytes
Deployed: July 2008
Notes: McAfee is using Datameer search and analytic tools for Hadoop to study sparse data (non-relational data with inconsistent columns) used in text analyses.
"New technology may be uncomfortable to people initially," says Krasser, referring to unfamiliar Hadoop development approaches, "but there are new options out there that you need to look at if you want to handle new types of analyses." McAfee uses Hadoop to analyze sparse data associated with botnet threats.
Challenge: Provisio needed to quickly scan its iTrials database, which holds health-claim and medical-record information on more than 41 million U.S. citizens.
Old System: Microsoft SQL Server cluster
New System: ParAccel database on HP servers
Capacity: 7 terabytes before compression
Deployed: Late 2009
Notes: Column-store compression is said to have dramatically reduced database size and storage needs
A drug trial "site proximity" analysis that used to take a week or more now takes 10 minutes and can be handled customer self-service style online. "We're not only doing what we used to do much faster, we're dreaming up new services," says Harrison. Spotting disease hot spots by zip code is one possibility, he says.
A drug trial "site proximity" analysis that used to take a week or more now takes 10 minutes and can be handled customer self-service style online. "We're not only doing what we used to do much faster, we're dreaming up new services," says Harrison. Spotting disease hot spots by zip code is one possibility, he says.
Success in the Big Data era is about more than size. It's about getting insight from these huge data sets more quickly. As explored in our recent cover story, experienced practitioners are taking advantage of in-database analytics processing, breakthrough techniques such as MapReduce and innovative, new environments such as Hadoop to handle big data volumes and new data types with speed and ease.
What does it take to move into the era of large-scale data analytics? It's almost a given that deployments headed north of 10 terabytes will feature massively parallel processing, column-store architectures, or both. But the story doesn't end there. Innovations including in-database analytics, MapReduce, Hadoop, and in-memory analysis are redefining what's possible. Adknowledge is using both Hadoop and Greenplum to analyze e-mail and digital advertising campaigns. Barnes & Noble has consolidated multiple warehouses into the Aster Data platform, and it's using MapReduce techniques to better understand cross-channel buying patterns. BNP Paribas has deployed Oracle Exadata and its flash-memory edge to stay on top of trading floor application performance and compliance. Catalina Marketing has a massive Netezza deployment including what it bills as the largest loyalty database in the World. Cabela's has mastered in-database analytics on the Teradata platform so it can make the best use of expensive statistical expertise. Hutchison 3G is delving deeper into mobile-phone-contract historical analysis while also optimizing network performance on IBM's Smart Analytic System. McAfee is pioneering sparse-data analysis on Hadoop, using Datameer tools to spot correlations among spam, malware, firewall hack and botnet computer security threats. Provisio is using ParAccel to query millions of medical records and quickly spot potential drug-trial participants in close proximity to pharmaceutical research facilities. Read on for details on the science of what's possible in the new big-data era.
About the Author(s)
You May Also Like