I talked to Jim Duffy, BNP Paribas' data warehouse architect, and he went into great detail about the deployment. Here are a few highlights:
- BNP Paribas is using a half-rack deployment of Exadata V2 to replace a four-node RAC cluster running Oracle 10g. The deployment went live two weeks ago, and like the system it replaces, it serves as the underpinning of some 35 trading-floor applications. That includes near-real-time performance stat monitoring, risk- and market-abuse compliance reporting, network and internal application optimization, and long-term (five-year) archival compliance querying and reporting. A second Exadata V2 half rack is dedicated to disaster recovery.
- Exadata V2's Smart Flash Cache -- 5 terabytes' worth in BNP Paribas' case -- supports the high end of the company's high/medium/low-speed data-access scheme. The last week's worth of trading data is stored in flash cache, as is anything directly accessed by an internal-facing Web site used for monitoring and querying (particularly derived data). Cache is also used for staging tables that are accessed directly to quickly deliver internal application performance statistics. Duffy said the monitoring/querying Web site is serving up answers five times faster than it did with the RAC/10g deployment.
- The total warehouse stores less than 10 terabytes, down from 23 terabytes in the previous environment. Duffy credits Exadata V2's hybrid columnar compression with cutting the database down to size (and reducing storage and admin demands in the process). "I can get tables that were already compressed down to 1 terabyte in Oracle 10g down to 185 gigabytes using the hybrid columnar compression in Exadata," Duffy said. "That brings massive benefits in terms of manageability."
- Elaborating on admin and management, Duffy said he had to tune the old RAC implementation "six ways to Sunday" to maintain performance while coping with ever-changing database loads. With Exadata V2, the peaks are easy to handle. Maintenance tasks that used to take hours can be done in minutes on Exadata V2, and data modeling steps that used to take more than two hours now take less than 20 minutes, Duffy said. "More than half the time we spent tuning vanishes, and all that time can be immediately dedicated to development," he said.
- The deployment does not address the investment research side of the business -- the part that analyzes market opportunities, trades and risks (other than those related to compliance).
- Existing queries are running about 16 to 17 times faster on Exadata V2 than in the old RAC deployment, on average. Duffy said he hasn't had time to change any code to further exploit Exadata V2's ability to push SQL querying down into the storage tier, which could improve performance on queries accessing lots of data.
BNP Paribas isn't the largest deployment I'll cover in my upcoming Big Data story (that title goes to Catalina Marketing with 2.5 petabytes in Netezza). But as Duffy points out, Exadata V2's compression capabilities have cut a 23-terabyte database down to less than half that size.
Ironically, size isn't necessarily the best gauge of database "scalability." Column-store database vendors such as Sybase, Vertica and ParAccel will tell you they have to educate some customers about the deceptively small size of their deployments. If your DBMS can compress a 100-terabyte store down to 10 terabytes, as these vendors say they can do, you won't win a my-database-is-bigger-than-yours contest. But you will save on storage costs and more.
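To put the compression figures quoted above side by side, here is a minimal Python sketch of the arithmetic. The function name `compression_ratio` and the decimal conversion of 1,000 gigabytes per terabyte are assumptions made for illustration; the input figures are the ones cited in this article, and the warehouse number is only a lower bound because the source says "less than 10 terabytes."

```python
def compression_ratio(before_gb: float, after_gb: float) -> float:
    """Return how many times smaller the data is after compression."""
    if after_gb <= 0:
        raise ValueError("after_gb must be positive")
    return before_gb / after_gb


GB_PER_TB = 1000  # assumption: decimal terabytes

# Duffy's example: a table already compressed to 1 TB in Oracle 10g
# shrinks to 185 GB with hybrid columnar compression.
table_ratio = compression_ratio(1 * GB_PER_TB, 185)

# Whole warehouse: 23 TB down to "less than 10 TB", so using 10 TB
# gives a floor on the ratio, not the exact value.
warehouse_ratio_floor = compression_ratio(23 * GB_PER_TB, 10 * GB_PER_TB)

# Column-store vendors' claim: 100 TB down to 10 TB.
vendor_claim_ratio = compression_ratio(100 * GB_PER_TB, 10 * GB_PER_TB)

print(f"Single table:      {table_ratio:.1f}x")
print(f"Warehouse (floor): {warehouse_ratio_floor:.1f}x")
print(f"Vendor claim:      {vendor_claim_ratio:.1f}x")
```

Note that the single-table ratio is measured against data that was already compressed in Oracle 10g, so it is not directly comparable to a ratio measured against raw data.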
As far as performance gains go, I've heard some very impressive reports in recent weeks. Catalina Marketing says it has cut model-scoring times from 4.5 hours down to 60 seconds by taking SAS code into the Netezza database. Cabela's says it's now handling SAS queries inside Teradata in about an hour that used to take four days to prepare and run with a conventional data warehouse and separate SAS data sets.
You can't make apples-to-apples comparisons between these results and the 16X-to-17X average improvement that BNP Paribas reports. And, as Duffy says, he has hardly had time to learn how to get the most out of Exadata.
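For a rough sense of scale only, the before-and-after times reported above work out as follows. This is an illustrative sketch, not a benchmark: the helper name `speedup` is hypothetical, the workloads differ (model scoring, SAS queries including preparation time, and a mixed query average), and "about an hour" is treated as exactly one hour.

```python
from datetime import timedelta


def speedup(before: timedelta, after: timedelta) -> float:
    """Return how many times faster the 'after' run is than the 'before' run."""
    return before / after


# Catalina Marketing: model scoring, 4.5 hours down to 60 seconds.
catalina = speedup(timedelta(hours=4.5), timedelta(seconds=60))

# Cabela's: four days to prepare and run, down to about an hour
# (assumption: exactly one hour).
cabelas = speedup(timedelta(days=4), timedelta(hours=1))

print(f"Catalina Marketing: {catalina:.0f}x")
print(f"Cabela's:           {cabelas:.0f}x")
print("BNP Paribas:        16x to 17x (reported average, unmodified queries)")
```

The BNP Paribas figure is an average across existing queries with no code changes, whereas the other two describe single reworked workloads, which is one reason the numbers should not be ranked against each other.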
Oracle customers take heart. Real-world deployment references for Exadata V2 are emerging. Your best bet is to read about reference customers, find those that are most similar to your operation and see if you can talk to the people, like Jim Duffy, who manage those deployments.
Don't stop there. When you're a serious buyer looking to drop six, seven or even eight figures, you have every right to ask for a pilot test with your sample data and queries. Not every vendor will oblige, but then, you don't have to consider every vendor.
Ten years ago it was pretty much a three-horse race when it came to large-scale data warehousing -- Oracle, IBM, Teradata. Microsoft is about to enter that ring, and today there are at least six other credible vendors as well as emerging open-source options. Viva la choice.