Stock Exchange Taps Microsoft's Data Warehouse Appliance
Exclusive: SQL Server 2008 R2 Parallel Data Warehouse wins its first customer as Direct Edge upgrades to scale into the hundreds of terabytes.
Direct Edge, an upstart stock exchange that routinely trades as many as a billion shares a day, will announce next week that it has chosen Microsoft's SQL Server 2008 R2 Parallel Data Warehouse (PDW) as its platform for reporting and analytics. It's the first customer win for PDW, which is Microsoft's answer for the fast-growing big-data analysis market.
Direct Edge is not well known, but the 13-year-old, Jersey City, N.J.-based firm, which converted from an electronic communications network (ECN) into a regulated stock exchange in 2010, ranks fourth among U.S.-based exchanges after NYSE Euronext, NASDAQ OMX, and BATS. It claims to handle about 10% of trading in U.S. equities (compared to about 30% for NYSE and 20% for NASDAQ).
The move from ECN to public stock exchange has brought rapid growth, with trading now generating about 2 terabytes of new data per month. Direct Edge has a conventional Microsoft SQL Server 2008 data warehouse built on clustered, high-end servers, "but we realized that we needed a platform that would scale to hundreds of terabytes," said Direct Edge chief technology officer Richard Hochron in an exclusive interview with InformationWeek.
Instead of scaling up on ever larger and more expensive proprietary servers, the switch to PDW will enable Direct Edge to scale out on commodity x86 Intel servers using massively parallel processing--an approach now common to most data warehousing appliances.
Hochron described PDW as "an obvious choice" because Direct Edge is a Microsoft shop from stem to stern. The company's trading platform runs on Windows Server 2008 and all current business intelligence and data warehousing assets at the company revolve around Microsoft SQL Server. For instance, there are some 200 finance, strategy, compliance, legal, and regulatory reports built on Microsoft SQL Server Reporting Services, and power users exploit SQL Server Analysis Services for cube-based data exploration.
The 50-employee Direct Edge IT team had a comfort level with Microsoft products and development and, as a Hewlett-Packard customer, the company also liked the fact that HP is the primary hardware partner for the appliance. These were points in PDW's favor, but it was not a shoo-in, as Direct Edge insisted on a proof-of-concept (POC) project using the company's data and "proving a few specific points that were very important to us," Hochron said.
Direct Edge wanted to see proof of linear scalability, whereby the company could add storage and processing capacity without losing performance. Fast loading was also important, and POC tests came in at 950 gigabytes per hour, just shy of Microsoft's touted top speed of 1 terabyte per hour. Data compression results were about 3.2 times on Direct Edge data, versus Microsoft's max claim of 3.5 times.
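To put those POC figures in perspective, here is a rough back-of-the-envelope sketch applying the quoted load rate and compression ratio to the exchange's roughly 2 terabytes of new data per month. It assumes decimal units (1 TB = 1,000 GB), a load rate sustained at the POC figure, and that the 2-terabyte figure refers to uncompressed data; none of those assumptions is stated by Direct Edge or Microsoft, and the function names are purely illustrative.

```python
# Figures quoted in the article.
LOAD_RATE_GB_PER_HOUR = 950   # POC load rate
COMPRESSION_RATIO = 3.2       # POC compression on Direct Edge data
MONTHLY_NEW_DATA_TB = 2       # new data generated per month

# Assumption (not from the article): decimal units.
GB_PER_TB = 1000


def load_hours(volume_tb, rate_gb_per_hour=LOAD_RATE_GB_PER_HOUR):
    """Hours needed to load volume_tb at a sustained rate (assumed constant)."""
    return volume_tb * GB_PER_TB / rate_gb_per_hour


def compressed_tb(volume_tb, ratio=COMPRESSION_RATIO):
    """Footprint after compression, assuming volume_tb is uncompressed."""
    return volume_tb / ratio


print(f"{load_hours(MONTHLY_NEW_DATA_TB):.1f} hours to load a month of data")
print(f"{compressed_tb(MONTHLY_NEW_DATA_TB):.3f} TB stored after compression")
```

Under these assumptions, a month of new trading data would load in a little over two hours and occupy well under a terabyte once compressed.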
(Most row-store databases, including those from Teradata, IBM Netezza, and EMC Greenplum, now achieve 2-times to 4-times compression. Oracle Exadata has a hybrid compression scheme shown to deliver 10-times compression. Full column-store products, like Sybase IQ and HP Vertica, achieve even higher compression rates.)
Direct Edge tested a battery of its most complex queries with 12 concurrent users and found that they ran about 150 times faster than on the exchange's current conventional SQL Server 2008 deployment. All these test results met or exceeded projected needs, according to Hochron, who said the exchange expects to exceed 200 terabytes and more than 40 concurrent users within the next few years. The current data warehouse has somewhere between 30 and 40 terabytes of data, he said.
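As a rough check on that projection, the sketch below extrapolates the current warehouse size at the 2 terabytes per month cited earlier. The constant growth rate is an assumption made only for illustration; the article does not say how Direct Edge modeled its growth, and the function name is hypothetical.

```python
import math

# Figures quoted in the article.
MONTHLY_GROWTH_TB = 2                      # new data per month today
CURRENT_LOW_TB, CURRENT_HIGH_TB = 30, 40   # current warehouse size range
TARGET_TB = 200                            # size the exchange expects to exceed


def months_to_target(current_tb, target_tb=TARGET_TB, growth_tb=MONTHLY_GROWTH_TB):
    """Months to reach target_tb assuming constant linear growth (an assumption)."""
    return math.ceil((target_tb - current_tb) / growth_tb)


for start_tb in (CURRENT_LOW_TB, CURRENT_HIGH_TB):
    months = months_to_target(start_tb)
    print(f"From {start_tb} TB: {months} months (about {months / 12:.1f} years)")
```

At today's rate the warehouse would take roughly seven years to reach 200 terabytes, which suggests the exchange's forecast presumes data growth well above the current pace.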
The combination of Analysis Services and Microsoft's in-memory PowerPivot feature will enable power users to explore PDW directly without resorting to aggregation and summaries. Finance analysts, for example, will be able to compare customer trading activities from month to month to better understand fluctuations in trading fees, Hochron said.
Given their huge data volumes and extensive analysis, stock exchanges are big users of data warehousing appliances. NYSE Euronext has both IBM Netezza and EMC Greenplum deployments. The Netezza deployment dates from 2007 and topped 100 terabytes when it was deployed. NASDAQ selected Greenplum in 2009, before the database vendor was acquired by EMC.
Working with its integrator, BI Voyage, Direct Edge did explore other data warehousing platforms, but Hochron declined to name the other products considered and said no other POCs were conducted. Direct Edge purchased PDW in May and Hochron said he expects to switch over to the new warehouse by the end of the year.
That's a long deployment by today's appliance standards, but the executive said Direct Edge has a bit of data consolidation and integration work to do, adding new customer service, finance, and market data sources that aren't included in the current data warehouse.
Direct Edge declined to detail the exact cost or configuration of its appliance, but Hochron said it was "somewhat larger than a standard rack." When PDW was launched, software-only costs per rack were estimated at about $840,000 (based on a list price of $38,255 per processor).
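The two list-price figures imply a processor count per rack, which the short calculation below works out. The count is an inference from the quoted numbers, not something Microsoft or Direct Edge stated.

```python
# Figures quoted in the article.
SOFTWARE_COST_PER_RACK = 840_000    # estimated software-only cost per rack, USD
LIST_PRICE_PER_PROCESSOR = 38_255   # list price per processor, USD

# Inferred, not stated in the article: how many processors the estimate implies.
implied_processors = SOFTWARE_COST_PER_RACK / LIST_PRICE_PER_PROCESSOR
nearest_whole = round(implied_processors)
list_cost_at_nearest = nearest_whole * LIST_PRICE_PER_PROCESSOR

print(f"Implied processors per rack: {implied_processors:.2f}")
print(f"List cost at {nearest_whole} processors: ${list_cost_at_nearest:,}")
```

The estimate is consistent with about 22 licensed processors per rack, which at list price comes to $841,610.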
Few appliance customers pay list prices, and given Direct Edge's big investments in Microsoft software, it's a sure thing it won steep discounts.