New York Stock Exchange Ticks on Data Warehouse Appliances
Netezza deployment replaces mega data warehouses while cutting query times from hours to seconds.
It's not a typical enterprise data warehouse story, but then NYSE Euronext (NYSE), the parent company of the New York Stock Exchange, is not a typical enterprise. For one thing, NYSE has not one but three warehouses, each approaching 100 terabytes. Then consider NYSE's queries, some of which interrogate more than 40 terabytes of data. The extreme data volumes and extreme query complexity led to an upgrade to data warehouse appliances.
After a period of rampant growth and mergers with two smaller exchanges, NYSE knew its large and aging Oracle data warehouses needed replacement. After exploring alternatives in 2006, the company concluded a successful 45-day proof-of-concept project on a Netezza Performance Server (NPS) appliance in early 2007. The main warehouse for the New York Stock Exchange was migrated within two and a half months and went into production in May 2007. A second device, consolidating what had been two separate warehouses for the Chicago-based Arca Equities and Options markets, went into production in July. Yet another warehouse, one housing legacy data, will be migrated onto a third Netezza NPS.
The NYSE and Arca warehouses primarily support market surveillance, monitoring trade patterns and behaviors to ensure compliance with the rules of the exchanges, and these queries can be quite complex. "It's very possible that we could hit 40 to 50 terabytes of data in a single query," explains Steve Hirsch, chief data officer.
One of NYSE's more complex queries took 26 hours on the old platform, but it now takes just two and a half minutes on NPS, says Hirsch. The warehouses also support simpler queries for pricing analysis, load and capacity planning, and analysis of behavior across trade types and product types. In one example, a simple (though still high-volume) query that took seven minutes on the old platform now takes five seconds.
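To put those before-and-after figures in perspective, the speedup factors can be worked out with simple arithmetic. The snippet below is only a back-of-the-envelope sketch based on the times Hirsch cites; the function and variable names are illustrative and not drawn from NYSE or Netezza.

```python
def speedup(old_seconds: float, new_seconds: float) -> float:
    """Return how many times faster the new run is than the old one."""
    return old_seconds / new_seconds


# Complex surveillance query: 26 hours on the old platform, 2.5 minutes on NPS.
complex_query = speedup(26 * 3600, 2.5 * 60)

# Simple high-volume query: 7 minutes on the old platform, 5 seconds on NPS.
simple_query = speedup(7 * 60, 5)

print(f"Complex query: about {complex_query:.0f}x faster")
print(f"Simple query: about {simple_query:.0f}x faster")
```

By this rough measure, the complex query runs roughly 600 times faster and the simple query more than 80 times faster. These are single examples quoted by NYSE, not benchmark averages.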
While Hirsch describes both the New York Stock Exchange and Arca warehouses as "enterprise data warehouses," with complete, unaggregated data, it should be noted that their query-volume and user-volume demands don't compare with those of many large warehouse deployments. Query volume tops out at "hundreds per day" (versus thousands or even tens of thousands on a high-demand EDW) and the user community tops out at 150 per appliance, with only 20 concurrent users on each device at any one time.
"We're not massive in terms of our user base, but we're pretty big in terms of the types of analytics and the time span of data that a single query might hit," says Hirsch.
Competitors and some analysts say Netezza's NPS is not well suited to mixed workloads and large user communities, but Netezza counters that one of its largest customers, Michael's Stores, has 600 users and 22,000 queries (the user figure is by no means extreme, while query complexity and frequency are more telling than the sheer number).
Between the two appliances now in production, NYSE captures and stores about one terabyte of new trade data per day on the warehouses, much of it through near-real-time (minute-to-minute) trickle feeding. Having two separate appliances offered advantages in loading performance. Despite rapid data growth rates in excess of 100 percent per year at NYSE, Hirsch says the company is well prepared to scale up. NYSE was a beta customer for Netezza's recently announced Compress Engine software, a firmware upgrade that effectively doubled the capacity of existing NPS devices in the field. Hirsch confirms that the upgrade has improved rather than degraded performance, as is sometimes the case with compression technologies.
"The main benefit was 2.2 to 2.4 times compression, which has more than doubled our capacity, but load performance has doubled and query times have also improved," he says. "Non-computational queries — those that simply require access to day — have doubled in speed, and we're seeing improvements of 10 percent to 30 percent for queries that are CPU-intensive."
Hirsch says he can't detail future scale-up plans, citing a non-disclosure agreement with Netezza, but he concludes that "disk drives are getting bigger and Netezza's Compress Engine only works on numerics today, so there are lots of opportunity for future upgrades."