Exploring Netezza's Coming 6.0 DBMS - InformationWeek
09:35 AM
Curt Monash

Exploring Netezza's Coming 6.0 DBMS

Netezza is having its user conference, Enzee Universe, in Boston Monday-Wednesday, June 21-23, and naturally will be announcing new products there, and otherwise providing hooks and inducements to get itself written about. (The preliminary count is seven press releases in all.) To get a head start, I stopped by Netezza last Thursday for meetings that included a three-and-a-half hour session with 10 or so senior engineers, and have exchanged some clarifying emails since. It might be best to start with some Netezza product introduction and naming housekeeping:

  • Netezza isn't changing the hardware on any of its existing systems at this time. Rather, Netezza's product upgrades are contained in a software-only release...
  • ...except that it isn't a "release," but rather a "wave." There are three points to that terminological distinction:
    • The advanced analytics part doesn't depend on the new database platform software.
    • Individual functions in the advanced analytics part don't necessarily depend on advances in the analytics platform.
    • It plays on the surfboard-centric naming of Netezza's appliances.
  • Netezza has wisely scrapped its prior plan to make its advanced-analytics capabilities a chargeable add-on to its core appliance products. Rather, Netezza is going to offer advanced analytics as part of its core product. Part of the reason is that the interest in these capabilities is broader than Netezza first anticipated. The name for this is something like i-Class advanced analytics capabilities.
  • There is a "release" in all this too, namely NPS 6.0 (Netezza Performance Software). That's the core DBMS technology.
  • It's all to be shipped in Q3.
Highlights of our NPS 6.0 conversation include:
  • As promised, Netezza has improved its compression significantly. Because this was anticipated, this upgrade was planned for in the design of the systems Netezza started introducing last summer. Consequently, the reduction in I/O produced by compression translates almost directly into better performance -- the silicon is now more fully loaded than it was before, but few if any actual silicon bottlenecks have been introduced by the I/O improvement.
  • Netezza's other big performance enhancement is the introduction of clustered base tables, which it says can reduce I/O by an order of magnitude or better.
  • Netezza says that there are individual queries in which the enhancements take query performance up 30-40X. (Presumably, those would be ones for which clustered base tables are a big win.)
  • More interestingly, Netezza says that overall performance is improved by >2X. That's queries, load, backup, and everything else all blended together.
  • Underpinning all this, Netezza went from 125 MHz to a blend of 125 and 250 MHz in its FPGA clock speeds. Also, the width of the FPGA onboard data path went from 16 to 32 bits. Netezza suggests that the naive calculation which says this could increase FPGA throughput 4X isn't entirely misleading.
  • Netezza is pretty content with its workload management capabilities for queries, but nonetheless keeps adding features. Workload management has not yet been extended to cover all the non-query parts of the analytic functionality.
  • Netezza continues to enhance its cost-based optimizer and query planner.
  • Netezza has long used an internal networking approach that's rather different from TCP/IP. Netezza views TCP/IP's strength as recovering gracefully if there's congestion. However, Netezza would rather do whatever it takes to preclude congestion in the first place, except perhaps in rare edge cases. I'm not aware of what enhancements, if any, have been made to Netezza's internal networking specifically in NPS 6.0.
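The "naive calculation" for the FPGA change mentioned above is just clock rate times data-path width. A minimal sketch of that arithmetic follows; the function name is made up for illustration, and because the article describes a blend of 125 and 250 MHz clocks, the result is best read as an upper bound rather than a measured figure.

```python
# Naive peak-throughput model: bits per second = clock rate * data-path width.
# This ignores the fact that only part of the FPGA logic runs at 250 MHz.
def peak_bits_per_second(clock_mhz: int, width_bits: int) -> int:
    return clock_mhz * 1_000_000 * width_bits


old = peak_bits_per_second(125, 16)   # previous clock and data-path width
new = peak_bits_per_second(250, 32)   # best case after the change
speedup = new / old
print(speedup)  # 4.0
```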
The basic idea of clustered base tables ("base tables" are ones that are not, for example, materialized views) is to range partition in multiple dimensions at once. Then you rule out (as in don't retrieve) all those blocks that fail a match in any one of the cluster dimensions. Netezza says its customers were doing a lot of work to simulate this benefit by multiple sorts; Netezza's implementation will now handle that much more automatically. Netezza says that talking to customers revealed that 4-5 cluster dimensions was almost always the most somebody would need; they will ship support for 4. That makes sense. In most cases, you'd want to cluster on the answers to "W" questions -- Who, What, Where, When (but probably not Why), in one dimension each. However, Netezza does call out as an ideal use case geospatial, precisely because 2 (or more rarely 3) dimensions each have "equal weight."
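The block-elimination idea can be sketched in a few lines. This is only an illustration, assuming each block keeps a min/max range per cluster dimension (in the spirit of a zone map); the class, function, and column names are hypothetical and are not Netezza's.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # inclusive (low, high)


@dataclass
class Block:
    block_id: int
    bounds: Dict[str, Range]  # per-dimension min/max of the values in the block


def overlaps(a: Range, b: Range) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def blocks_to_read(blocks: List[Block], predicate: Dict[str, Range]) -> List[int]:
    """Rule out any block whose range misses the predicate in even one dimension."""
    return [
        blk.block_id
        for blk in blocks
        if all(overlaps(blk.bounds[dim], rng) for dim, rng in predicate.items())
    ]


blocks = [
    Block(0, {"customer": (1, 100), "day": (1, 10)}),
    Block(1, {"customer": (1, 100), "day": (11, 20)}),
    Block(2, {"customer": (101, 200), "day": (1, 10)}),
    Block(3, {"customer": (101, 200), "day": (11, 20)}),
]
query = {"customer": (150, 160), "day": (12, 14)}
selected = blocks_to_read(blocks, query)
print(selected)  # [3]
```

With data clustered on both dimensions, a two-dimension predicate reads one block out of four here; with data sorted on only one dimension, the second predicate would typically eliminate far fewer blocks.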

I don't know how other vendors implement clustered base tables, but in Netezza's case it's via a space-filling curve. (Actually, they called it a "Hilbert space-filling curve," but I oppose that phrasing, as it's apt to lead to extremely incorrect use of the term "Hilbert space.") I.e., data is mapped to 4-tuples (say) in line with the dimensions, which are then sorted in a linear order in line with a space-filling curve. Happily, Netezza hasn't experienced problems clustering columns that have particularly challenging cardinality or skew.
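To make the mapping concrete, here is a sketch of the standard two-dimensional Hilbert curve index calculation. It is deliberately limited to 2 dimensions for readability, whereas Netezza's version is described as handling up to 4, and it is not Netezza's code; the function name and grid size are assumptions chosen for the example.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve over an n x n grid.

    n must be a power of two, and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


# Sorting cells by curve position gives the linear storage order.
n = 8
cells = [(x, y) for x in range(n) for y in range(n)]
ordered = sorted(cells, key=lambda c: hilbert_index(n, c[0], c[1]))
print(ordered[:4])  # [(0, 0), (0, 1), (1, 1), (1, 0)]
```

The property that matters for clustering is that consecutive positions on the curve are always neighboring cells, so rows that are close in every dimension tend to land in the same or nearby blocks.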

If I understood correctly, you can only zone map (and presumably cluster) on integers and dates right now, but that will change soon. (Edit: In blog comments and email, Tim Greenwood of Netezza explained to me that the NPS 6.0 workarounds to that were much more robust than I realized.)

Netezza put a lot of work for NPS 6 into something it calls "table grooming," which amounts to recopying tables in more beneficial form. Uses for table grooming -- which is a manually initiated process -- include but probably aren't limited to:

  • Clustering tables and, as needed, reclustering them.
  • Getting rid of data that was deleted. (Netezza has Postgres-style multiversion concurrency control -- MVCC -- but no time-travel, so keeping around deleted data is a waste of space.)
  • Recompressing data from Compress Engine 1 to Compress Engine 2.
  • Alter Table
The core ideas of table grooming include:
  • The Netezza NPS software copies rows from one place to another.
  • Netezza NPS then updates the appropriate metadata.
  • Metadata updates are transactional, even though the actual data movement is not.
This can be done part of a table at a time. Reads and loads are unaffected by the process, or at least not blocked. Delete commits are indeed blocked during a reorg, but Netezza guesses that the block holds for a few minutes during the grooming of a clustered base table, 10-15 seconds if space is being reclaimed, and something similar for an Alter Table.
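The copy-then-switch-metadata pattern described above can be sketched as follows. Everything here is hypothetical (the Table and Row classes, the notion of an "extent," and the lock standing in for a transactional metadata update); it illustrates the general idea only and says nothing about how NPS actually implements it.

```python
import threading
from dataclasses import dataclass, field
from typing import List


@dataclass
class Row:
    key: int
    deleted: bool = False


@dataclass
class Table:
    # The "metadata": which extents currently make up the table.
    extents: List[List[Row]] = field(default_factory=list)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def groom_extent(self, index: int) -> None:
        old = self.extents[index]
        # Step 1: copy live rows, reordered, into a new extent.
        # Readers keep using `old` in the meantime; nothing is changed in place.
        new = sorted((r for r in old if not r.deleted), key=lambda r: r.key)
        # Step 2: switch the metadata pointer in one short, atomic step.
        with self._lock:
            self.extents[index] = new


table = Table([[Row(3), Row(1, deleted=True), Row(2)], [Row(9)]])
table.groom_extent(0)  # groom one part of the table at a time
keys = [r.key for r in table.extents[0]]
print(keys)  # [2, 3]
```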

And finally, here are some notes on Netezza's query optimization and planning.

  • Netezza has a traditional cost-based optimizer, in which all operations have estimated costs, measured in microseconds, irrespective of which parts of the system (CPU, I/O, network, whatever) they most stress. (I have trouble imagining how a cost-based optimizer could work differently from that without incurring huge computational costs.)
  • Netezza's bottleneck is almost always disk I/O.
  • Netezza's optimizer is not/no longer based on the PostgreSQL optimizer.
  • Netezza does a lot of query transformation. Key points include:
    • Netezza joins are usually very cheap.
    • Filtered scans are cheap too.
    • More expensive in Netezza are data redistribution (duh), sorts, and unfiltered scans.
    • Most expensive of all are intermediate result sets that don't fit into memory.
  • Specific examples of Netezza query transformation include:
    • Pushing predicates out to nodes.
    • Flattening query trees and eliminating subqueries.
    • Rewriting windowed aggregates to be joins + grouped aggregates.
    • (New in 6.0) Transforming outer joins into other kinds.
  • Netezza does real-time sampling to help with query planning. (But this is only worth doing for queries that are estimated to be expensive.) Zone maps (and clustering too?) are invoked as part of deciding where to sample. Sampling was for scans only prior to NPS 6.0, and will now be done for joins as well.
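One of the transformations listed above, rewriting a windowed aggregate as a join plus a grouped aggregate, is easy to show with generic SQL. The sketch below uses Python's built-in sqlite3 module (window functions need SQLite 3.25 or later) purely as a stand-in engine; the table and column names are invented, and this is not the output of Netezza's planner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 20), ("west", 5), ("west", 7), ("west", 8)],
)

# Original form: a windowed aggregate attaches the regional total to every row.
windowed = conn.execute("""
    SELECT region, amount, SUM(amount) OVER (PARTITION BY region) AS region_total
    FROM sales
    ORDER BY region, amount
""").fetchall()

# Rewritten form: compute the grouped aggregate once, then join it back.
rewritten = conn.execute("""
    SELECT s.region, s.amount, t.region_total
    FROM sales AS s
    JOIN (SELECT region, SUM(amount) AS region_total
          FROM sales
          GROUP BY region) AS t
      ON s.region = t.region
    ORDER BY s.region, s.amount
""").fetchall()

print(windowed == rewritten)  # True
```

On a system where joins are cheap and sorts are expensive, as the article says is the case for Netezza, the second form is plausibly the more attractive plan.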
