Whatever they call their products, integrated database vendors can be divided into two categories: those that base their systems on standard x86 hardware and those that add proprietary hardware tweaks. Teradata and EMC use the same hardware as general-purpose computing platforms, while Netezza adds a bit of proprietary wizardry.
IBM Netezza's key innovation is to implement SQL processing in hardware, using custom chips to speed queries. In addition, its massively parallel processing (MPP) design spreads workloads across multiple blades; for example, segments of a SQL query can be processed simultaneously for maximum performance. The whole system is orchestrated by host servers that split processes across blades and manage onboard storage.
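The MPP idea described above, a host splitting a query into segments that run on several blades at once, can be illustrated with a minimal Python sketch. This is not Netezza's implementation: the `BLADES` data, the `scan_slice` and `run_query` functions, and the use of threads to stand in for blades are all hypothetical, chosen only to show the scatter-and-merge pattern.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each "blade" holds one slice of a sales table
# as (region, amount) rows.
BLADES = [
    [("east", 100), ("west", 50)],
    [("east", 25), ("north", 75)],
    [("west", 10), ("north", 5)],
]


def scan_slice(rows, min_amount):
    """Run one query segment on a blade: filter, then partially aggregate."""
    partial = {}
    for region, amount in rows:
        if amount >= min_amount:
            partial[region] = partial.get(region, 0) + amount
    return partial


def run_query(blades, min_amount):
    """Host role: fan the segment out to every blade, then merge the partials."""
    with ThreadPoolExecutor(max_workers=len(blades)) as pool:
        partials = list(pool.map(lambda rows: scan_slice(rows, min_amount), blades))
    totals = {}
    for partial in partials:
        for region, subtotal in partial.items():
            totals[region] = totals.get(region, 0) + subtotal
    return totals


if __name__ == "__main__":
    # Roughly: SELECT region, SUM(amount) ... WHERE amount >= 20 GROUP BY region
    print(run_query(BLADES, 20))
```

Each blade returns only a small partial result, so the host's merge step stays cheap no matter how many rows each slice holds.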
In contrast, Oracle's Exadata appliance uses no custom chips and departs from the approach used by most of its competitors in three ways: It runs Oracle 11g database code on storage servers, it adds flash caches that speed data access, and it implements proprietary compression throughout the system. By having Oracle database code on the storage servers, those servers know exactly what to access, cutting the amount of data that needs to be sent over the internal Exadata network and then analyzed. So, instead of scanning the entire database, a query for, say, customers with more than $1 million in purchases can be executed against only the specific tables and blocks of data that are relevant, saving time and delivering results faster.
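The effect of filtering on the storage servers can be sketched in a few lines of Python. This is an illustrative model only, not Oracle's code: the `BLOCKS` data and the `naive_scan` and `pushdown_scan` functions are hypothetical, and the per-block maximum check is an assumed stand-in for however the storage tier decides a block is irrelevant.

```python
# Hypothetical storage blocks of (customer, total_purchases) rows.
BLOCKS = [
    [("acme", 2_500_000), ("bolt", 40_000)],
    [("cog", 900_000), ("dyno", 15_000)],
    [("exa", 1_200_000), ("flux", 1_000_000)],
]


def naive_scan(blocks, threshold):
    """Storage ships every row; the database server filters afterwards."""
    shipped = [row for block in blocks for row in block]
    matches = [row for row in shipped if row[1] > threshold]
    return matches, len(shipped)


def pushdown_scan(blocks, threshold):
    """Storage applies the predicate itself and ships only matching rows."""
    shipped = []
    for block in blocks:
        # Skip a block outright when its largest value cannot match.
        # (Computed on the fly here; a real system would keep a summary.)
        if max(amount for _, amount in block) <= threshold:
            continue
        shipped.extend(row for row in block if row[1] > threshold)
    return shipped, len(shipped)


if __name__ == "__main__":
    # Customers with more than $1 million in purchases.
    print(naive_scan(BLOCKS, 1_000_000))
    print(pushdown_scan(BLOCKS, 1_000_000))
```

Both functions return the same matching customers; the difference is the second value, the number of rows that had to cross the network to produce them.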
Flash caching and compression sound fairly self-explanatory, but Oracle says that it uses these technologies more effectively because the system is aware of the specific needs of an Oracle database. "We compress data everywhere: in flash, on disk, even in memory buffers," says Shetler. Where most competitors claim two- to four-times compression, Oracle and many of its customers report 10-times compression, which greatly improves storage capacity and reduces the total cost of the system. However, the capacity any company will achieve is application- and data-dependent.
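To see why the compression ratio matters for capacity and cost, a rough calculation helps. The sketch below assumes a simple model in which effective capacity is raw capacity multiplied by the compression ratio; the function names, the 100 TB raw capacity, and the $1 million system price are hypothetical figures for illustration, not vendor numbers.

```python
def effective_capacity_tb(raw_tb, compression_ratio):
    """User data that fits, assuming the ratio holds across all the data."""
    return raw_tb * compression_ratio


def cost_per_effective_tb(system_cost, raw_tb, compression_ratio):
    """System price spread over the compressed-data capacity."""
    return system_cost / effective_capacity_tb(raw_tb, compression_ratio)


if __name__ == "__main__":
    raw_tb = 100             # hypothetical raw storage
    system_cost = 1_000_000  # hypothetical price in dollars
    for ratio in (2, 4, 10):
        capacity = effective_capacity_tb(raw_tb, ratio)
        cost = cost_per_effective_tb(system_cost, raw_tb, ratio)
        print(f"{ratio}x: {capacity} TB effective, ${cost:,.0f} per TB")
```

Under this model, moving from four-times to 10-times compression cuts the cost per stored terabyte by 60 percent, though real ratios vary with the data.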
SAP's HANA takes yet another approach, storing as much data as possible in main-system RAM to avoid the latency involved in disk writes. "The main use case is an operational database next to a warehouse," says Prakash Darji, VP and general manager of data warehouse solutions at SAP Labs. HANA instantly mirrors real-time transactional information from SAP applications (the data held in the operational database) and lets you analyze it. The business can then correlate this information with historical data in the warehouse to spot important trends and opportunities to take action. In this respect, HANA is radically different from Exadata, which, when used for data warehousing, is as dependent upon (typically batch-oriented) data integration from transactional systems as any conventional data warehousing platform.
SAP argues that running applications directly on HANA will improve performance by cutting out some complexity, but its claims are still unproven because SAP's in-memory technology has yet to become a platform for transaction processing, or even a replacement for the data warehouse. The current HANA product does have customers using it for analytics. For example, Adobe uses it to sift through customer data in search of unlicensed software use. British gas and electric utility Centrica uses it to process data from millions of smart meters, forecasting demand and pricing. Though HANA is designed to be an integrated appliance, SAP doesn't sell hardware itself, instead licensing HANA to partners including Cisco, Dell, Fujitsu, and HP.