Announced November 17, ParAccel Database 3.0 (PADB 3.0) will mark the vendor's first stab at in-database processing for analytics that aren't expressed in standard SQL. Integration will be achieved through new table functions that can pass data to, and receive results from, third-party and custom algorithms written in languages such as C and C++.
To get a running start in new analytic areas, ParAccel has partnered with Fuzzy Logix, a vendor that offers an extensive library of descriptive statistics, Monte Carlo simulations and pattern-recognition functions. The table functions approach also will support MapReduce techniques, more than 700 analyses commonly used by financial services, and custom algorithms, according to ParAccel.
"Whether it's fuzzy matching, geospatial analysis or credit-risk pricing, these are things that many customers have written in C or C++ that can now be integrated directly into our database," said Tarun Loomba, ParAccel's chief marketing officer.
In-database processing moves data-intensive analyses inside powerful, parallel-processing data warehousing environments. The idea is to run the analyses where the data resides rather than pulling massive data sets out of the warehouse and processing them on the comparatively slow servers that typically run analytic packages and non-SQL algorithms.
The difference in performance between conventional and in-database approaches is usually dramatic. Queries that would otherwise take days might take minutes, while those that formerly took hours might take seconds.
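The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration, not ParAccel's API: it uses SQLite's user-defined aggregate hook as a stand-in for PADB's table functions, and the `trades` table, the `stddev_pop` function name and the sample figures are all made up for the example.

```python
import math
import sqlite3


class StdDev:
    """Hypothetical custom aggregate: population standard deviation (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def step(self, value):
        # Called by the database engine once per row in the group.
        if value is None:
            return
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    def finalize(self):
        # Called once per group; only this single number leaves the engine.
        if self.n == 0:
            return None
        return math.sqrt(self.m2 / self.n)


def build_demo_db():
    """Create an in-memory database with illustrative (invented) data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trades (desk TEXT, pnl REAL)")
    rows = [("rates", v) for v in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)]
    rows += [("fx", 1.0), ("fx", 3.0)]
    conn.executemany("INSERT INTO trades VALUES (?, ?)", rows)
    # Register the custom algorithm so SQL queries can call it by name.
    conn.create_aggregate("stddev_pop", 1, StdDev)
    return conn


def extract_then_analyze(conn):
    """Conventional approach: pull every row out, then compute outside the database."""
    by_desk = {}
    for desk, pnl in conn.execute("SELECT desk, pnl FROM trades"):
        by_desk.setdefault(desk, []).append(pnl)
    result = {}
    for desk, values in by_desk.items():
        mean = sum(values) / len(values)
        result[desk] = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return result


def analyze_in_database(conn):
    """In-database approach: the engine runs the custom code; one row per desk comes back."""
    query = "SELECT desk, stddev_pop(pnl) FROM trades GROUP BY desk"
    return dict(conn.execute(query).fetchall())
```

Both functions produce the same per-desk figures, but the second moves only one result row per group across the database boundary instead of the full table. On a ten-row demo the saving is invisible; the argument for in-database processing rests on the same pattern applied to very large tables on parallel hardware.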
Teradata, Netezza (now a part of IBM) and Aster Data have led the in-database trend in recent years. All three of these vendors have partnered with SAS to bring that leading analytics vendor's procedures into their databases. ParAccel has yet to secure a partnership with SAS, but that kind of relationship is not quickly or easily won. Data warehousing vendors usually start with smaller analytics vendors such as Fuzzy Logix, a company that already has partnerships in place with Aster Data, Microsoft, Netezza and Sybase.
ParAccel is also catching up on connections to alternative data platforms. Many data warehousing vendors have struck up partnerships with Cloudera to integrate with its distribution of open-source Hadoop. Hadoop's parallel processing environment supports analyses of inconsistent data and loosely structured content such as e-mail messages and social network comments. ParAccel plans to add an integration with Hadoop, according to Loomba, but he did not specify whether it would be in partnership with Cloudera.
Other improvements promised in PADB 3.0 include workload management features that favor short queries over batch jobs. This will give interactive BI users short response times even if big jobs are running in the background. The upgrade will also improve uptime by eliminating restarts in the event of disk failures.
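ParAccel has not described how its workload manager decides what runs first, so the following is only a rough sketch of the general idea of favoring short queries. It assumes a simple shortest-estimated-job-first policy; the `Query` class, the `run_order` function and the query names and time estimates are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Query:
    # Ordering uses estimated_seconds first, then arrival as a tie-breaker.
    estimated_seconds: float
    arrival: int
    name: str = field(compare=False)


def run_order(queries):
    """Return query names in the order a shortest-job-first scheduler would start them."""
    heap = list(queries)
    heapq.heapify(heap)
    order = []
    while heap:
        order.append(heapq.heappop(heap).name)
    return order


if __name__ == "__main__":
    pending = [
        Query(3600.0, 0, "nightly_batch"),
        Query(0.5, 1, "dashboard"),
        Query(2.0, 2, "adhoc_report"),
    ]
    print(run_order(pending))
```

Under this policy the interactive queries start ahead of the hour-long batch job even though the batch job arrived first. A production scheduler would also need some guard against starving long jobs, which this sketch leaves out.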
Set for release in December, PADB 3.0 will start at $50,000 per CPU. The database runs on suggested configurations of hardware from leading vendors including HP and Dell.