A Partial Victory For Data Quality
Veridata makes it easy to compare source and target data in relational tables, but limited database support and functionality hamper this 1.0 release.
PROS | CONS |
---|---|
• Easy to use, with Web and command-line interfaces that let you configure and execute comparisons and view results. | • Currently supports only Oracle and HP (NonStop) databases. |
• Result records go beyond obvious mismatch errors to include processing and performance metrics such as rows processed per second, row size before and after compression, and time spent fetching rows. | • Can't match more than two data views and lacks more complex comparisons. |
• Supports optimization with multirow fetches, buffer management and other techniques. | • Documentation is sparse, with little troubleshooting information. |
Business continuity and recovery plans are highly dependent on data availability. Best practices for assuring availability include creating and maintaining standby data, but how do you ensure its quality and consistency? Even if you've created a disaster-tolerant architecture and simulated and planned for emergencies, poor-quality standby data can render those efforts meaningless.
Veridata 1.0 from GoldenGate Software is designed to ensure that your standby database is consistent with your production database. If you've been puzzled by unpredictable discrepancies between source and migrated or replicated data — in production or standby settings — Veridata offers a start on ensuring point-in-time consistency, but the product needs some improvements before it's a complete and versatile data-quality tool.
GoldenGate is a transactional data management (TDM) vendor best known for its namesake product for transactional data integration — the capture, transformation and delivery of in-flight data. Introduced in August, Veridata has a single objective: It compares source and target data in relational tables to ensure they're identical.
Veridata has three components: Veridata server, client agent(s) and the Web/command interface. The server processes comparisons of data served by client agents, which reside on the database servers and connect to the production and standby data sources. The server runs on Apache Tomcat, which provides open-source, multiplatform portability but requires the Java SDK (GoldenGate does not offer a Microsoft .NET alternative).
Users configure, execute and report on comparisons using a CLI (command-line interface) or the simple Web interface (see screen below). Architecturally and operationally, Veridata reveals its Unix/Linux-based roots (such as the use of Tomcat and the CLI), and it looks like it was developed as an in-house utility and subsequently released for general use.
The installation procedure for Veridata is simple. Unfortunately, the documentation isn't very helpful if, like me, you encounter errors or deviations during that process. Despite repeated attempts, I couldn't install Veridata on my machine and had to test the software on another Windows XP laptop supplied by GoldenGate (thus putting to rest my concerns over running Veridata on Windows).
Veridata is simple to use, too: You configure comparisons, then execute them to view the results. To configure a comparison you must set up source and target database connections, then specify tables and columns for comparison. The metadata and configuration information is stored in XML files that can be created and updated directly (using any text or XML editor) or through the Veridata Web interface. You can partition the tables horizontally (a subset of rows) or vertically (a subset of columns). By default, Veridata compares keys on a column-to-column basis and compares nonkey columns by computing a hash function that compresses data, thus saving network bandwidth and improving performance. You also can compare all columns on a one-to-one basis rather than using a hash function. Customize the comparisons by changing timing statistics, exit thresholds (setting a maximum number of error rows) and other parameters.
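To make the default comparison concrete, here is a minimal Python sketch of the general technique: match rows on their key columns, then compare a hash of the nonkey columns instead of the full values. This is not Veridata's code; the function names, the choice of SHA-256 and the sample rows are all assumptions made for illustration.

```python
import hashlib

def row_digest(row, nonkey_columns):
    """Reduce the nonkey values of one row to a fixed-size digest."""
    # The unit-separator character keeps adjacent values from running together.
    payload = "\x1f".join(repr(row[c]) for c in nonkey_columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def compare_tables(source_rows, target_rows, key_columns, nonkey_columns):
    """Match rows by key, then compare digests of the nonkey columns."""
    def index(rows):
        return {
            tuple(r[k] for k in key_columns): row_digest(r, nonkey_columns)
            for r in rows
        }
    src, tgt = index(source_rows), index(target_rows)
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "missing_in_source": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }

# Hypothetical sample data: row 2 differs, row 3 and row 4 exist on one side only.
source = [
    {"id": 1, "name": "Ann", "city": "Oslo"},
    {"id": 2, "name": "Bob", "city": "Rome"},
    {"id": 3, "name": "Cy", "city": "Lima"},
]
target = [
    {"id": 1, "name": "Ann", "city": "Oslo"},
    {"id": 2, "name": "Bob", "city": "Roma"},
    {"id": 4, "name": "Di", "city": "Kyiv"},
]

report = compare_tables(source, target, ["id"], ["name", "city"])
print(report)
```

Only the key and a short digest need to travel for each row, which is where the bandwidth saving comes from. Passing a shorter list of nonkey columns corresponds to a vertical partition; filtering the row lists first corresponds to a horizontal one.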
You can execute comparisons from the command line or the Web interface. Veridata offers a number of runtime arguments that can be applied to the comparisons, including a thread count that sets how many comparisons execute simultaneously, a "where" clause that slices source data and a delay factor between comparisons to spread out network traffic.
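The interplay of those three runtime controls can be sketched in a few lines of Python. This is a generic illustration, not Veridata's CLI or API: `run_comparisons`, its parameters and the sample jobs are hypothetical, and applying the row filter to both sides is my assumption so that the filtered sets stay comparable.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_comparisons(jobs, threads=2, delay_seconds=0.0, where=None):
    """Run (name, source_rows, target_rows) jobs on a bounded thread pool.

    threads       -- how many comparisons may execute at the same time
    delay_seconds -- pause between job submissions to stagger the load
    where         -- optional predicate that slices the rows being compared
    """
    def compare(job):
        name, source_rows, target_rows = job
        if where is not None:
            source_rows = [r for r in source_rows if where(r)]
            target_rows = [r for r in target_rows if where(r)]
        return name, source_rows == target_rows

    results = {}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = []
        for job in jobs:
            futures.append(pool.submit(compare, job))
            time.sleep(delay_seconds)  # stagger start times
        for future in futures:
            name, identical = future.result()
            results[name] = identical
    return results

# Hypothetical jobs: "orders" differs only in a non-EU row.
jobs = [
    ("orders",
     [{"id": 1, "region": "EU"}, {"id": 2, "region": "US"}],
     [{"id": 1, "region": "EU"}, {"id": 2, "region": "APAC"}]),
    ("customers", [{"id": 1}], [{"id": 1}]),
]

full = run_comparisons(jobs, threads=2, delay_seconds=0.01)
eu_only = run_comparisons(jobs, threads=2, delay_seconds=0.01,
                          where=lambda r: r.get("region", "EU") == "EU")
print(full, eu_only)
```

With the filter in place the mismatched non-EU row falls outside the slice, so the "orders" comparison passes; without it, the same job reports a difference.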
Veridata records detailed comparison results that go well beyond reporting obvious mismatch errors. Outputs include processing and performance metrics, such as rows processed per second, row size before and after compression (hashing), time spent in fetching rows and in getting requests from the Veridata server (both of which help identify network bandwidth problems), and the time it takes for the row hash query to retrieve the first row. The tool also supports optimizing mechanisms, such as multirow fetches and buffer management. The results of the comparison are logged in a text file that can be viewed from the user interface. Alternatively, you can specify XML output in the configuration file and choose a stylesheet for viewing.
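For readers who want to see how throughput and compression figures of this kind relate to one another, here is a small Python sketch. The field names and the input numbers are invented for the example; they are not taken from an actual Veridata report.

```python
def summarize(rows_processed, elapsed_seconds, bytes_before, bytes_after):
    """Derive throughput and compression figures from raw run totals."""
    return {
        "rows_per_second": rows_processed / elapsed_seconds,
        "avg_row_bytes_before": bytes_before / rows_processed,
        "avg_row_bytes_after": bytes_after / rows_processed,
        "compression_ratio": bytes_before / bytes_after,
    }

# Hypothetical run: one million rows in 40 seconds, hashed from 200 MB to 24 MB.
stats = summarize(
    rows_processed=1_000_000,
    elapsed_seconds=40,
    bytes_before=200_000_000,
    bytes_after=24_000_000,
)
print(stats)
```

A low rows-per-second figure alongside a healthy compression ratio would point toward fetch time or the network rather than row size as the bottleneck.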
Veridata is effective in its stated purpose, but it has some serious shortcomings. First, Veridata supports only Oracle and Tandem (HP NonStop) databases. It's difficult to imagine a general-purpose data-quality product, even a version 1.0 product, released with such limited options. Second, Veridata can compare only two data views for a match, and nothing else. Any product taking on data-quality problems must do more than simplistic, two-table-equality comparisons. If future releases offer broader database support as well as multisource and more complex comparison capabilities, Veridata will be a versatile and powerful tool for data quality and disaster tolerance.
These reservations aside, Veridata takes simple relational data comparisons to something of an art form. The CLI will warm the hearts of Unix aficionados, while the Web interface is easy enough for anyone to use.
• GoldenGate Veridata Server and Agent run on Microsoft Windows, Linux and most popular flavors of Unix. Veridata Agent also runs on HP NonStop. Pricing starts at $90,000 per source/target pair. Contact GoldenGate Software at www.goldengate.com.
Rajan Chandras is principal consultant with the New York offices of CSC Consulting.