Enterprise data integration is one of the biggest challenges facing most organizations today. Pervasive Software tries to meet this challenge head on with its data integration suite of products. The latest version of the suite — release 8 — consists of three offerings: Pervasive Data Integrator, Pervasive Business Integrator, and Pervasive Migration Toolkit. The Data Integrator is an ETL tool that can be used for consolidating the data from various sources on a continuous, event-driven, or regularly scheduled basis. Pervasive Business Integrator can tie multiple applications together into a common enterprise architecture framework. It relies on a message-based architecture to provide continual, real-time integration. The Migration Toolkit helps you move data from one system to another.
The sheer number and variety of vendors in the data integration space is a sign of the growing importance of this field. On one hand, traditional ETL vendors like Ascential Software and Informatica enjoy a favorable market share. On the other hand, database heavyweights such as Oracle, IBM, and Microsoft supply their own integration tools. Even BI vendors such as Business Objects and Cognos are now players in the data integration marketplace. Pervasive's data integration suite of products comes from its acquisition of Data Junction in 2003. Pervasive is also active in the data management space with its flagship product Pervasive.SQL, which evolved from the legendary PC-based data store, Btrieve.
The Building Blocks
All products within the Pervasive data integration suite are based on a common architectural framework, which includes a number of developer-focused tool sets. One of those tool sets is the Pervasive Integration Architect, a comprehensive integrated development environment for building data integration projects. It consists of a number of tools that are aimed at making development relatively painless.
The Map Designer, another tool set, provides a graphical user interface for designing mapping and transformation rules. It provides native connectivity to more than 150 types of data sources and targets. The closely related Process Designer links multiple steps in the integration process. For example, you can design several maps (data flows between source and target data stores) in the Map Designer and link these up in the Process Designer based on some conditional logic. Rapid Integration Flow Language (RIFL) is a scripting language, with syntax similar to Visual Basic, which can be used to construct the complex logic needed in the transformations. It contains hundreds of prebuilt Boolean, date, math, and string functions.
All the metadata associated with the integration tasks is stored in an open XML repository, which makes it easy for even third-party tools to access the metadata. For example, a reporting application that can read XML files can potentially import the transformation rules into its fold, thus making it easy for the users of the reporting application to view the data's lineage. Each design component is saved as an independent XML file in the repository. This arrangement promotes reuse of the individual components, for increased efficiency. The Repository Explorer can be used to get an overview of the design components. A simple double click on any design component opens the appropriate Integration Architect design tool.
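To illustrate why an open XML repository matters for lineage reporting, here is a minimal Python sketch of a third-party tool reading transformation rules from an XML file. The element and attribute names (`map`, `rule`, `target`, `source`, `expression`) and the expression strings are hypothetical placeholders; the review does not describe the actual repository schema, so treat this as the general idea rather than Pervasive's format.

```python
import xml.etree.ElementTree as ET

# Hypothetical map definition. The real repository schema is not
# documented in this review; these names are illustrative only.
MAP_XML = """
<map name="load_customers">
  <rule target="CUST_NAME" source="name" expression="trim whitespace"/>
  <rule target="CUST_STATE" source="state" expression="convert to upper case"/>
</map>
"""

def lineage(xml_text):
    """Return {target_field: (source_field, expression)} for each rule."""
    root = ET.fromstring(xml_text.strip())
    return {
        rule.get("target"): (rule.get("source"), rule.get("expression"))
        for rule in root.iter("rule")
    }

report = lineage(MAP_XML)
for target, (source, expression) in report.items():
    print(f"{target} <- {source} ({expression})")
```

Because the file is plain XML, any tool with an XML parser could build this kind of report without going through the design tools.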
The Pervasive suite includes three types of schema designers for capturing source and target metadata. The Structured Schema Designer provides a visual interface for creating, saving, and manipulating definition files (metadata) of structured data sources such as databases and flat files. The Document Schema Designer provides a similar interface for defining trading standards such as EDI, HIPAA, HCFA, and XML. The Extract Schema Designer can be used for integrating unstructured text data sources such as email, report data, HTML, print data, or any other raw text.
The Integration Engine executes the maps and designs that you've created with the Integration Architect. It ships with its own SDK and command-line interfaces, which makes it easy to embed within your environment and interface with external applications. For example, you can use your company's standard scheduling tool to run maps through the Integration Engine. Integration Engines are currently available on Windows, Linux, HP-UX, Solaris, AIX, and OS/390. One of the most valuable features of the Integration Engine is its support for messaging. It can be used to tie two or more applications together. The Engine can interface either with the applications' native APIs or with messaging solutions such as MQSeries or SonicMQ. It can operate as a single-threaded, single-user process; a single-threaded, multiuser process; or a multithreaded, single-user process.
FIGURE 1 Pervasive Process Designer.
The More the Merrier
The Pervasive suite includes several useful add-on products. The Repository Manager helps keep track of designs in several ways. It tracks dependencies between design components and offers insight into the impact of proposed changes within a project's work plan. It can produce reports on key indicators such as who created, last edited, or last executed a design, and when. The Manager can also perform common tasks such as global search and replace of code and packaging components for deployment.
PRODUCT SPEC SHEET
Pervasive Integration Suite (Release 8)
MINIMUM REQUIREMENTS: For all product components on Windows 2000/XP - 100MB of available free hard drive space, 64MB RAM (96MB recommended), JDK 1.4
SUPPORTED PLATFORMS: Integration Engine runs on Windows, Linux, HP-UX, Sun Solaris, AIX, and OS/390. All other products run on Windows 2000/XP
SUPPORTED DATA SOURCES: Provides native connectivity to more than 150 types of data sources and targets.
SUPPORTED MESSAGE QUEUES: IBM MQSeries, MSMQ, and any JMS-compliant message queue.
PRICING: Pricing for the Pervasive Data Integrator and the Pervasive Business Integrator starts at $10,000 to $20,000 (based on configuration) and scales upward depending on integration need. The Pervasive Migration Toolkit for simple migration needs is $1,999.
Join Designer is an add-on application that lets you join together two or more data sources of any type prior to running a Map Designer transformation on them. The sources don't have to be the same type. For example, you could join an Oracle table with an Excel spreadsheet. The files aren't physically joined; a join configuration file is generated that contains the information that Map Designer needs to treat them as if they were a single source.
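The idea of joining unlike sources without physically merging them can be sketched in a few lines of Python. This is not how Join Designer works internally; it is a minimal illustration in which an in-memory `sqlite3` table stands in for the Oracle table and a delimited text buffer stands in for the spreadsheet. The table, column, and function names are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Source 1: a database table (sqlite3 stands in for Oracle here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
)

# Source 2: delimited text (stands in for a spreadsheet export).
sheet = io.StringIO("customer_id,amount\n1,250\n2,75\n1,40\n")

def joined_rows(db, text_file):
    """Yield joined records one at a time; no combined file is written."""
    names = dict(db.execute("SELECT id, name FROM customer"))
    for row in csv.DictReader(text_file):
        cust_id = int(row["customer_id"])
        if cust_id in names:  # inner join on the customer id
            yield {
                "id": cust_id,
                "name": names[cust_id],
                "amount": int(row["amount"]),
            }

result = list(joined_rows(conn, sheet))
```

A downstream transformation can consume `joined_rows` as if it were a single source, which is the effect the join configuration file achieves for Map Designer.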
The Engine Profiler is another useful add-on product that captures run-time process statistics and plots them for easy inspection. For example, it can chart how long it took to run every transformation within an integration process. This utility helps pinpoint and diagnose processing bottlenecks.
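The kind of per-step timing that a profiler collects can be approximated with a short Python sketch. The step names and the `run_with_timings` and `slowest` helpers are hypothetical, and the sleeps merely simulate work; the Engine Profiler's own data format and charts are not shown in this review.

```python
import time

def run_with_timings(steps):
    """Run each (name, callable) step and record its elapsed seconds."""
    timings = []
    for name, step in steps:
        start = time.perf_counter()
        step()
        timings.append((name, time.perf_counter() - start))
    return timings

def slowest(timings):
    """Return the name of the step that took the longest."""
    return max(timings, key=lambda item: item[1])[0]

# Simulated integration steps; the transform is deliberately the slow one.
steps = [
    ("extract", lambda: time.sleep(0.005)),
    ("transform", lambda: time.sleep(0.1)),
    ("load", lambda: time.sleep(0.005)),
]

timings = run_with_timings(steps)
bottleneck = slowest(timings)
for name, seconds in timings:
    print(f"{name}: {seconds:.3f}s")
```

Ranking the steps by elapsed time is usually the first move in diagnosing a slow process, whether the numbers come from a chart or a list like this one.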
I was able to install the Pervasive suite of products very quickly and to get started right away. I alternated between using Oracle tables and flat files as my source and target data stores. The native connectivity made accessing these data stores very easy. I was able to navigate the user interface with a little help from the tutorial. I could construct SQL queries either through a drag-and-drop interface or by typing in the statement directly.
Another feature of the product that I liked is the set of prebuilt event handlers at the record level. For example, you can use the OnDuplicateKeyError handler to specify a certain set of actions to take when a duplicate key error is encountered. The OnNullValueError handler can define another set of remedial measures to take when a null value is placed into a target field that doesn't allow nulls.
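The record-level handler pattern can be sketched in Python as follows. This is not RIFL and not the product's API; the `load_records` function, the handler keys, and the sample rows are assumptions chosen to mirror the duplicate-key and null-value cases described above.

```python
def load_records(records, key_field, required_fields, handlers):
    """Load records into an in-memory target, routing errors to handlers."""
    target = {}
    for record in records:
        # Null check: a required target field received no value.
        missing = [f for f in required_fields if record.get(f) is None]
        if missing:
            handlers["null_value"](record, missing)
            continue
        # Duplicate check: the key already exists in the target.
        key = record[key_field]
        if key in target:
            handlers["duplicate_key"](record, key)
            continue
        target[key] = record
    return target

rejects = []

# Each handler decides what to do with a bad record; here both log a reject.
handlers = {
    "duplicate_key": lambda rec, key: rejects.append(("duplicate", key)),
    "null_value": lambda rec, fields: rejects.append(("null", tuple(fields))),
}

rows = [
    {"id": 1, "name": "Ann"},
    {"id": 1, "name": "Bob"},   # duplicate key
    {"id": 2, "name": None},    # null in a required field
]

loaded = load_records(rows, "id", ["id", "name"], handlers)
```

Keeping the remedial action in a separate handler means the load loop stays the same while the policy (skip, log, or substitute a default) can vary per map.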
Lookups are one of the most frequently used features in any ETL process. Therefore, a key measure for judging ETL tools is how good they are at performing lookups. Pervasive lives up to the challenge by providing at least four kinds of lookups: flat file, SQL pass-through, dynamic SQL, and in-core tables. The last two are particularly interesting. You can use the dynamic SQL approach to construct complex lookups from large tables with millions of rows, at the cost of slower performance. The best method is to use an in-core table, provided you can afford to store the entire lookup data set in memory. Needless to say, an in-core table is by far the fastest way to access lookup data.
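The trade-off between a per-record SQL lookup and an in-core table can be shown with a small Python sketch. It uses the standard `sqlite3` module and a dictionary as stand-ins; the `region` table and the function names are hypothetical, and nothing here reflects the product's own lookup syntax.

```python
import sqlite3

# Hypothetical reference data: a small code-to-name lookup table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE region (code TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO region VALUES (?, ?)",
    [("OR", "Oregon"), ("WA", "Washington"), ("ID", "Idaho")],
)

def lookup_per_row(code):
    """Query the database on every call (dynamic SQL style)."""
    row = conn.execute(
        "SELECT name FROM region WHERE code = ?", (code,)
    ).fetchone()
    return row[0] if row else None

# Load the whole table into a dictionary once (in-core style).
in_core = dict(conn.execute("SELECT code, name FROM region"))

def lookup_in_core(code):
    """Answer from memory; no database round trip per record."""
    return in_core.get(code)

source_codes = ["OR", "WA", "XX"]
per_row_result = [lookup_per_row(c) for c in source_codes]
in_core_result = [lookup_in_core(c) for c in source_codes]
```

Both approaches return the same answers. The per-row version scales to tables too large for memory but pays for a query on every record, while the in-core version pays once up front and then answers from memory.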
As I moved from creating one map to another, I was able to reuse the components that I had built earlier. I was able to run the maps quickly — both individually from the Map Designer and as part of a multistep integration task from the Process Designer.
In a Nutshell
Pervasive has a mature suite of integration products that provide a comprehensive framework for tying data stores and applications across the enterprise together. Most competitors will find it difficult to match the sheer number of native connectors that Pervasive provides. Developers will like the code reusability theme prevalent across the entire product set. Memory-resident in-core lookups and built-in support for messaging make it a versatile solution indeed. In short, the Pervasive data integration suite certainly packs a punch.
Ganesh Variar is a lead analyst with Regence BlueCross BlueShield of Oregon. He has 10 years' experience in managing and designing business intelligence solutions.