Put to the Test: Oracle Data Integrator
Oracle has upgraded the Sunopsis ETL technology it acquired last fall to address master data management and SOA needs as well as data warehousing and data migration. It's not an easy tool to master, but it's a powerful and versatile choice if you're willing to invest in training.
PROS
• Lets you reverse-engineer data structures and work at a "business rules" level of abstraction so you can create projects quickly.
• Architecture moves data directly from source servers to a target server, using each server's native technology.
• Graphical UIs and object cross-referencing help new users quickly pick up projects where others left off.
CONS
• As with any big tool, truly mastering ODI may take months.
• The documentation, while useful, won't take you all the way. Expect to invest in training.
Last October, Oracle purchased Sunopsis and its Active Integration Platform (AIP) product, which included the subset of functionality also sold as Data Conductor. In February, Oracle released its first enhanced version of this ETL-oriented product with advanced application integration features. The product has been rebranded and reversioned to be consistent with Oracle's data management product line.
Now known as Oracle Data Integrator v 10.1.3 (ODI), the product maintains the architecture and functionality of Sunopsis AIP, with some significant enhancements – most notably, a big improvement in the ability to integrate with Service Oriented Architecture (SOA) applications. ODI automates information exchange between applications, and Oracle promotes the product in four key areas:
• Data Warehousing and Business Intelligence
• Service-Oriented Architecture
• Master Data Management (MDM)
• Migration
After working with ODI in a test environment and speaking with heavy users of the last Sunopsis release, I've concluded that ODI serves all four needs well, with an architecture that is elegantly simple and has minimal hardware requirements. ODI's structure and tools make it relatively easy to manage, particularly where people from different parts of the organization may play a part in the overall project.
Two features are worth mentioning up front. First, ODI takes a business rules approach to data transformation. You define data structures and transformation rules, and ODI generates the necessary code, appropriate to the source and destination data technologies. Second, ODI does not require an intermediate server – data is moved directly between the source and destination servers. This eliminates a network hop for data and allows the native database technology of a source or destination server to perform necessary data transformations.
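To make the generated-code idea concrete, here is a minimal sketch of the kind of set-based SQL an E-LT style tool like ODI might produce to run on the target server. The schema, table and column names are invented for illustration, not taken from the product:

    -- Illustrative only: a set-based load generated to run on the
    -- target database itself, with transformations done in native SQL.
    INSERT INTO dw.customer_dim (customer_id, customer_name, region)
    SELECT src.cust_id,
           UPPER(TRIM(src.cust_name)),  -- transformation pushed to the database
           src.region_code
    FROM   staging.customers src
    WHERE  src.cust_id IS NOT NULL;     -- invalid rows are diverted per the constraints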
A Look at the Architecture
Much of ODI is implemented in Java for compatibility with a broad variety of platforms. ODI includes design-time and run-time components that share a common set of metadata repositories. A variety of interfaces and APIs support integration with external applications, including SOA applications and traditional job-scheduling systems.
ODI presents several primary user interfaces (UIs). The Topology Manager UI is used to define the data access technologies and the high-level physical and logical data stores that will be the sources and targets for data movement and transformation in the projects you create. Each supported data technology (databases, flat files, spreadsheets and XML data structures, for example) has an associated pre-programmed Knowledge Module (KM). Each KM, together with an appropriate connectivity layer such as JDBC or JMS, provides the code necessary to work with that technology. ODI also includes generic SQL and JMS Knowledge Modules, and exposes APIs that support the development of Knowledge Modules for data services and other proprietary technologies.
The Designer UI, where developers will spend most of their time, is used to supply the detailed schemas and data flows that are at the heart of the work. Within Designer, Models describe your data structures – tables, columns, constraints and so on. Projects describe how you'll use these data structures. Within a Project, you create an Interface to describe how data will be loaded and transformed from one or more source datastores to a target datastore. You then combine Interfaces with Procedures, tools and other objects to create Packages, ODI's largest executable objects. When complete, compiled versions of Interfaces, Procedures or Packages, called Scenarios, are stored in the Repository. Scenarios are the unit you schedule for execution. The ability to create multiple Contexts lets you quickly reassign your Projects and Scenarios between physical datastores – for example, between development and production datastores.
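To put those objects in sequence with a hypothetical example: a nightly sales load might start with a Model reverse-engineered from the sales database, add an Interface mapping an orders table to a warehouse fact table, wrap that Interface and an error-handling Procedure into a Package, compile the Package into a Scenario, and then run the same Scenario first in a development Context and later, unchanged, in the production Context.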
The Operator UI is used to schedule and manage job execution. The Security UI provides very granular control over who may access a project and what they are authorized to do. A lightweight Web interface, the Lightweight Designer, lets you remotely monitor job execution and take basic actions to correct operational problems.
Hands-on with ODI
After working with ODI for a while, a few things became apparent. This is not a product you will master quickly. However, given specific project objectives and basic training, you will be able to find what you need to know to get the job done. I suspect that within a few days most developers will be comfortable working within ODI's architecture and user interfaces. Those who have implemented ETL systems using more brute-force methods will quickly appreciate ODI's level of abstraction, which lets you focus on describing what you want to accomplish with little concern for the details of programming.
Installing ODI was easy. To ease your learning curve, ODI installs a demonstration environment complete with a Test Repository and a Demonstration Repository containing predefined objects, databases and flat files. A Getting Started script gave me a good feel for how to work with ODI. A working knowledge of SQL is required, as you define both constraints and transformations using native SQL statements. Data validation features create a dataset of invalid rows that you can correct and recycle back into the load. Creating an Interface is largely a process of dragging source and target tables from your Models. ODI will automatically map same-named columns between source and target tables, and will validate your SQL statements against the server. ODI's Expression Editor provides sample SQL statements to help you with the syntax, easing the definition of data transformations. A Business Rules tab lets those who know the data describe column transformations, leaving a SQL writer to complete the interface.
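To give a feel for the Expression Editor, the column mappings below are hypothetical but representative of the native SQL expressions you type for individual target columns, which ODI then validates against the server:

    UPPER(CUSTOMER.LAST_NAME)                          -- normalize case
    CUSTOMER.FIRST_NAME || ' ' || CUSTOMER.LAST_NAME   -- concatenate to a full name
    CAST(ORDERS.ORDER_DATE AS DATE)                    -- coerce to the target column's type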
The operations side of ODI offers a lot of flexibility. Once you create the elements of a project, you combine them in a Package by dragging and dropping the elements. A completed package consisting of one procedure and several interfaces took just a few minutes to create. A right-click marks the first step, and OK and Non-OK tools let you designate the next step to execute upon successful completion or upon failure. Designer lets you execute a package, and you can track its progress by clicking over to the Operator UI, which shows the detailed steps that ODI created for each Interface as well as the details of any errors that occurred.
The next step is to compile the Package into a Scenario. Stored in a Repository, Scenarios are the normal unit of execution in a production environment. They can be scheduled for execution, exported to another repository, run from a supplied CLI or run in other Contexts to work with different physical datastores.
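As an example of the command-line route, ODI's agent ships with a startscen utility; the invocation below uses invented names and shows the general form (scenario name, version, Context) rather than exact syntax:

    startscen LOAD_SALES 001 PRODUCTION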
Moving to a clean Repository, I created my own application, moving data from one SQL Server system and creating altered versions of three tables on a second SQL Server, following examples from the demonstration script. In the end, it executed with no surprises.
I also performed a full installation of ODI on a fresh Windows Server 2003 system, using the documentation to create a Master Repository on a SQL Server 2005 system. It is unlikely you'll get very far without the documentation, and while it was certainly helpful, it isn't complete in every detail. The error messages ODI presents are useful, and led me to a resolution when issues occurred.
Documenting large projects is generally a requirement. ODI creates PDF documentation at several levels of detail for Projects and their components, a convenient way to capture the status of a project at a point in time.
Sunopsis user Overstock.com needed to create a near real-time reporting system to replace existing reporting directly from sales applications. "A key reason we chose Data Conductor as our primary product is that it works inside the database," says Jack Garzella, director of data warehousing. "It doesn't require application servers or extra disk."
Other key points in Overstock.com's decision to use the product included the ease with which a new user can look at a project and see what others have done, since ODI generates the SQL code. In addition, tools within Designer support impact analysis and change control, showing you where tables and interfaces are being used. "We have over 400 jobs, some daily, some run continuously," says Garzella. "In total they run over 90,000 times a day. It's a very stable environment."
The Bottom Line
There are many features I didn't describe here, like changed data capture, Web services integration and event monitoring. This isn't a product that you can learn by simply exploring the user interfaces. There's a lot of documentation that you'll need to get familiar with. I wouldn't recommend ODI for a one-time migration project unless you have people already proficient with ODI. Overall, the product works as advertised and I would recommend it to organizations with ongoing needs that would find the long-term productivity gains worth the training investment.
Oracle Data Integrator v 10.1.3 is $12,000 per processor for target databases, plus $4,000 per processor for source databases (no charge for file or Java Message Service (JMS) based data sources).
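By that formula, for example, a four-processor target server fed by database sources totaling four processors would run $12,000 x 4 plus $4,000 x 4, or $64,000, with any file- or JMS-based sources adding nothing to the bill.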
John Green is president of Nereus Computer Consulting and has more than 30 years of IT experience. Write him at [email protected].