Performing ETL With SQL
Product Review: Sunopsis v3 is a database-friendly extract, transform and load tool that runs without an engine.
The lifeline for any business intelligence solution is the process called extract, transform, and load (ETL). According to a recent survey by The Data Warehousing Institute (TDWI), ETL design and development work consumes 60 to 80 percent of the resources of an entire BI project. Selecting an ETL solution that fits your needs is an essential first step in building a successful data warehouse. Today, products with proprietary engines dominate the ETL market. Sunopsis brings a fresh perspective to the ETL architecture with its SQL-based product, Sunopsis v3. Instead of using an engine, Sunopsis uses the power of the database to process the data.
A Breed Apart
The ETL market is highly varied. It includes traditional players such as Informatica and Ascential Software, analytic tool specialists such as Business Objects and Cognos, and even big database vendors such as Oracle, IBM, and Microsoft. Major players such as Informatica PowerCenter and Ascential DataStage use dedicated engines to process data. The engine is a single point of control and thus can be administered and tuned for optimal ETL performance. On the other hand, products such as ETI Solution from Evolutionary Technologies International and Oracle Warehouse Builder generate code that can be run on source or target systems. Conceptually, Sunopsis embraces this approach too in that it avoids the overhead of using an engine. However, its uniqueness lies in the fact that it's able to generate SQL code for most databases available today.
Over the years, SQL has evolved from being just a query language into a rich development platform. Today, it encompasses complex analytic and data transformation functions such as rank, case, and rollup within its fold. Moreover, database vendors have invested considerable time and money in improving the performance of their queries. Sunopsis takes advantage of these developments and uses the database as its platform of choice and SQL as its primary language. A transformation can be processed where it works best — either on the source or on the target database. It can even be run on an independent staging database if need be. Furthermore, if you have existing database code in the form of stored procedures and functions, Sunopsis can exploit this work by incorporating the code into its workflow.
Data Integration Platform
A good ETL tool should provide efficient interfaces to the heterogeneous systems that make up your environment. Sunopsis uses JDBC type 4 drivers to connect to databases and XML files. This approach offers a two-fold advantage over the earlier JDBC drivers. A type 4 driver provides improved performance because it connects directly to the database listener, with no client software needed. It also allows existing JDBC applications to use XML documents as true JDBC data sources. Sunopsis also provides Java Connectivity Architecture (JCA) 2.0 connectivity to ERP, CRM, and other packaged applications.
The current trend in the data integration market is toward consolidation. Vendors are rushing to combine their ETL architecture with the enterprise application integration (EAI) framework, so that they can offer a complete data integration solution. Sunopsis does a commendable job in this arena by providing its own message queue product, Sunopsis MQ. This is a message-oriented middleware (MOM) component that's compliant with the Java Message Services (JMS) specification. The best part about it is that the ETL product is also well knit with the EAI software. For example, Sunopsis ETL can seamlessly use the message queue as either a data source or a target.
One of the major hurdles in designing a successful ETL solution is in identifying what changed in the source system since your last extract. Sunopsis makes this task easier for you by providing a change-data capture facility. This feature along with Sunopsis MQ makes the suite a good fit for organizations that need a near real-time update facility for their operational data stores.
Ease of Use
I tried using Sunopsis to create a new ETL program. I found the first step of setting up the connections to my databases (Oracle 9i) to be fairly easy. Sunopsis does a great job of reverse engineering the data model from the database. The graphical interface is rich, yet intuitive and easy to learn. With a little help from the excellent user manual, I was able to develop an ETL process quickly and execute it successfully. Sunopsis includes a built-in scheduler and also interfaces with external schedulers.
The Sunopsis product has four graphical user interfaces. You define the physical and logical architecture of the interfacing systems with the Sunopsis Topology module. The Designer (see Figure 1 here) is the interface you use for reverse engineering the data model, creating the ETL data flow, and generating the code. You manage user access to various functions and components of the system with the Security Manager. The Operator module can be used to manage the production interface for monitoring the ETL programs.
Value-Added Features
Ease of metadata management is an important requirement of a good ETL tool. Sunopsis stores the metadata associated with the ETL programs in a database repository. Unlike some other ETL products that store the metadata in a proprietary format, the Sunopsis repository can be stored on any relational database, thus making its contents available for inspection using SQL. This feature can be particularly useful while trying to find out how a change to one part of the system might affect the other parts. Of course, Sunopsis provides its own built-in option for impact analysis, too.
Most of the standard components of the ETL process can be developed using the prebuilt Knowledge Modules (KMs) that ship with the Sunopsis package. KMs are included for reverse engineering, change data capture, data extraction, data integration, and cleansing. Commonly used scenarios such as incremental updates and loading of slowly changing dimensions can be developed easily using the appropriate KMs. KMs also help in automatically using standard database utilities such as bulk load and database links to speed up the load. Best of all, you can view the source code for the KMs and customize them to your needs. The source code language, again, isn't proprietary: The KMs use the features of the technology components you already have in your environment. For example, the KMs associated with Oracle use Oracle-specific SQL functions.
Many ETL vendors are also adding data quality solutions to their suites in an effort to build a complete data integration platform. Sunopsis can take a step in the right direction by enhancing its KM for data cleansing into a full-blown data quality solution. Such a KM can add real value if it includes matching, consolidation, and linking capabilities beyond the normal name and address standardization provided by most vendors.
Right for You?
Sunopsis v3 is an interesting data integration solution that dares to be different. It backs the power and flexibility of the relational database over the traditional engine-based architecture. A favorable pricing structure is the icing on the cake. If you're an IT shop that's well versed in relational database and SQL skills, it makes sense to take a good hard look at Sunopsis v3.
Ganesh Variar [[email protected]] is a lead analyst with Regence BlueCross BlueShield of Oregon. He has 10 years of experience in managing and designing business intelligence solutions.
About the Author
You May Also Like
2024 InformationWeek US IT Salary Report
May 29, 20242022 State of ITOps and SecOps
Jun 21, 2022