Informatica PowerCenter 7.0 takes an ambitious development path to regain data integration leadership.

InformationWeek Staff, Contributor

June 25, 2004


The market space for data extract, transform, and load (ETL) is paradoxical these days, marked simultaneously by crowding and consolidation. Two factors account for this condition: an increasing number of players entering the lucrative ETL market and a consolidation of existing players in overall data management. Database vendors such as Oracle and Microsoft are adding and augmenting ETL capabilities in the database, vendors such as Journee and Siperian are providing ETL-like capabilities under the guise of enterprise information integration, and companies such as SAS and SyncSort are bringing their data management expertise to bear in ETL solutions. At the same time, vendor ranks have thinned due to recent mergers, such as Sagent with Group 1 Software and Data Junction with Pervasive Software.

PowerCenter 7.0, the latest version of the flagship ETL solution from Informatica, is the company's attempt to stay competitive in this mixed market and maintain a frontline position in mind share and market share.

Three Levels of Labor

PowerCenter 7.0 (I'll call it PC7) is an ETL tool in the classic mold: data extract, transform, and load logic is constructed in a (mostly) sequential arrangement of graphical objects that flow from source to target. The objective is conceptually simple: Read data from source, transform it as needed, and write it to target. Reality is a little more complex, of course, and the construction of logic happens at three levels.

At the lowest level, individual graphical objects can be sources, targets, or transformations (sources and targets can themselves be considered special types of transformations). A source transformation reads from a data source and supplies that data in sequential, row-wise fashion for subsequent processing. At the other end of the logic stream, the target transformation receives data (again, in row-wise order) and writes it out to recipient data structures. The remaining intermediate transformations do just that: transform data values as required.

Sources, targets, and transformations are assembled in a daisy chain to form the next level of processing, which in PC7 is called the "mapping." A mapping is the end-to-end flow of logic, from one or more source transformations to one or more target transformations.

The execution of the mapping, called the "workflow" in PC7, provides the third level of the overall logic. The workflow provides for the execution of multiple mappings and dependencies among mappings. In standard programming terms, the transforms are the syntax and components of the program, the mapping is the overall program itself, and the workflow is the execution and production of one or more programs.
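To make the three levels concrete, here is a minimal sketch in plain Python. It is an analogy only, not PowerCenter code or any Informatica API: every name in it (source, tidy_names, drop_inactive, target, make_mapping, run_workflow) is hypothetical, and the sample data is invented. Transformations are modeled as row-by-row generators, a mapping as a chain of them from source to target, and a workflow as a runner that executes mappings in dependency order.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]

# Level 1: transformations. Each one consumes and produces rows one at a time.
def source(rows: List[Row]) -> Iterator[Row]:
    """Source: supply rows sequentially (copies, so the input is not mutated)."""
    for row in rows:
        yield dict(row)

def tidy_names(rows: Iterable[Row]) -> Iterator[Row]:
    """Intermediate transformation: clean up the 'name' column."""
    for row in rows:
        row["name"] = str(row["name"]).strip().title()
        yield row

def drop_inactive(rows: Iterable[Row]) -> Iterator[Row]:
    """Intermediate transformation: filter out inactive rows."""
    for row in rows:
        if row["active"]:
            yield row

def target(rows: Iterable[Row], sink: List[Row]) -> int:
    """Target: write each incoming row to the recipient structure."""
    count = 0
    for row in rows:
        sink.append(row)
        count += 1
    return count

# Level 2: a mapping is the end-to-end chain from source to target.
def make_mapping(src: List[Row],
                 transforms: List[Callable[[Iterable[Row]], Iterator[Row]]],
                 sink: List[Row]) -> Callable[[], int]:
    def run() -> int:
        stream: Iterable[Row] = source(src)
        for transform in transforms:
            stream = transform(stream)
        return target(stream, sink)
    return run

# Level 3: a workflow executes mappings and honors dependencies among them.
# (No cycle detection; this is a sketch.)
def run_workflow(mappings: Dict[str, Callable[[], int]],
                 depends_on: Dict[str, List[str]]) -> List[str]:
    done: List[str] = []
    def visit(name: str) -> None:
        if name in done:
            return
        for dep in depends_on.get(name, []):
            visit(dep)
        mappings[name]()
        done.append(name)
    for name in mappings:
        visit(name)
    return done

# Usage with invented sample data.
raw = [{"name": "  ada lovelace ", "active": True},
       {"name": "bob", "active": False}]
staging: List[Row] = []
warehouse: List[Row] = []
order = run_workflow(
    {"load_warehouse": make_mapping(staging, [drop_inactive], warehouse),
     "load_staging": make_mapping(raw, [tidy_names], staging)},
    {"load_warehouse": ["load_staging"]},
)
print(order)      # ['load_staging', 'load_warehouse']
print(warehouse)  # [{'name': 'Ada Lovelace', 'active': True}]
```

The point of the sketch is the separation of concerns: the transformations know nothing about scheduling, and the workflow knows nothing about what each mapping does to the data.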

There are PC7 components that correspond to these levels. The PowerCenter Designer is the programming integrated development environment (IDE), where you "assemble" all the sources, targets, and transformations to create a mapping. The PowerCenter Workflow Manager is used to build a workflow around the mapping. The Workflow Monitor provides production support capabilities for the workflow. In addition, there are the PowerCenter Repository Manager and the Repository Server Manager, which provide administration capabilities for the PC7 Repository (more on this a little later).

Conventional Improvements

A key measure of an ETL tool's strength is the number of sources and targets it supports, and the variety and performance of the transformations. PC7 supports a wide variety of data sources, such as relational (using native connectivity), ODBC, XML, and fixed-width and delimited flat files. The acquisition of Striva, a leader in mainframe connectivity tools, adds to Informatica's connectivity repertoire. In addition, Informatica PowerExchange (formerly called PowerConnect) is a family of gateways that allow access to applications such as SAP, Siebel, and PeopleSoft and to middleware and other solutions such as Tibco, IBM MQSeries, webMethods, and SAS. PC7 also introduces bidirectional support for Web services and allows PC7 to act as a provider as well as consumer of Web services. Informatica PowerChannel provides a secure extension to PC7 for purposes of data transfer across wide area networks and the Internet, by incorporating encryption and authentication technology from RSA, a leading security vendor. Together with the means to read and write data, PC7 provides numerous transformations that let you cleanse, transform, aggregate, and segregate data as needed, as well as apply data and business rules.

But sources, targets, and transformations should be considered fairly routine in an ETL tool. They are, ultimately, the tool's raison d'etre. To stand out from the crowd, any ETL tool must go beyond the routine. There are several factors that make Informatica PC7 a strong product, beginning with the PowerCenter architecture.

Strengths and Weaknesses

The PowerCenter architecture comprises three main components (see Figure 1). The Repository is the collection of all PC7 objects (sources, targets, transformations, mappings, workflows, and so on) and is housed in a standard relational database. The Repository Server is the server application that manages the repository database. Finally, the Informatica Server is the server application that runs the workflows and the mappings. PowerCenter Web services are managed through the Web Services Hub, another component of the architecture, which supports standards such as Simple Object Access Protocol (SOAP), Web Services Description Language (WSDL), and Universal Description, Discovery, and Integration (UDDI). The architectural components can be collocated on a single server or spread across diverse servers, which allows solution parallelism, flexibility, and scalability.


FIGURE 1 - Informatica PowerCenter architecture. (Courtesy: Informatica Corp.)

PC7 offers server grid capabilities, too, by which PowerCenter can distribute loads across heterogeneous Unix, Windows, or Linux-based computing platforms. Although grid capabilities may seem exciting, I don't believe they match real-world need for grid computing yet, and I wouldn't recommend using them in place of other industry grid solutions.

Another source of strength for PowerCenter is its integrated data profiling and cleansing capability. As with its grid capabilities, the data profiling in PC7 doesn't match other industrial-strength data profiling solutions, but the comparison ends there. For an ETL tool, integrated data profiling is always a good idea and useful in any measure. For data cleansing, PC7 embeds technology from FirstLogic.

PowerCenter supports team-based development through version control of mappings and workflows and through role-based security. Tight integration with Informatica PowerAnalyzer (Informatica's business intelligence product) enables users to browse and analyze PowerCenter metadata. For more on Informatica PowerAnalyzer, read "BI Takes a Step Forward," in the November 18, 2003, issue of Intelligent Enterprise.

Curiously, some deficiencies are common among many ETL tools, and PC7 is no exception. In particular, the lack of global variables within a mapping (not quite addressed by PC7 mapping variables) and of ways to effectively document mappings is not just annoying; it's a genuine shortcoming that limits the effectiveness of the ETL tool. Also, be aware that ETL tools are in general a slower (if more elegant) alternative to native SQL processing (such as Oracle PL/SQL or Microsoft Transact-SQL).

Pacing a Growing Market

A research survey from Forrester reports that spending for data warehousing is expected to rise. Growing volumes of data, shrinking tolerance for data latency, growing data complexity, and mandated privacy initiatives and security concerns are leading business drivers for data warehousing and ETL solutions in 2004. In today's rapidly changing market space, where disappointingly common vendor approaches in ETL technology are leading toward solution commoditization, survival will depend upon factors such as product differentiation, reading the customer's mind, and time to market. Informatica PowerCenter 7.0 is no different in concept from many of its competitive products, yet (with forays into grid computing and integrated profiling and cleansing, and its open architectural approach) it appears well positioned to keep pace with — if not rise above — the competition.

As a product, PowerCenter 7.0 offers a compelling alternative in the ETL space. But more often than not, ETL is seen to be a process with a mission, tied to data warehousing, business intelligence, or both. And the feature-rich, tightly bound, and architecturally open combination of PC7 and PowerAnalyzer has the potential to lead the ETL-plus-BI solution space, against close competition from the likes of Ascential and SAS. But the race isn't over yet, and any front-runner would do well to heed the plight of Smarty Jones in the Belmont Stakes.

Rajan Chandras is a principal consultant with the New York offices of CSC Consulting. The opinions expressed here are his own.

PRODUCT SPEC SHEET
Informatica PowerCenter 7.0
2100 Seaport Blvd.
Redwood City, CA 94063
650-385-5000; fax 650-385-5500
www.informatica.com

Minimum Requirements:
Windows 2000 or 2003, Unix (AIX, HP-UX, Linux, or Solaris); 200MB disk space; 256MB RAM.

Database Connectivity:
Native drivers or ODBC drivers

Pricing:
Starts at around $200,000. Flexible pricing model.
