A number of data-integration and data-management vendors (IBM, Oracle, Syncsort, Talend) have tackled the obvious: getting data into and out of Hadoop. Informatica went a step further in October when it introduced HParser, a data-transformation environment optimized for Hadoop. According to Informatica, the software can efficiently process any file format inside Hadoop at scale, giving Hadoop developers out-of-the-box parsing capabilities for complex and varied data sources, including logs, documents, binary or hierarchical data, and industry-standard formats (such as NACHA in banking, SWIFT for payments, FIX for financial data, and ACORD for insurance). Just as in-database processing speeds various analytic approaches by running the work where the data resides, Informatica is putting parsing and, soon, other data-processing code inside Hadoop to take advantage of the cluster's processing power.
Informatica aims to provide a single platform that handles the full sweep of data-management and data-integration needs with a consistent environment and approach. The company has more than 4,300 customers, and it estimates that more than 10% of them are moving into the big-data realm (data volumes exceeding 100 terabytes). Market presence and innovation make Informatica a Hadoop-savvy vendor to watch.