SAS Acquires Text Mining Firm, Upgrades Core Platform

Responding to competitive threats, SAS buys Teragram for its natural language processing, categorization and search technologies.
Breaking news on several fronts at its annual Global Forum this week, SAS announced the acquisition of natural language processing (NLP) vendor Teragram, released the version 9.2 upgrade of its core SAS platform, and unveiled prepackaged SAS analytics that are ready to run within Teradata environments.

Teragram will operate as a SAS company using its current name and maintaining current sales channels — much as SAS' DataFlux unit operates. Based in Cambridge, Mass., Teragram provides NLP technologies that interpret linguistic relations and word meanings with the help of large annotated dictionaries containing several hundred million words in more than 30 languages. The company also offers advanced categorization technologies used to automatically classify and tag documents according to custom criteria. In the area of search, Teragram’s NLP technologies scan structured databases and text-based sources to yield comprehensive answers from multiple sources. Teragram was a privately held company and terms of the deal were not disclosed.

SAS was an early entrant into the text mining field, having introduced its SAS Text Miner back in 2002, but the acquisition of Teragram comes as rivals are stepping up competition on this front. IBM, which recently acquired Cognos, has developed and deployed its own text mining capabilities, and Business Objects, an SAP company, last year acquired Inxight, a key supplier of natural language text analysis and extraction technology used in SAS Text Miner. Business Objects recently integrated Inxight's technology into its BusinessObjects XI 3.0 platform upgrade, introduced in February.

SAS Text Miner will continue to use Inxight technology under a long-term OEM contract, according to executives, but the Teragram deal will enable SAS to apply NLP in new areas. "We'll be coupling [Teragram's] search capabilities with our BI offering, and it won't be limited to just headings or metadata, so you can discover new insights as well as retrieve relevant existing reports," says Anne Milley, director of product marketing. "There are [also] synergies between natural language processing and our entire platform. For example, we can apply text analytics to get more consistent and unbiased metadata tagging. If you try to do that manually, it's onerous and you're clouding the results with human judgments and biases."

The SAS Global Forum, held this year in San Antonio, Texas, also marked the unveiling of SAS 9.2, which has been beefed up in three key areas. Improved support for Bayesian methods helps users bring previously available information into statistical analyses, an approach often used in clinical trials in the medical device industry. The approach leads to smaller and shorter clinical trials and, thus, quicker regulatory approvals and faster time to market. A new set of optimization procedures in SAS 9.2 is said to bring powerful algebraic modeling to bear when building complex optimization models and determining model biases. Finally, improved model-selection capabilities in SAS 9.2 were developed in response to rampant growth in data volumes and complexity.

"With the compute power available today, you can try all sorts of different models, but there are many constraints around what kinds of models you can actually build and put into production," says Milley. "These model-selection capabilities help you select the optimal inputs based on your goals and objectives, whether that's model stability, long shelf life or other demands."
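For readers unfamiliar with the Bayesian approach mentioned in connection with SAS 9.2, the idea of bringing previously available information into a new analysis can be sketched in a few lines of Python. This is only an illustration of the general technique (a conjugate Beta-binomial update), not SAS code or a description of how SAS 9.2 implements it; the function names and all trial counts are hypothetical.

```python
def update_beta(prior_a, prior_b, successes, failures):
    """Conjugate update: a Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return (a * b) / ((a + b) ** 2 * (a + b + 1))


# Hypothetical earlier study: 40 successes in 50 patients, starting from a flat Beta(1, 1).
prior_a, prior_b = update_beta(1, 1, 40, 10)

# Hypothetical new, smaller trial: 18 successes in 20 patients.
informed = update_beta(prior_a, prior_b, 18, 2)  # uses the earlier study as the prior
uninformed = update_beta(1, 1, 18, 2)            # ignores the earlier study

print("informed mean:", round(beta_mean(*informed), 3),
      "variance:", round(beta_variance(*informed), 5))
print("uninformed mean:", round(beta_mean(*uninformed), 3),
      "variance:", round(beta_variance(*uninformed), 5))
```

In this made-up case the estimate that carries the earlier study forward has a smaller variance than the one built from the new trial alone, which is the intuition behind running smaller and shorter trials.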

Following up on last year's announcement of a strategic in-database partnership, SAS has released a new SAS Scoring Accelerator for Teradata. The Accelerator lets customers translate scoring models created in SAS Enterprise Miner into Teradata functions that can run directly within the Teradata database. Tests are said to show that the approach improves performance by as much as 4,500 percent in terms of the number of records scored per second versus traditional SAS scoring with Teradata.
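
The general idea of in-database scoring can be illustrated with a short Python sketch that renders a fitted model as a SQL expression, so the database computes the score next to the data instead of exporting rows to a separate scoring process. This is not how the SAS Scoring Accelerator works internally, which the announcement does not detail; the logistic model, its coefficients, and the table and column names below are all hypothetical.

```python
import math


def logistic_to_sql(intercept, coefficients):
    """Render a logistic-regression scoring formula as a SQL expression string."""
    terms = [repr(float(intercept))]
    for column, weight in coefficients.items():
        terms.append(f"({float(weight)!r} * {column})")
    linear = " + ".join(terms)
    return f"1.0 / (1.0 + EXP(-({linear})))"


def score_in_python(intercept, coefficients, row):
    """Reference implementation of the same formula, for checking one row by hand."""
    linear = intercept + sum(w * row[c] for c, w in coefficients.items())
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical model: the coefficients are illustrative, not taken from any real system.
intercept = -1.5
coefficients = {"tenure_months": 0.02, "late_payments": 0.8}

sql_expr = logistic_to_sql(intercept, coefficients)
query = f"SELECT customer_id, {sql_expr} AS churn_score FROM customers"
print(query)
```

Because the generated expression runs inside the database, no records have to leave it to be scored, which is the kind of data movement the in-database approach is meant to avoid.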