How LexisNexis Competes In Hadoop Age

Open source HPCC platform evolves from turnkey system to Hadoop competitor.

Hadoop is certainly the biggest name in big data platforms, and often the go-to solution for enterprises seeking a way to manage growing volumes of unstructured data. But LexisNexis, best known as a provider of computer-assisted legal research services, wants the world to know it has an alternative, albeit one that relatively few organizations are using.

HPCC (High Performance Computing Cluster) is an open source platform from LexisNexis Risk Solutions, a division of the company that focuses on big data products and services. LexisNexis, itself a subsidiary of global publishing giant Reed Elsevier, uses HPCC technology for its risk management business, and to gather data it sells to its clients.

"Over the last 10 years or so, we've been selling some of these platforms to customers who came asking for them. But we weren't too proactive in pushing them to market," said Flavio Villanustre, VP of infrastructure for LexisNexis Risk Solutions' HPCC Systems. "We thought it was our bread and butter, our core technology, so why sell it?"

During that decade, customers for HPCC turnkey systems included government, intelligence, and law enforcement agencies, as well as financial and risk management firms.

But the sudden emergence of big data led LexisNexis to rethink its strategy.

"Over the last two to three years, we started to see the rise of big data. Before then, it was hard for us to think there was a use for what we had," said Villanustre.


Believing it had the superior platform for managing massive volumes of information, LexisNexis decided a year ago to offer HPCC as open source code. It positioned the platform as a competitor to Hadoop and other big data management systems.

HPCC's industry footprint remains quite small: Villanustre estimates between 50 and 60 organizations use the enterprise edition.

"People can use the open source version, but if they want support, training, or other more advanced modules, they can buy the enterprise license," said Villanustre.

According to LexisNexis, there are several noteworthy differences between HPCC and Hadoop, including HPCC's open-sourced Enterprise Control Language (ECL). For data transformations, ECL's capabilities are similar to those of Pig or Hive. It's a high-level, declarative programming language, which in theory means fewer programmers and shorter project-completion times.
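To give a flavor of that declarative style, here is a minimal ECL sketch (the record layout, field names, and values are illustrative inventions, not drawn from the article): it defines a record, an inline dataset, and a filter, and the HPCC platform decides how to distribute the work across the cluster.

```
// Hypothetical record layout and inline data, for illustration only.
PersonRec := RECORD
    STRING25 Name;
    UNSIGNED1 Age;
END;

People := DATASET([{'Alice', 34}, {'Bob', 17}, {'Carol', 52}], PersonRec);

// A declarative filter -- no loops; the platform parallelizes the scan.
Adults := People(Age >= 18);

OUTPUT(Adults);
```

As with Pig or Hive scripts, the programmer states *what* result is wanted rather than *how* to compute it node by node, which is the basis of the shorter-project-time claim.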

HPCC is an integrated system that extends across the entire data lifecycle, including data ingestion, processing, and delivery. It's scalable up to several thousand nodes, and HPCC configurations require fewer nodes to deliver the same processing power as a Hadoop cluster, the company claims. For an in-depth, if partisan, HPCC vs. Hadoop comparison, see this HPCC Systems chart.

HPCC customers today use the platform for a variety of sophisticated, data-intensive applications, including fraud detection and identity verification.

"When it comes to fraud, for example, we have a very good social graph analytics system," Villanustre said. "We can take the social graph of large populations--hundreds of millions of people--and use that information to show (connections) between apparently disconnected potential fraud cases."

The market for big data management platforms is very new. Hadoop may be the best-known solution today, but its shortcomings provide an opportunity for competing platforms.

"It has operational limitations, and you need to resort to a number of extended components to make it work, and to make it reliable," said Villanustre. "We think some of the companies using Hadoop today might go through disillusionment," and perhaps switch to other platforms, including HPCC.

"Hadoop will become more fragmented, pulled across by different commercial players trying to leverage their own solutions," he said.

Villanustre also pointed out that many organizations are finding Hadoop difficult to use. "There's a lack of talent in that area," he added.

