9/28/2012
01:03 PM

How LexisNexis Competes In Hadoop Age

Open source HPCC platform evolves from turnkey system to Hadoop competitor.

Hadoop is certainly the biggest name in big data platforms, and often the go-to solution for enterprises seeking a way to manage growing volumes of unstructured data. But LexisNexis, best known as a provider of computer-assisted legal research services, wants the world to know it has an alternative, albeit one that relatively few organizations are using.

HPCC (High Performance Computing Cluster) is an open source platform from LexisNexis Risk Solutions, a division of the company that focuses on big data products and services. LexisNexis, itself a subsidiary of global publishing giant Reed Elsevier, uses HPCC technology for its risk management business, and to gather data it sells to its clients.

"Over the last 10 years or so, we've been selling some of these platforms to customers who came asking for them. But we weren't too proactive in pushing them to market," said Flavio Villanustre, VP of infrastructure for LexisNexis Risk Solutions' HPCC Systems. "We thought it was our bread and butter, our core technology, so why sell it?"

During that decade, customers for HPCC turnkey systems included government, intelligence, and law enforcement agencies, as well as financial and risk management firms.

But the sudden emergence of big data led LexisNexis to rethink its strategy.

"Over the last two to three years, we started to see the rise of big data. Before then, it was hard for us to think there was a use for what we had," said Villanustre.

Believing it had the superior platform for managing massive volumes of information, LexisNexis decided a year ago to offer HPCC as open source code. It positioned the platform as a competitor to Hadoop and other big data management systems.

To date, HPCC's industry footprint is still quite small. Villanustre estimates between 50 and 60 organizations use the enterprise edition.

"People can use the open source version, but if they want support, training, or other more advanced modules, they can buy the enterprise license," said Villanustre.

According to LexisNexis, there are several noteworthy differences between HPCC and Hadoop, including HPCC's open-sourced Enterprise Control Language (ECL). For data transformations, ECL's capabilities are similar to those of Pig or Hive. It's a high-level programming language, which in theory means fewer programmers and shorter project-completion times.

HPCC is an integrated system that extends across the entire data lifecycle, including data ingestion, processing, and delivery. It's scalable up to several thousand nodes, and HPCC configurations require fewer nodes to deliver the same processing power as a Hadoop cluster, the company claims. For an in-depth, if partisan, HPCC vs. Hadoop comparison, see this HPCC Systems chart.

HPCC customers today use the platform for a variety of sophisticated, data-intensive applications, including fraud detection and identity verification.

"When it comes to fraud, for example, we have a very good social graph analytics system," Villanustre said. "We can take the social graph of large populations--hundreds of millions of people--and use that information to show (connections) between apparently disconnected potential fraud cases."
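
To make the idea in that quote concrete, here is a minimal sketch of how a link between two seemingly unrelated cases can surface once people and shared attributes are placed in one graph. It is plain Python, not ECL and not HPCC's actual method; the function name `shortest_connection` and the sample data (`claimant_a`, `shared_address`, and so on) are hypothetical and chosen only for illustration.

```python
from collections import deque


def shortest_connection(graph, start, goal):
    """Breadth-first search for the shortest chain of links between two nodes.

    Returns the chain as a list of node names, or None if no chain exists.
    """
    if start == goal:
        return [start]
    visited = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        for neighbor in graph.get(path[-1], ()):
            if neighbor in visited:
                continue
            if neighbor == goal:
                return path + [neighbor]
            visited.add(neighbor)
            queue.append(path + [neighbor])
    return None


# Hypothetical sample data: each pair links a person to an attribute they share.
links = [
    ("claimant_a", "shared_address"),
    ("shared_address", "claimant_b"),
    ("claimant_c", "other_address"),
]

# Build an undirected adjacency map from the link pairs.
graph = {}
for left, right in links:
    graph.setdefault(left, set()).add(right)
    graph.setdefault(right, set()).add(left)

print(shortest_connection(graph, "claimant_a", "claimant_b"))
print(shortest_connection(graph, "claimant_a", "claimant_c"))
```

In this toy data the first two claimants are tied together through one shared address, while the third has no chain to either. A production system working on hundreds of millions of people would need a distributed approach rather than an in-memory dictionary.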

The market for big data management platforms is very new. Hadoop may be the best-known solution today, but its shortcomings provide an opportunity for competing platforms.

"It has operational limitations, and you need to resort to a number of extended components to make it work, and to make it reliable," said Villanustre. "We think some of the companies using Hadoop today might go through disillusionment," and perhaps switch to other platforms, including HPCC.

"Hadoop will become more fragmented, pulled across by different commercial players trying to leverage their own solutions," he said.

Villanustre also pointed out that many organizations are finding Hadoop difficult to use. "There's a lack of talent in that area," he added.
