Doug Henschen

IT's Next Hot Job: Hadoop Guru

JPMorgan Chase makes a case for the big data platform (and career track) of the future.

Five of JPMorgan Chase's seven lines of business now use a Hadoop shared service. They use it for extract, transform, and load (ETL) processing; high-scale Basel III regulatory liquidity analyses and reporting; data mining; transaction analysis; fraud investigation; and social media sentiment analysis. It's also a low-cost storage option for all types of data, including structured financial records, semi-structured clickstreams and Web logs, and unstructured text and social comment feeds.

"We're now able to store data we could never store, collecting information from multiple lines of business," Feinsmith said, ticking off checking, credit card, mortgage, auto loan, and other services. All that information was previously in silos, but JPMorgan Chase is loading it all into a common, high-scale Hadoop system and mining that data to understand its customers better and provide better service. The big challenge isn't running Hadoop at high scale, Feinsmith said. It's sorting through the data security, entitlement, and privacy provisions for a centralized resource.

There are limits to what Hadoop can do, he said. When applications are transactional, when they demand low latency or rapid response times, or when there's lots of query complexity or concurrent workloads, JPMorgan Chase's IT organization still recommends using conventional relational databases. But when there's big data, as in lots of unstructured data or machine data such as Web logs, Feinsmith's team recommends Hadoop.


To what degree will Hadoop and relational platforms overlap over the next few years? "That's the debate we're having at JPMorgan Chase," he said, noting that the company is trying to expand its Hadoop workload. What Hadoop needs, Feinsmith said, is more maturity as an enterprise platform, including better monitoring and virtualization capabilities and deeper integration and compatibility with existing business intelligence and analytics systems.

But the biggest obstacle to broader Hadoop use within JPMorgan Chase, Feinsmith concluded, is lack of skills. "There are lots of SQL skills, SAS skills, and SPSS skills, but there are not a lot of [Hadoop] MapReduce skills," he said.
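To make the skills gap concrete, here is a minimal sketch of the MapReduce model Feinsmith is referring to: a job that tallies page hits from Web-log lines, the kind of machine data the article mentions. The log format and page names are hypothetical examples, and in practice this logic would run as a distributed Hadoop job (for instance via Hadoop Streaming, with mapper and reducer reading stdin); here the shuffle phase is simulated locally so the flow is visible end to end.

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Emit (url, 1) for each log line of the form: "ip timestamp url status"
    fields = line.split()
    if len(fields) >= 3:
        yield (fields[2], 1)

def reducer(key, values):
    # Sum the counts for one key; Hadoop guarantees the reducer sees
    # all values for a key together, in key-sorted order
    yield (key, sum(values))

def run_job(lines):
    # Simulate the shuffle: sort mapper output by key, then group by key
    mapped = [kv for line in lines for kv in mapper(line)]
    mapped.sort(key=itemgetter(0))
    result = {}
    for key, group in groupby(mapped, key=itemgetter(0)):
        for k, total in reducer(key, (count for _, count in group)):
            result[k] = total
    return result

# Hypothetical sample log lines
logs = [
    "10.0.0.1 2011-11-08T10:00 /checking 200",
    "10.0.0.2 2011-11-08T10:01 /mortgage 200",
    "10.0.0.1 2011-11-08T10:02 /checking 200",
]
print(run_job(logs))  # {'/checking': 2, '/mortgage': 1}
```

The point of the example is that the programming model itself is small; what takes training is learning the surrounding ecosystem of job configuration, data formats, and cluster operations.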

Hadoop World has tripled in size since I last attended two years ago, and I've talked with a dozen or more enthusiastic users over the past year. The platform is headed for broad adoption, so it's a sound career path, much like SQL was 30 years ago. Want a more substantial endorsement? Consider that IBM, Microsoft, and Oracle, multibillion-dollar vendors with substantial data management software revenue at stake, have all embraced Hadoop this year.

The good news is that Hadoop experts aren't born, they're trained. "I'm sure companies that train their workforces on Hadoop will derive lots of benefits," said Jeremy Lizt, VP of engineering at Rapleaf, in a recent interview. A data provider that has been using Hadoop for nearly four years, Rapleaf was among the earliest adopters. Perhaps it's his years of experience speaking, but Lizt said, "I think intelligent technologists will pick up Hadoop very quickly."

User Rank: Moderator
11/9/2011 | 5:31:25 PM
re: IT's Next Hot Job: Hadoop Guru
Great article Doug! However, instead of using conventional relational databases as mentioned by JPMorgan, might I suggest Roxie, the HPCC Systems massive data delivery engine, which handles real-time query processing. Roxie can deliver query responses in sub-second predictable latencies to thousands of concurrent users depending on the size of the cluster and the complexity of the queries. Great for when there's lots of query complexity or concurrent workloads!