IT's Next Hot Job: Hadoop Guru - InformationWeek
Doug Henschen

IT's Next Hot Job: Hadoop Guru

JPMorgan Chase makes a case for the big data platform (and career track) of the future.

Five of JPMorgan Chase's seven lines of business now use a Hadoop shared service. They use it for extract, transform, and load (ETL) processing; high-scale Basel III regulatory liquidity analyses and reporting; data mining; transaction analysis; fraud investigation; and social media sentiment analysis. It's also a low-cost storage option for all types of data, including structured financial records, semi-structured clickstreams and Web logs, and unstructured text and social comment feeds.

"We're now able to store data we could never store, collecting information from multiple lines of business," Feinsmith said, ticking off checking, credit card, mortgage, auto loan, and other services. All that information was previously in silos, but JPMorgan Chase is loading it all into a common, high-scale Hadoop system and mining that data to understand its customers better and provide better service. The big challenge isn't running Hadoop at high scale, Feinsmith said. It's sorting through the data security, entitlement, and privacy provisions for a centralized resource.

There are limits to what Hadoop can do, he said. When applications are transactional, when they demand low latency or rapid response times, or when there's lots of query complexity or concurrent workloads, JPMorgan Chase's IT organization still recommends using conventional relational databases. But for big data, meaning large volumes of unstructured data or machine data such as Web logs, Feinsmith's team recommends Hadoop.

The question is to what degree Hadoop and relational platforms will overlap over the next few years. "That's the debate we're having at JPMorgan Chase," he said, noting that the company is trying to expand the Hadoop workload. What Hadoop needs, Feinsmith said, is more maturity as an enterprise platform, including more monitoring and virtualization capabilities, and better integration and compatibility with existing business intelligence and analytic systems.

But the biggest obstacle to broader Hadoop use within JPMorgan Chase, Feinsmith concluded, is lack of skills. "There are lots of SQL skills, SAS skills, and SPSS skills, but there are not a lot of [Hadoop] MapReduce skills," he said.

Hadoop World has tripled in size since I last attended two years ago, and I've talked with a dozen or more enthusiastic users over the past year. The platform is headed for broad adoption, so it's a sound career path, much like SQL was 30 years ago. Want a more substantial endorsement? Consider that IBM, Microsoft, and Oracle--multibillion-dollar vendors with substantial data management software revenue at stake--have all embraced Hadoop this year.

The good news is that Hadoop experts aren't born, they're trained. "I'm sure companies that train their workforces on Hadoop will derive lots of benefits," said Jeremy Lizt, VP of engineering at Rapleaf, in a recent interview. A data provider that has been using Hadoop for nearly four years, Rapleaf was among the earliest adopters. Perhaps it's his years of experience speaking, but Lizt said, "I think intelligent technologists will pick up Hadoop very quickly."

11/9/2011 | 5:31:25 PM
re: IT's Next Hot Job: Hadoop Guru
Great article Doug! However, instead of using conventional relational databases as mentioned by JPMorgan, might I suggest Roxie, the HPCC Systems massive data delivery engine, which handles real-time query processing. Roxie can deliver query responses in sub-second predictable latencies to thousands of concurrent users depending on the size of the cluster and the complexity of the queries. Great for when there's lots of query complexity or concurrent workloads!