Commentary
11/9/2011
10:00 AM
Doug Henschen

IT's Next Hot Job: Hadoop Guru

JPMorgan Chase makes a case for the big data platform (and career track) of the future.

Five of JPMorgan Chase's seven lines of business now use a Hadoop shared service. They use it for extract, transform, and load (ETL) processing; high-scale Basel III regulatory liquidity analyses and reporting; data mining; transaction analysis; fraud investigation; and social media sentiment analysis. It's also a low-cost storage option for all types of data, including structured financial records, semi-structured clickstreams and Web logs, and unstructured text and social comment feeds.

"We're now able to store data we could never store, collecting information from multiple lines of business," Feinsmith said, ticking off checking, credit card, mortgage, auto loan, and other services. All that information was previously in silos, but JPMorgan Chase is loading it all into a common, high-scale Hadoop system and mining that data to understand its customers better and provide better service. The big challenge isn't running Hadoop at high scale, Feinsmith said. It's sorting through the data security, entitlement, and privacy provisions for a centralized resource.

There are limits to what Hadoop can do, he said. When applications are transactional, when they demand low latency or rapid response times, or when there's lots of query complexity or concurrent workloads, JPMorgan Chase's IT organization still recommends using conventional relational databases. But when there's big data, meaning lots of unstructured data or machine data such as Web logs, Feinsmith's team recommends Hadoop.
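To make that rule of thumb concrete, here is a minimal Python sketch that encodes it as a decision function. The `Workload` fields and the `recommend_platform` name are hypothetical, and checking the relational criteria before the big-data criterion is an assumption: the guidance as reported doesn't say which wins when a workload has both traits.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical workload traits; the attribute names are illustrative only.
    transactional: bool = False
    low_latency: bool = False
    complex_queries: bool = False
    high_concurrency: bool = False
    large_unstructured_or_machine_data: bool = False


def recommend_platform(w: Workload) -> str:
    """Encode the reported rule of thumb as a simple decision function."""
    # Assumption: relational-database traits take precedence when both apply.
    if w.transactional or w.low_latency or w.complex_queries or w.high_concurrency:
        return "relational database"
    # High volumes of unstructured or machine data (e.g. Web logs) point to Hadoop.
    if w.large_unstructured_or_machine_data:
        return "Hadoop"
    # The guidance doesn't cover other cases, so leave the decision open.
    return "undecided"


print(recommend_platform(Workload(large_unstructured_or_machine_data=True)))  # Hadoop
print(recommend_platform(Workload(transactional=True)))  # relational database
```

The "undecided" branch reflects the gray zone Feinsmith describes next, where the overlap between the two platforms is still being debated.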


The open question is to what degree Hadoop and relational platforms will overlap over the next few years. "That's the debate we're having at JPMorgan Chase," he said, noting that the company is trying to expand the Hadoop workload. What Hadoop needs, Feinsmith said, is more maturity as an enterprise platform, including more monitoring and virtualization capabilities, and better integration and compatibility with existing business intelligence and analytic systems.

But the biggest obstacle to broader Hadoop use within JPMorgan Chase, Feinsmith concluded, is lack of skills. "There are lots of SQL skills, SAS skills, and SPSS skills, but there are not a lot of [Hadoop] MapReduce skills," he said.
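For readers who know SQL but not MapReduce, the difference in style is easiest to see on a task like counting words in log lines, which in SQL is roughly a `GROUP BY` with `COUNT(*)`. The sketch below simulates the three MapReduce stages (map, shuffle, reduce) in plain Python on a single machine. It is not Hadoop's Java API; the function names are hypothetical, and on a real cluster the framework distributes the map and reduce tasks and performs the shuffle itself.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Mapper: emit a (word, 1) pair for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all emitted values by key, as the framework does between phases.
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer: combine every value that shares a key into one result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    # Process keys in sorted order, as Hadoop presents keys to each reducer.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


logs = ["GET /home", "GET /login", "POST /login"]
print(word_count(logs))  # {'/home': 1, '/login': 2, 'get': 2, 'post': 1}
```

The logic is simple, but expressing a problem as independent map and reduce steps, rather than as one declarative query, is the skill shift Feinsmith is pointing to.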

Hadoop World has tripled in size since I last attended two years ago, and I've talked with a dozen or more enthusiastic users over the past year. The platform is headed for broad adoption, so it's a sound career path, much like SQL was 30 years ago. Want a more substantial endorsement? Consider that IBM, Microsoft, and Oracle, all multibillion-dollar vendors with substantial data management software revenue at stake, have embraced Hadoop this year.

The good news is that Hadoop experts aren't born; they're trained. "I'm sure companies that train their workforces on Hadoop will derive lots of benefits," said Jeremy Lizt, VP of engineering at Rapleaf, in a recent interview. A data provider that has been using Hadoop for nearly four years, Rapleaf was among the earliest adopters. Perhaps it's his years of experience speaking, but Lizt said, "I think intelligent technologists will pick up Hadoop very quickly."

Comments
HM (User Rank: Strategist), 11/9/2011 5:31:25 PM
re: IT's Next Hot Job: Hadoop Guru
Great article Doug! However, instead of using conventional relational databases as mentioned by JPMorgan, might I suggest Roxie, the HPCC Systems massive data delivery engine, which handles real-time query processing. Roxie can deliver query responses in sub-second predictable latencies to thousands of concurrent users depending on the size of the cluster and the complexity of the queries. Great for when there's lots of query complexity or concurrent workloads!
More at: http://hpccsystems.com