JPMorgan Chase makes a case for the big data platform (and career track) of the future.
Five of JPMorgan Chase's seven lines of business now use a Hadoop shared service. They use it for extract, transform, and load (ETL) processing; high-scale Basel III regulatory liquidity analyses and reporting; data mining; transaction analysis; fraud investigation; and social media sentiment analysis. It's also a low-cost storage option for all types of data, including structured financial records, semi-structured clickstreams and Web logs, and unstructured text and social comment feeds.
"We're now able to store data we could never store, collecting information from multiple lines of business," Feinsmith said, ticking off checking, credit card, mortgage, auto loan, and other services. All that information was previously in silos, but JPMorgan Chase is loading it all into a common, high-scale Hadoop system and mining that data to understand its customers better and provide better service. The big challenge isn't running Hadoop at high scale, Feinsmith said. It's sorting through the data security, entitlement, and privacy provisions for a centralized resource.
There are limits to what Hadoop can do, he said. When applications are transactional, when they demand low latency or rapid response times, or when there's lots of query complexity or concurrent workloads, JPMorgan Chase's IT organization still recommends using conventional relational databases. But when there's big data, as in lots of unstructured data or machine data such as Web logs, Feinsmith's team recommends Hadoop.
The open question is how much Hadoop and relational platforms will overlap over the next few years. "That's the debate we're having at JPMorgan Chase," he said, noting that the company is trying to expand the Hadoop workload.
What Hadoop needs, Feinsmith said, is more maturity as an enterprise platform, including more monitoring and virtualization capabilities, and more integrations and compatibility with existing business intelligence and analytic systems.
But the biggest obstacle to broader Hadoop use within JPMorgan Chase, Feinsmith concluded, is lack of skills. "There are lots of SQL skills, SAS skills, and SPSS skills, but there are not a lot of [Hadoop] MapReduce skills," he said.
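To give a sense of the skills gap Feinsmith describes, here is a minimal sketch of the MapReduce programming model in plain Python. It is an illustration only, not JPMorgan Chase's code: the records, field names, and function names are all hypothetical, and it runs in a single process, whereas a real Hadoop job distributes the map and reduce phases across a cluster. A SQL analyst would express the same computation in one statement, roughly `SELECT line_of_business, SUM(amount) FROM transactions GROUP BY line_of_business`.

```python
from collections import defaultdict

# Hypothetical sample records: (line_of_business, amount).
TRANSACTIONS = [
    ("credit card", 120.0),
    ("mortgage", 1500.0),
    ("credit card", 30.0),
    ("auto loan", 400.0),
]


def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for line_of_business, amount in records:
        yield line_of_business, amount


def shuffle_phase(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def total_by_line_of_business(records):
    """Run the three phases in sequence on an in-memory list of records."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    print(total_by_line_of_business(TRANSACTIONS))
```

The point of the sketch is the shape of the work: the developer writes the map and reduce steps by hand instead of declaring the result, which is the shift in thinking that SQL, SAS, and SPSS specialists would need to make.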
Hadoop World has tripled in size since I last attended two years ago, and I've talked with a dozen or more enthusiastic users over the past year. The platform is headed for broad adoption, so it's a sound career path, much like SQL was 30 years ago. Want a more substantial endorsement? Consider that IBM, Microsoft, and Oracle--multibillion-dollar vendors with substantial data management software revenue at stake--have all embraced Hadoop this year.
The good news is that Hadoop experts aren't born, they're trained. "I'm sure companies that train their workforces on Hadoop will derive lots of benefits," said Jeremy Lizt, VP of engineering at Rapleaf, in a recent interview. A data provider that has been using Hadoop for nearly four years, Rapleaf was among the earliest adopters. Perhaps it's his years of experience speaking, but Lizt said, "I think intelligent technologists will pick up Hadoop very quickly."