Why Sears Is Going All-In On Hadoop

Sears pushes the cutting edge with some big data techniques, while trying to sell its big data services. Can emerging tech drive change in old-school companies?
Like many retailers, Sears Holdings, the parent of Sears and Kmart, is trying to get closer to its customers. At Sears' scale, that requires big-time data analysis capabilities, but three years ago, Sears' IT wasn't really up to the task.
"We wanted to personalize marketing campaigns, coupons, and offers down to the individual customer, but our legacy systems were incapable of supporting that," says Phil Shelley, Sears' executive VP and CTO, in a meeting with InformationWeek editors and his team at company headquarters in suburban Chicago.
Improving customer loyalty, and with it sales and profitability, is desperately important to Sears as it faces fierce competition from Wal-Mart and Target, as well as online retailers such as Amazon.com. While revenue at Sears has declined, from $50 billion in 2008 to $42 billion in 2011, big-box rivals Wal-Mart and Target have grown steadily, and they're far more profitable. Meantime, Amazon has gone from $19 billion in revenue in 2008 to $48 billion last year, passing Sears for the first time.
A Shop Your Way Rewards membership program started by Sears in 2011 is part of a five-part strategy to get the company back on track. Behind the scenes is a cutting-edge implementation of Apache Hadoop, the high-scale, open source data processing platform driving the big data trend. Despite Sears' less-than-cutting-edge reputation as a retailer, the company has been an innovator in using big data. In fact, Shelley is leading a Sears subsidiary, MetaScale, that's pitching services to help companies outside retail use Hadoop.
But will companies be interested in buying big data cloud and consulting services from Sears? And can Sears' own big data efforts help the company regain its footing in the retail industry?
Fast And Agile
Sears' process for analyzing marketing campaigns for loyalty club members used to take six weeks on mainframe, Teradata, and SAS servers. The new process running on Hadoop can be completed weekly, Shelley says. For certain online and mobile commerce scenarios, Sears can now perform daily analyses. What's more, targeting is more granular, in some cases down to the individual customer. Whereas the old models made use of 10% of available data, the new models run on 100%.
"The Holy Grail in data warehousing has always been to have all your data in one place so you can do big models on large data sets, but that hasn't been feasible either economically or in terms of technical capabilities," Shelley says, noting that Sears previously kept data anywhere from 90 days to two years. "With Hadoop we can keep everything, which is crucial because we don't want to archive or delete meaningful data."
Sears is still the largest appliance retailer and appliance service provider in the U.S., for example, so it's in a strong position to understand customer needs, service trends, warranty problems, and more. But Sears has only been scratching the surface of using available data.
Enter Hadoop, an open source data processing platform gaining adoption on the strength of two promises: ultra-high scalability and low cost compared with conventional relational databases. According to Sears, a 200-terabyte Hadoop system costs about one-third as much as a 200-TB relational platform, and the gap widens as scale increases into the petabytes. With Hadoop's massively parallel processing power, Sears sees little more than one minute's difference between processing 100 million records and 2 billion records.
The downside of Hadoop is that it's an immature platform, perplexing to many IT shops, and Hadoop talent is scarce. Sears learned Hadoop the hard way, by trial and error. It had few outside experts available to guide its work when it embraced the platform in early 2010.
The company is now in the enviable position of having big data experience among its employees in the U.S. and India. MetaScale will leverage Sears' data center capacity in Chicago and Detroit, just as Amazon Web Services takes advantage of Amazon's massive e-commerce compute capacity.
Open Source Moves In
Sears' embrace of an open source stack began at the operating system level, with Linux. Sears routinely replaces legacy Unix systems with Linux rather than upgrade them, Shelley says, and it has retired most of its Sun and HP-UX servers. Microsoft server and development technologies are also on the way out.
Moving up the stack, Sears is consolidating its databases onto MySQL, InfoBright, and Teradata, while EMC Greenplum, Microsoft SQL Server, and Oracle (including four Exadata boxes) are on their way out, Shelley says.
Hadoop's power comes from dividing workloads across many commodity Intel x86 servers, each with multiple CPUs and each CPU with multiple processor cores. Since early 2010, Sears has been moving batch data processing off its mainframes and into Hadoop. Cost is the big motivator: mainframe capacity costs anywhere from $3,000 to $7,000 per MIPS per year, Shelley says, while Hadoop costs are a small fraction of that.
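To make "dividing workloads across many servers" more concrete, here is a minimal sketch of the split, map, and reduce pattern that Hadoop's MapReduce engine is built around. It is plain Python, not Hadoop's API, and it is not Sears' code: the record format, the function names, and the per-customer purchase count are hypothetical, and worker threads on a single machine stand in for separate commodity servers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical record format for illustration: (customer_id, purchase_amount)
Record = tuple[str, float]


def split_into_partitions(records: list[Record], n: int) -> list[list[Record]]:
    """Split the input round-robin into n partitions, like data blocks spread over nodes."""
    return [records[i::n] for i in range(n)]


def map_partition(partition: list[Record]) -> Counter:
    """Map step: count purchases per customer within a single partition."""
    return Counter(customer_id for customer_id, _amount in partition)


def merge_counts(total: Counter, partial: Counter) -> Counter:
    """Reduce step: fold one partial result into the running total."""
    total.update(partial)  # Counter.update adds counts instead of replacing them
    return total


def purchases_per_customer(records: list[Record], workers: int = 4) -> Counter:
    """Count purchases per customer by mapping partitions in parallel, then reducing."""
    partitions = split_into_partitions(records, workers)
    # Each worker thread stands in for one server processing its own partition.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    sample = [("c1", 19.99), ("c2", 5.00), ("c1", 42.50), ("c3", 7.25), ("c2", 3.10)]
    print(purchases_per_customer(sample, workers=2))
```

Because each partition is counted independently and the partial counts are merged afterward, adding workers spreads the work without changing the result. Threads in CPython will not speed up CPU-bound work like this, so the sketch shows only the structure; a real Hadoop cluster runs map tasks as separate processes on separate machines, and it tries to schedule each task on a node that already holds the relevant data block.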
Sears says it has surpassed its initial target to reduce mainframe costs by $500,000 per year, while also delivering "at least 20, sometimes 50, up to 100 times better performance on batch times," Shelley says. Eliminating all of the mainframes in use would enable it to save "tens of millions" of dollars, he says.