Why Sears Is Going All-In On Hadoop - InformationWeek

Why Sears Is Going All-In On Hadoop

Sears pushes the cutting edge with some big data techniques, while trying to sell its big data services. Can emerging tech drive change in old-school companies?

Like many retailers, Sears Holdings, the parent of Sears and Kmart, is trying to get closer to its customers. At Sears' scale, that requires big-time data analysis capabilities, but three years ago, Sears' IT wasn't really up to the task.

"We wanted to personalize marketing campaigns, coupons, and offers down to the individual customer, but our legacy systems were incapable of supporting that," says Phil Shelley, Sears' executive VP and CTO, in a meeting with InformationWeek editors and his team at company headquarters in suburban Chicago.

Improving customer loyalty, and with it sales and profitability, is desperately important to Sears as it faces fierce competition from Wal-Mart and Target, as well as online retailers such as Amazon.com. While revenue at Sears has declined, from $50 billion in 2008 to $42 billion in 2011, big-box rivals Wal-Mart and Target have grown steadily, and they're far more profitable. Meantime, Amazon has gone from $19 billion in revenue in 2008 to $48 billion last year, passing Sears for the first time.

A Shop Your Way Rewards membership program started by Sears in 2011 is part of a five-part strategy to get the company back on track. Behind the scenes is a cutting-edge implementation of Apache Hadoop, the high-scale, open source data processing platform driving the big data trend. Despite Sears' less-than-cutting-edge reputation as a retailer, the company has been an innovator in using big data. In fact, Shelley is leading a Sears subsidiary, MetaScale, that's pitching services to help companies outside retail use Hadoop.

But will companies be interested in buying big data cloud and consulting services from Sears? And can Sears' own big data efforts help the company regain its footing in the retail industry?

Fast And Agile

Sears' process for analyzing marketing campaigns for loyalty club members used to take six weeks on mainframe, Teradata, and SAS servers. The new process running on Hadoop can be completed weekly, Shelley says. For certain online and mobile commerce scenarios, Sears can now perform daily analyses. What's more, targeting is more granular, in some cases down to the individual customer. Whereas the old models made use of 10% of available data, the new models run on 100%.

"The Holy Grail in data warehousing has always been to have all your data in one place so you can do big models on large data sets, but that hasn't been feasible either economically or in terms of technical capabilities," Shelley says, noting that Sears previously kept data anywhere from 90 days to two years. "With Hadoop we can keep everything, which is crucial because we don't want to archive or delete meaningful data."

Sears is still the largest appliance retailer and appliance service provider in the U.S., for example, so it's in a strong position to understand customer needs, service trends, warranty problems, and more. But Sears has only been scratching the surface of using available data.

Enter Hadoop, an open source data processing platform gaining adoption on the strength of two promises: ultra-high scalability and low cost compared with conventional relational databases. A 200-terabyte Hadoop system costs about one-third as much as a 200-TB relational platform, and the differential grows as scale increases into the petabytes, according to Sears. With Hadoop's massively parallel processing power, Sears sees little more than one minute's difference between processing 100 million records and 2 billion records.

CTO Shelley: big data zealot

The downside of Hadoop is that it's an immature platform, perplexing to many IT shops, and Hadoop talent is scarce. Sears learned Hadoop the hard way, by trial and error. It had few outside experts available to guide its work when it embraced the platform in early 2010.

The company is now in the enviable position of having big data experience among its employees in the U.S. and India. MetaScale will leverage Sears' data center capacity in Chicago and Detroit, just as Amazon Web Services takes advantage of Amazon's massive e-commerce compute capacity.

Open Source Moves In

Sears' embrace of an open source stack began at the operating system level, with Linux. Sears routinely replaces legacy Unix systems with Linux rather than upgrade them, Shelley says, and it has retired most of its Sun and HP-UX servers. Microsoft server and development technologies are also on the way out.

Moving up the stack, Sears is consolidating its databases on MySQL, InfoBright, and Teradata. EMC Greenplum, Microsoft SQL Server, and Oracle (including four Exadata boxes) are on their way out, Shelley says.

Hadoop's power comes from dividing workloads across many commodity Intel x86 servers, each with multiple CPUs and each CPU with multiple processor cores. Since early 2010, Sears has been moving batch data processing off its mainframes and into Hadoop. Cost is the big motivator, as mainframe MIPS cost anywhere from $3,000 to $7,000 per year, Shelley says, while Hadoop costs are a small fraction of that.
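For readers who want to see the divide-and-combine idea in miniature, here is a rough Python sketch. It is not Sears' code and it does not use Hadoop's APIs; it assumes a hypothetical list of (customer ID, sale amount) records and uses a local thread pool as a stand-in for a cluster of commodity servers. The function names are made up for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, num_workers):
    """Partition the records into roughly equal chunks, one per worker."""
    return [records[i::num_workers] for i in range(num_workers)]


def map_chunk(chunk):
    """Map step: each worker totals sales per customer for its own chunk."""
    totals = Counter()
    for customer_id, amount in chunk:
        totals[customer_id] += amount
    return totals


def reduce_totals(partials):
    """Reduce step: merge the per-worker partial totals into one result."""
    combined = Counter()
    for partial in partials:
        combined.update(partial)  # Counter.update adds values key by key
    return combined


def total_sales_by_customer(records, num_workers=4):
    """Split the work, process chunks side by side, then combine the results."""
    chunks = split_into_chunks(records, num_workers)
    # The thread pool is only a stand-in for separate servers (an assumption
    # of this sketch); a real cluster spreads chunks across many machines.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_totals(partials)


if __name__ == "__main__":
    # Hypothetical sample data: (customer ID, sale amount in whole dollars)
    sample = [("c1", 20), ("c2", 5), ("c1", 10), ("c3", 7), ("c2", 15)]
    print(dict(total_sales_by_customer(sample, num_workers=2)))
```

The point of the pattern is that each chunk is processed independently, so adding workers shortens the job without changing the logic. Only the small per-worker totals need to be brought together at the end.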

Sears says it has surpassed its initial target to reduce mainframe costs by $500,000 per year, while also delivering "at least 20, sometimes 50, up to 100 times better performance on batch times," Shelley says. Eliminating all of the mainframes in use would enable it to save "tens of millions" of dollars, he says.

Soozy G. Miller,
User Rank: Strategist
12/5/2012 | 3:34:31 PM
re: Why Sears Is Going All-In On Hadoop
"But will companies be interested in buying big data cloud and consulting services from Sears?"

So, what, is Sears planning on entering a whole new market? Then again, The Gap started by selling music and jeans.
Soozy G. Miller,
User Rank: Strategist
12/5/2012 | 3:31:54 PM
re: Why Sears Is Going All-In On Hadoop
@srentner good point. It's great that Sears has a great big data solution; it's quite another to get the analytics to put that big data to use.
User Rank: Apprentice
10/31/2012 | 7:59:28 PM
re: Why Sears Is Going All-In On Hadoop
The final questions are extremely apropos and often cause the most confusion: "Could quick analytical access to an entire decade of medical record data change how doctors diagnose and treat patients? Could faster processing spot financial services fraud more effectively?" This is not what Hadoop does. It is not an analytics technology, as pointed out in page 1. Extracting this type of valuable insight from the data requires a new class of analytics technologies, and the more powerful the mathematical algorithms, the faster and more accurate the insight.
Ellis Booker,
User Rank: Moderator
10/31/2012 | 3:07:19 PM
re: Why Sears Is Going All-In On Hadoop
This is a big story on a number of fronts. First, it clearly expresses the value of big data analysis for retailers. As one of the Sears executives puts it, "With Hadoop we can keep everything, which is crucial because we don't want to archive or delete meaningful data." Second, it addresses the oft-heard complaint that big data solutions are prohibitively expensive--in fact, Sears says it reduced mainframe costs by more than $500,000 per year. Finally, the installation moves the retailer closer to real-time analysis: "Sears can develop in three days interactive reports that used to take IT six to 12 weeks." --Ellis Booker, InformationWeek Community Editor